bulltrackers-module 1.0.259 → 1.0.260
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/functions/computation-system/helpers/computation_dispatcher.js +82 -44
- package/functions/computation-system/helpers/computation_worker.js +35 -39
- package/functions/computation-system/onboarding.md +712 -503
- package/functions/computation-system/persistence/ResultCommitter.js +127 -74
- package/functions/computation-system/tools/BuildReporter.js +28 -79
- package/functions/computation-system/utils/schema_capture.js +31 -2
- package/index.js +2 -4
- package/package.json +1 -1
# BullTrackers Computation System & Root Data Indexer

## Table of Contents

1. [System Philosophy](#system-philosophy)
2. [Root Data Indexer](#root-data-indexer)
3. [Computation Architecture](#computation-architecture)
4. [Context & Dependency Injection](#context--dependency-injection)
5. [Execution Pipeline](#execution-pipeline)
6. [Smart Hashing & Versioning](#smart-hashing--versioning)
7. [Data Loading & Streaming](#data-loading--streaming)
8. [Auto-Sharding System](#auto-sharding-system)
9. [Quality Assurance](#quality-assurance)
10. [Operational Modes](#operational-modes)

---

## System Philosophy

The BullTrackers Computation System is a **dependency-aware, distributed calculation engine** that processes massive financial datasets with strict guarantees:

- **Incremental Recomputation**: Only re-runs when code or dependencies change
- **Historical Continuity**: Ensures chronological execution for time-series calculations
- **Transparent Sharding**: Handles Firestore's 1MB document limit automatically
- **Data Availability Gating**: Never runs when source data is missing
- **Cascading Invalidation**: Upstream changes automatically invalidate downstream results

### Key Design Principles

**1. Source of Truth Paradigm**
- The Root Data Indexer creates a daily availability manifest
- Computations are gated by data availability checks
- Missing data triggers "IMPOSSIBLE" states, not retries

**2. Merkle Tree Dependency Hashing**
- Every calculation has a hash that includes:
  - Its own source code
  - Hashes of math layers it uses
  - Hashes of calculations it depends on
- Changes cascade: updating calculation A invalidates all dependents

**3. Stateless Execution**
- Each worker receives complete context for its task
- No inter-worker communication
- Infinitely horizontally scalable

---

## Root Data Indexer

### Purpose

The Root Data Indexer runs daily to scan all data sources and create a **centralized availability manifest**. This prevents the computation system from attempting to run calculations when source data doesn't exist.

### Architecture

```
┌──────────────────────────────────────────────────────┐
│           Root Data Indexer (Daily Scan)             │
├──────────────────────────────────────────────────────┤
│ For each date (2023-01-01 → Tomorrow):               │
│   1. Check Normal User Portfolios    (Canary: 19M)   │
│   2. Check Speculator Portfolios     (Canary: 19M)   │
│   3. Check Normal Trade History      (Canary: 19M)   │
│   4. Check Speculator Trade History  (Canary: 19M)   │
│   5. Check Daily Insights            (/{date})       │
│   6. Check Social Posts              (/{date}/posts) │
│   7. Pre-Load Price Shard (shard_0) → Date Map       │
│                                                      │
│ Output: /system_root_data_index/{date}               │
│   {                                                  │
│     hasPortfolio: true,                              │
│     hasHistory: false,                               │
│     hasInsights: true,                               │
│     hasSocial: true,                                 │
│     hasPrices: true,                                 │
│     details: {                                       │
│       normalPortfolio: true,                         │
│       speculatorPortfolio: false,                    │
│       normalHistory: false,                          │
│       speculatorHistory: false                       │
│     }                                                │
│   }                                                  │
└──────────────────────────────────────────────────────┘
```

### Canary Block System

Instead of scanning every user block (which would be prohibitively expensive), the indexer uses **representative blocks**:

- **Block 19M**: Statistically verified to always contain data when the system is healthy
- **Part 0**: First part of sharded collections

**Logic**: If Block 19M has data for a date, all blocks have data for that date. This is a deliberate architectural assumption that reduces indexing cost by 99%.

### Granular UserType Tracking

The indexer distinguishes between:

- **Normal Users**: Traditional portfolio tracking (AggregatedPositions)
- **Speculators**: Advanced traders (PublicPositions with leverage/SL/TP)

This enables a calculation to declare `userType: 'speculator'` and be gated only on whether speculator data exists.

### Price Data Optimization

Price data is handled differently:

1. **Pre-Load Once**: The entire `shard_0` document is loaded into memory
2. **Extract Date Keys**: All dates with price data are extracted into a `Set`
3. **Fast Lookup**: Each date check becomes O(1) instead of a Firestore read

This reduces price availability checks from **~1000 reads/day** to **1 read total**.
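The pre-load step can be sketched as follows (a minimal illustration, assuming the shard layout shown elsewhere in this document; `buildPriceDateSet` is a hypothetical helper name, not the real API):

```javascript
// Collect every date key present in the pre-loaded shard into a Set,
// so later availability checks are O(1) in-memory lookups.
function buildPriceDateSet(shardDoc) {
  // shardDoc: { [instrumentId]: { ticker, prices: { "YYYY-MM-DD": number } } }
  const dates = new Set();
  for (const instrument of Object.values(shardDoc)) {
    for (const date of Object.keys(instrument.prices || {})) {
      dates.add(date);
    }
  }
  return dates;
}

// One Firestore read up front, then O(1) membership checks per date:
const shard0 = {
  "123": { ticker: "AAPL", prices: { "2024-12-01": 150.25, "2024-12-02": 151.30 } }
};
const priceDates = buildPriceDateSet(shard0);
priceDates.has("2024-12-01"); // true
priceDates.has("2024-12-03"); // false
```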
### Index Schema

```javascript
{
  date: "2024-12-07",
  lastUpdated: Timestamp,

  // Aggregate Flags (true if ANY subtype exists)
  hasPortfolio: true,   // normalPortfolio OR speculatorPortfolio
  hasHistory: false,    // normalHistory OR speculatorHistory
  hasInsights: true,    // Insights document exists
  hasSocial: true,      // At least 1 social post exists
  hasPrices: true,      // Price data exists for this date

  // Granular Breakdown
  details: {
    normalPortfolio: true,
    speculatorPortfolio: false,
    normalHistory: false,
    speculatorHistory: false
  }
}
```

### Availability Check Logic (Computation System)

```javascript
// AvailabilityChecker.js - checkRootDependencies()

if (calculation.rootDataDependencies.includes('portfolio')) {
  if (calculation.userType === 'speculator') {
    if (!rootDataStatus.speculatorPortfolio) → MISSING
  } else if (calculation.userType === 'normal') {
    if (!rootDataStatus.normalPortfolio) → MISSING
  } else {
    // userType: 'all' or 'aggregate'
    if (!rootDataStatus.hasPortfolio) → MISSING
  }
}

// Similar logic for the 'history' dependency
// Global data types (insights, social, price) have no subtypes
```

**Critical Behavior**:
- If data is missing for a **historical date** → mark the calculation `IMPOSSIBLE` (permanent failure)
- If data is missing for **today's date** → mark it `BLOCKED` (retriable; the data may arrive later)
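The gating pseudocode above can be turned into a runnable sketch (the function name, return values, and the `details` lookup are assumptions layered on the index schema; the real `AvailabilityChecker.js` may differ):

```javascript
// Minimal sketch of the portfolio-availability gate described above.
function checkPortfolioAvailability(calculation, rootDataStatus) {
  if (!calculation.rootDataDependencies.includes('portfolio')) return 'OK';

  if (calculation.userType === 'speculator') {
    return rootDataStatus.details.speculatorPortfolio ? 'OK' : 'MISSING';
  }
  if (calculation.userType === 'normal') {
    return rootDataStatus.details.normalPortfolio ? 'OK' : 'MISSING';
  }
  // userType: 'all' or 'aggregate' falls back to the aggregate flag
  return rootDataStatus.hasPortfolio ? 'OK' : 'MISSING';
}

const status = {
  hasPortfolio: true,
  details: { normalPortfolio: true, speculatorPortfolio: false }
};

checkPortfolioAvailability(
  { rootDataDependencies: ['portfolio'], userType: 'speculator' }, status
); // → 'MISSING'
```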
---

## Computation Architecture

### Calculation Types

#### **Standard (Per-User) Computations**

```javascript
class UserRiskProfile {
  static getMetadata() {
    return {
      type: 'standard',            // Runs once per user
      category: 'risk-management', // Storage category
      isHistorical: true,          // Needs yesterday's data
      rootDataDependencies: ['portfolio', 'history'],
      userType: 'speculator'       // Only for speculator users
    };
  }

  static getDependencies() {
    return ['market-volatility']; // Needs this calc to run first
  }

  async process(context) {
    const { user, computed, math } = context;
    // Process individual user
    this.results[user.id] = { riskScore: /* ... */ };
  }
}
```

**Execution Model**:
- Portfolio data streams in batches (50 users at a time)
- Each user processed independently
- Memory-efficient for millions of users

#### **Meta (Once-Per-Day) Computations**

```javascript
class MarketMomentum {
  static getMetadata() {
    return {
      type: 'meta', // Runs once per day globally
      category: 'market-signals',
      rootDataDependencies: ['price', 'insights']
    };
  }

  async process(context) {
    const { prices, insights, math } = context;
    // Process all tickers
    for (const [instId, data] of Object.entries(prices.history)) {
      this.results[data.ticker] = { momentum: /* ... */ };
    }
  }
}
```

**Execution Model**:
- Loads all price data (or processes in shard batches)
- Runs once, produces global results
- Used for market-wide analytics

### Manifest Builder

The Manifest Builder automatically:

1. **Discovers** all calculation classes in the codebase
2. **Analyzes** their dependencies
3. **Sorts** them topologically (builds a DAG)
4. **Assigns** pass numbers (execution waves)
5. **Generates** smart hashes for each calculation

```
Pass 1 (No Dependencies):
  - market-volatility
  - price-momentum

Pass 2 (Depends on Pass 1):
  - user-risk-profile (needs market-volatility)
  - sentiment-score (needs price-momentum)

Pass 3 (Depends on Pass 2):
  - combined-signal (needs sentiment-score + user-risk-profile)
```

**Circular Dependency Detection**: If `A → B → C → A`, the builder throws a fatal error and refuses to generate a manifest.
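The pass assignment and cycle check can be sketched in a few lines (a simplified illustration of the technique; the real Manifest Builder's internals may differ):

```javascript
// Assigns each calculation to the earliest pass in which all of its
// dependencies have already run; throws on a circular dependency.
// `graph` maps a calculation name to the names it depends on.
function assignPasses(graph) {
  const passes = {};
  const visiting = new Set(); // names on the current DFS path

  function visit(name) {
    if (name in passes) return passes[name];
    if (visiting.has(name)) {
      throw new Error(`Circular dependency detected at "${name}"`);
    }
    visiting.add(name);
    const deps = graph[name] || [];
    const pass = deps.length === 0 ? 1 : 1 + Math.max(...deps.map(visit));
    visiting.delete(name);
    passes[name] = pass;
    return pass;
  }

  Object.keys(graph).forEach(visit);
  return passes;
}

assignPasses({
  'market-volatility': [],
  'price-momentum': [],
  'user-risk-profile': ['market-volatility'],
  'sentiment-score': ['price-momentum'],
  'combined-signal': ['sentiment-score', 'user-risk-profile']
});
// → { 'market-volatility': 1, ..., 'combined-signal': 3 }
```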
---

## Context & Dependency Injection

### Context Structure

Every calculation receives exactly what it declares:

```javascript
{
  // IDENTITY (Standard only)
  user: {
    id: "user_123",
    type: "speculator",
    portfolio: { today: {...}, yesterday: {...} },
    history: { today: {...}, yesterday: {...} }
  },

  // TEMPORAL
  date: { today: "2024-12-07" },

  // ROOT DATA (if declared)
  insights: { today: {...}, yesterday: {...} },
  social: { today: {...}, yesterday: {...} },
  prices: { history: {...} },

  // MAPPINGS
  mappings: {
    tickerToInstrument: { "AAPL": 123 },
    instrumentToTicker: { 123: "AAPL" }
  },

  // MATH LAYERS (always injected)
  math: {
    extract: DataExtractor,
    compute: MathPrimitives,
    signals: SignalPrimitives,
    history: HistoryExtractor,
    insights: InsightsExtractor,
    priceExtractor: priceExtractor,
    // ... 20+ utility classes
  },

  // DEPENDENCIES (if declared)
  computed: {
    "market-volatility": { "AAPL": { volatility: 0.25 } }
  },

  // HISTORICAL DEPENDENCIES (if isHistorical: true)
  previousComputed: {
    "market-volatility": { "AAPL": { volatility: 0.23 } }
  }
}
```

### Lazy Loading Optimization

The system **only loads what you declare**:

```javascript
// Calculation A declares:
rootDataDependencies: ['portfolio']

// Context A receives:
{ user: { portfolio: {...} } } // No insights, social, or prices loaded

// Calculation B declares:
rootDataDependencies: ['portfolio', 'insights']

// Context B receives:
{
  user: { portfolio: {...} },
  insights: { today: {...} } // Insights fetched on-demand
}
```

This prevents unnecessary Firestore reads and keeps memory usage minimal.
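The declaration-driven loading can be sketched as follows (an illustrative sketch only; `buildRootData` and the loader names are hypothetical, not the real ContextBuilder API):

```javascript
// Only the declared root data sources are fetched; undeclared loaders
// are never invoked, so no Firestore reads happen for them.
async function buildRootData(metadata, loaders) {
  const context = {};
  const declared = metadata.rootDataDependencies || [];

  if (declared.includes('insights')) context.insights = await loaders.loadInsights();
  if (declared.includes('social'))   context.social   = await loaders.loadSocial();
  if (declared.includes('price'))    context.prices   = await loaders.loadPrices();
  return context;
}

// A calculation declaring only ['insights'] never triggers the other reads:
const loaders = {
  loadInsights: async () => ({ today: { AAPL: {} } }),
  loadSocial:   async () => { throw new Error('should not be called'); },
  loadPrices:   async () => { throw new Error('should not be called'); }
};
```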
---

## Execution Pipeline

### Phase 1: Analysis & Dispatch

```
┌─────────────────────────────────────────────────────┐
│               Computation Dispatcher                │
│             (Smart Pre-Flight Checker)              │
├─────────────────────────────────────────────────────┤
│ For each date in range:                             │
│   1. Fetch Root Data Index                          │
│   2. Fetch Computation Status (stored hashes)       │
│   3. Fetch Yesterday's Status (historical check)    │
│   4. Run analyzeDateExecution()                     │
│                                                     │
│ Decision Logic per Calculation:                     │
│   ├─ Root Data Missing?                             │
│   │   ├─ Historical Date → Mark IMPOSSIBLE          │
│   │   └─ Today's Date → Mark BLOCKED (retriable)    │
│   │                                                 │
│   ├─ Dependency Impossible?                         │
│   │   └─ Mark IMPOSSIBLE (cascading failure)        │
│   │                                                 │
│   ├─ Dependency Missing/Hash Mismatch?              │
│   │   └─ Mark BLOCKED (wait for dependency)         │
│   │                                                 │
│   ├─ Historical Continuity Broken?                  │
│   │   └─ Mark BLOCKED (wait for yesterday)          │
│   │                                                 │
│   ├─ Hash Mismatch?                                 │
│   │   └─ Mark RUNNABLE (re-run needed)              │
│   │                                                 │
│   └─ Hash Match?                                    │
│       └─ Mark SKIPPED (up-to-date)                  │
│                                                     │
│   5. Create Audit Ledger (PENDING state)            │
│   6. Publish RUNNABLE tasks to Pub/Sub              │
└─────────────────────────────────────────────────────┘
```

### Phase 2: Worker Execution

```
┌─────────────────────────────────────────────────────┐
│                 Computation Worker                  │
│               (Processes Single Task)               │
├─────────────────────────────────────────────────────┤
│ 1. Parse Pub/Sub Message                            │
│    { date, pass, computation, previousCategory }    │
│                                                     │
│ 2. Load Manifest (cached in memory)                 │
│                                                     │
│ 3. Fetch Dependencies                               │
│    - Load dependency results (auto-reassemble)      │
│    - Load previous day's results (if historical)    │
│                                                     │
│ 4. Execute Calculation                              │
│    ├─ Standard: Stream users in batches             │
│    └─ Meta: Load global data / price shards         │
│                                                     │
│ 5. Validate Results (HeuristicValidator)            │
│    - NaN Detection                                  │
│    - Flatline Detection (stuck values)              │
│    - Null/Empty Analysis                            │
│    - Dead Object Detection                          │
│                                                     │
│ 6. Store Results                                    │
│    ├─ Calculate size                                │
│    ├─ If > 900KB → Auto-shard                       │
│    └─ Write to Firestore                            │
│                                                     │
│ 7. Update Status & Ledgers                          │
│    ├─ computation_status/{date} → New hash          │
│    ├─ Audit Ledger → COMPLETED                      │
│    └─ Run History → SUCCESS/FAILURE record         │
│                                                     │
│ 8. Category Migration (if detected)                 │
│    └─ Delete old category's data                    │
└─────────────────────────────────────────────────────┘
```

### Error Handling Stages

The system tracks **where** failures occur:

```javascript
// Run History Schema
{
  status: "FAILURE" | "SUCCESS" | "CRASH",
  error: {
    message: "...",
    stage: "EXECUTION" | "PREPARE_SHARDS" | "COMMIT_BATCH" |
           "SHARDING_LIMIT_EXCEEDED" | "QUALITY_CIRCUIT_BREAKER" |
           "MANIFEST_LOAD" | "SYSTEM_CRASH"
  }
}
```

**Stage-Specific Handling**:
- `QUALITY_CIRCUIT_BREAKER`: Block deployment, data integrity issue
- `SHARDING_LIMIT_EXCEEDED`: Firestore hard limit hit, needs redesign
- `SYSTEM_CRASH`: Infrastructure issue, retriable
- `EXECUTION`: Logic bug in calculation code

---

## Smart Hashing & Versioning

### Hash Composition

```javascript
// Step 1: Intrinsic Hash (Code + System Epoch)
const codeHash = SHA256(calculation.toString());
const intrinsicHash = SHA256(codeHash + "|EPOCH:v1.0-epoch-2");

// Step 2: Layer Hashing (Dynamic Detection)
let compositeHash = intrinsicHash;
for (const [layer, exports] of MATH_LAYERS) {
  for (const [exportName, triggerPatterns] of exports) {
    if (codeString.includes(exportName)) {
      compositeHash += layerHashes[layer][exportName];
    }
  }
}

// Step 3: Dependency Hashing (Merkle Tree)
const depHashes = dependencies.map(dep => dep.hash).join('|');
const finalHash = SHA256(compositeHash + "|DEPS:" + depHashes);
```
350
460
|
|
|
351
|
-
|
|
461
|
+
### Cascading Invalidation Example
|
|
462
|
+
|
|
463
|
+
```
|
|
464
|
+
Initial State:
|
|
465
|
+
PriceVolatility → hash: abc123
|
|
466
|
+
UserRisk (uses PV) → hash: def456 (includes abc123)
|
|
467
|
+
Signal (uses UR) → hash: ghi789 (includes def456)
|
|
468
|
+
|
|
469
|
+
Developer Updates PriceVolatility:
|
|
470
|
+
PriceVolatility → hash: xyz000 (NEW!)
|
|
471
|
+
UserRisk → hash: uvw111 (NEW! Dependency changed)
|
|
472
|
+
Signal → hash: rst222 (NEW! Cascade)
|
|
473
|
+
|
|
474
|
+
Next Dispatch:
|
|
475
|
+
All 3 calculations marked RUNNABLE (hash mismatch)
|
|
476
|
+
```
|
|
477
|
+
|
|
478
|
+
### System Epoch
|
|
479
|
+
|
|
480
|
+
`system_epoch.js`:
|
|
352
481
|
```javascript
|
|
353
|
-
|
|
354
|
-
"user_123": { riskScore: 0.75 },
|
|
355
|
-
"user_456": { riskScore: 0.45 },
|
|
356
|
-
// ... millions of users
|
|
357
|
-
}
|
|
482
|
+
module.exports = "v1.0-epoch-2";
|
|
358
483
|
```
|
|
359
484
|
|
|
360
|
-
|
|
485
|
+
**Purpose**: Changing this string forces **global re-computation** of all calculations, even if code hasn't changed. Used for:
|
|
486
|
+
- Schema migrations
|
|
487
|
+
- Critical bug fixes requiring historical reprocessing
|
|
488
|
+
- Firestore structure changes
|
|
489
|
+
|
|
490
|
+
---
|
|
361
491
|
|
|
362
|
-
|
|
492
|
+
## Data Loading & Streaming
|
|
363
493
|
|
|
364
|
-
|
|
365
|
-
- Runs **once per day** (not per user)
|
|
366
|
-
- Processes all data holistically
|
|
367
|
-
- Can access price history for all instruments
|
|
494
|
+
### Streaming Architecture (Standard Computations)
|
|
368
495
|
|
|
369
|
-
**Example:**
|
|
370
496
|
```javascript
|
|
371
|
-
|
|
372
|
-
|
|
373
|
-
|
|
374
|
-
|
|
375
|
-
|
|
376
|
-
};
|
|
377
|
-
}
|
|
497
|
+
// Problem: 10M users × 5KB portfolio = 50GB
|
|
498
|
+
// Solution: Stream in chunks
|
|
499
|
+
|
|
500
|
+
async function* streamPortfolioData(dateStr, refs) {
|
|
501
|
+
const BATCH_SIZE = 50; // 50 users at a time
|
|
378
502
|
|
|
379
|
-
|
|
380
|
-
const
|
|
503
|
+
for (let i = 0; i < refs.length; i += BATCH_SIZE) {
|
|
504
|
+
const batchRefs = refs.slice(i, i + BATCH_SIZE);
|
|
505
|
+
const userData = await loadDataByRefs(batchRefs);
|
|
381
506
|
|
|
382
|
-
//
|
|
383
|
-
|
|
384
|
-
|
|
385
|
-
const priceData = math.priceExtractor.getHistory(prices, ticker);
|
|
386
|
-
|
|
387
|
-
this.results[ticker] = { momentum: /* ... */ };
|
|
388
|
-
}
|
|
507
|
+
yield userData; // { user1: {...}, user2: {...}, ... }
|
|
508
|
+
|
|
509
|
+
// Memory cleared after each iteration
|
|
389
510
|
}
|
|
390
511
|
}
|
|
391
|
-
```
|
|
392
512
|
|
|
393
|
-
|
|
394
|
-
|
|
395
|
-
{
|
|
396
|
-
|
|
397
|
-
|
|
398
|
-
|
|
513
|
+
// Usage in Executor
|
|
514
|
+
for await (const userBatch of streamPortfolioData(date, refs)) {
|
|
515
|
+
for (const [userId, portfolio] of Object.entries(userBatch)) {
|
|
516
|
+
const context = buildContext({ userId, portfolio, ... });
|
|
517
|
+
await calculation.process(context);
|
|
518
|
+
}
|
|
399
519
|
}
|
|
400
520
|
```
|
|
401
521
|
|
|
402
|
-
### Price
|
|
403
|
-
|
|
404
|
-
When a meta computation declares `rootDataDependencies: ['price']`, it enters **batched shard processing mode**:
|
|
522
|
+
### Price Data Batching (Meta Computations)
|
|
405
523
|
|
|
406
524
|
```javascript
|
|
407
|
-
//
|
|
525
|
+
// Problem: 10,000 tickers × 2 years history = 1GB
|
|
526
|
+
// Solution: Process shards sequentially
|
|
527
|
+
|
|
408
528
|
for (const shardRef of priceShardRefs) {
|
|
409
|
-
const shardData = await loadPriceShard(shardRef);
|
|
529
|
+
const shardData = await loadPriceShard(shardRef); // ~100 tickers
|
|
410
530
|
|
|
411
531
|
const context = buildMetaContext({
|
|
412
|
-
prices: { history: shardData }
|
|
532
|
+
prices: { history: shardData }
|
|
413
533
|
});
|
|
414
534
|
|
|
415
535
|
await calculation.process(context);
|
|
416
536
|
|
|
417
537
|
// Results accumulate across shards
|
|
418
|
-
// Memory
|
|
538
|
+
// Memory cleared between shards
|
|
419
539
|
}
|
|
420
540
|
```
|
|
421
541
|
|
|
422
|
-
|
|
423
|
-
|
|
424
|
-
---
|
|
425
|
-
|
|
426
|
-
## Dependency Management
|
|
542
|
+
### Smart Shard Indexing (Optimization)
|
|
427
543
|
|
|
428
|
-
|
|
544
|
+
For targeted price lookups (e.g., "only calculate momentum for AAPL, GOOGL, MSFT"):
|
|
429
545
|
|
|
430
546
|
```javascript
|
|
431
|
-
|
|
432
|
-
|
|
433
|
-
|
|
547
|
+
// Without Indexing: Load ALL shards, filter after
|
|
548
|
+
// Cost: 100+ Firestore reads
|
|
549
|
+
|
|
550
|
+
// With Indexing: Pre-map which shard contains each instrument
|
|
551
|
+
const index = {
|
|
552
|
+
"123": "shard_0", // AAPL
|
|
553
|
+
"456": "shard_2", // GOOGL
|
|
554
|
+
"789": "shard_0" // MSFT
|
|
555
|
+
};
|
|
556
|
+
|
|
557
|
+
const relevantShards = ["shard_0", "shard_2"];
|
|
558
|
+
// Cost: 2 Firestore reads
|
|
434
559
|
```
|
|
435
560
|
|
|
436
|
-
|
|
561
|
+
**Index Building**: Runs once, cached in `/system_metadata/price_shard_index`.

---

## Auto-Sharding System

### The 1MB Problem

Firestore's hard limit: **1MB per document**. A calculation producing results for 5,000 tickers easily exceeds this.

### Transparent Sharding Solution

```javascript
// ResultCommitter.js - prepareAutoShardedWrites()

const totalSize = calculateFirestoreBytes(result);

if (totalSize < 900 * 1024) { // ~900KB safety margin under the 1MB limit
  // Write normally:
  //   /results/{date}/{category}/{calc}
  //     → { "AAPL": {...}, "GOOGL": {...}, _completed: true }
} else {
  // Auto-shard.
  // Step 1: Split into chunks < 900KB each
  const chunks = splitIntoChunks(result);

  // Step 2: Write shards
  //   /results/{date}/{category}/{calc}/_shards/shard_0 → chunk 1
  //   /results/{date}/{category}/{calc}/_shards/shard_1 → chunk 2
  //   /results/{date}/{category}/{calc}/_shards/shard_N → chunk N

  // Step 3: Write a pointer document
  //   /results/{date}/{category}/{calc}
  //     → { _sharded: true, _shardCount: N, _completed: true }
}
```
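The chunk-splitting step can be sketched as a greedy packer under a byte budget. This is a minimal sketch assuming JSON byte length approximates Firestore document size; the real `calculateFirestoreBytes` accounting is more precise.

```javascript
// Greedily pack top-level keys into chunks that stay under maxBytes.
function splitIntoChunks(result, maxBytes = 900 * 1024) {
  const chunks = [];
  let current = {};
  let size = 2; // "{}" overhead
  for (const [key, value] of Object.entries(result)) {
    const entryBytes = Buffer.byteLength(JSON.stringify({ [key]: value }), 'utf8');
    if (size + entryBytes > maxBytes && Object.keys(current).length > 0) {
      chunks.push(current);
      current = {};
      size = 2;
    }
    current[key] = value;
    size += entryBytes;
  }
  if (Object.keys(current).length > 0) chunks.push(current);
  return chunks;
}
```

Splitting on top-level keys is what makes reassembly a plain `Object.assign` merge later.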

### Transparent Reassembly

```javascript
// DependencyFetcher.js - fetchExistingResults()

const doc = await docRef.get();
const data = doc.data();

if (data._sharded === true) {
  // 1. Fetch all shards
  const shardsCol = docRef.collection('_shards');
  const snapshot = await shardsCol.get();

  // 2. Merge them back into a single object
  const assembled = {};
  snapshot.forEach(shard => {
    Object.assign(assembled, shard.data());
  });

  // 3. Return as if never sharded
  return assembled;
}

// Normal path: return as-is
return data;
```

**Developer Experience**: You **never** know or care whether data is sharded. Read and write as if documents have no size limit.

### Sharding Limits

**Maximum Calculation Size**: ~450MB (500 shards × 900KB)

If a calculation exceeds this, the system throws:

```
error: {
  stage: "SHARDING_LIMIT_EXCEEDED",
  message: "Firestore subcollection limit reached"
}
```

**Solution**: Refactor calculation to produce less data or split into multiple calculations.

---

## Quality Assurance

### HeuristicValidator (Grey Box Testing)

Runs statistical analysis on results **before storage**:

```javascript
// ResultsValidator.js

1. NaN Detection
   - Scans a sample of results for NaN/Infinity
   - Threshold: 0% (strict; NaN is always a bug)

2. Flatline Detection
   - Checks whether >95% of values are identical
   - Catches stuck loops or broken RNG

3. Null/Empty Analysis
   - Threshold: 90% of results null/0
   - Indicates a data pipeline failure

4. Dead Object Detection
   - Finds objects where all properties are null/0
   - Example: { profile: [], score: 0, signal: null }

5. Vector Emptiness (Distribution Calcs)
   - Checks whether histogram/profile arrays are empty
   - Threshold: 90% empty → FAIL
```

**Circuit Breaker**: If validation fails, the calculation **does not store results** and is marked as `FAILURE` with stage: `QUALITY_CIRCUIT_BREAKER`.
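Two of the checks above (NaN and flatline) can be sketched over a flat map of numeric results. This is an illustrative sketch, not the actual `ResultsValidator` implementation; `heuristicCheck` is a hypothetical name.

```javascript
// Run NaN and flatline heuristics over { key: number } results.
function heuristicCheck(results, { maxFlatlinePct = 95 } = {}) {
  const values = Object.values(results).filter(v => typeof v === 'number');

  // NaN/Infinity: zero tolerance
  if (values.some(v => !Number.isFinite(v))) {
    return { ok: false, reason: 'NAN_DETECTED' };
  }

  // Flatline: fail when one value dominates the distribution
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  const mostCommon = Math.max(0, ...counts.values());
  if (values.length > 0 && (mostCommon / values.length) * 100 > maxFlatlinePct) {
    return { ok: false, reason: 'FLATLINE' };
  }

  return { ok: true };
}
```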

### Validation Overrides

For legitimately sparse datasets:

```javascript
// validation_overrides.js
module.exports = {
  "bankruptcy-detector": {
    maxZeroPct: 100 // Rare event; 100% zeros is expected
  },
  "earnings-surprise": {
    maxNullPct: 99 // Only runs on earnings days
  }
};
```
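Applying an override presumably means merging it over the validator's defaults. A minimal sketch, assuming illustrative default values and a hypothetical `thresholdsFor` helper:

```javascript
// Assumed defaults; the real validator's thresholds may differ.
const DEFAULT_THRESHOLDS = { maxNullPct: 90, maxZeroPct: 90 };

const overrides = {
  "bankruptcy-detector": { maxZeroPct: 100 },
  "earnings-surprise": { maxNullPct: 99 }
};

// Per-calculation thresholds: overrides win over defaults.
function thresholdsFor(calcName) {
  return { ...DEFAULT_THRESHOLDS, ...(overrides[calcName] || {}) };
}
```

An override only relaxes the named threshold; everything else keeps the strict default.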

### Build Reporter (Pre-Deployment Analysis)

```bash
npm run build-reporter
```

Generates a **simulation report** without running calculations:

```
Build Report: v1.2.5_2024-12-07
================================

Summary:
- 1,245 Re-Runs (hash mismatch)
- 23 New Calculations
- 0 Impossible
- 45 Blocked (waiting for data)

Detailed Breakdown:

2024-12-01:
  Will Re-Run:
    - user-risk-profile (Hash: abc123 → xyz789)
    - sentiment-score (Hash: def456 → uvw012)

  Blocked:
    - social-sentiment (Missing Root Data: social)

2024-12-02:
  Will Run:
    - new-momentum-signal (New calculation)
```

**Use Case**: Review before deploying to production. If 10,000 re-runs are detected, investigate whether code change was intentional.

---

## Operational Modes

### Mode 1: Local Orchestrator (Development)

```bash
# Run all calculations for Pass 1 sequentially
COMPUTATION_PASS_TO_RUN=1 npm run computation-orchestrator
```

**Behavior**:
- Single-process execution
- Loads the manifest
- Iterates through all dates
- Runs calculations in order
- Good for: debugging, local testing

### Mode 2: Dispatcher + Workers (Production)

```bash
# Step 1: Dispatch tasks to Pub/Sub
COMPUTATION_PASS_TO_RUN=1 npm run computation-dispatcher

# Step 2: Cloud Function workers consume the tasks
# (auto-scaled by GCP, 0 to 1,000+ workers)
```

**Behavior**:
- Dispatcher analyzes all dates
- Publishes ~10,000 messages to Pub/Sub
- Workers process in parallel
- Each worker handles one date
- Auto-retries on failure (Pub/Sub built-in)

**Scaling**: 1,000 dates × 3 calcs = 3,000 tasks. With 100 workers, completes in ~5 minutes.
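The "~5 minutes" figure is back-of-envelope arithmetic. A sketch of the estimate, assuming roughly 10 seconds per task (implied by the numbers above, not a measured constant):

```javascript
// Wall-clock estimate for evenly distributed tasks across workers.
function estimateMinutes(tasks, workers, secondsPerTask = 10) {
  return (Math.ceil(tasks / workers) * secondsPerTask) / 60;
}
```

With 3,000 tasks and 100 workers, each worker runs 30 tasks in sequence, giving the ~5-minute figure.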

### Mode 3: Batch Price Executor (Optimization)

```bash
# For price-dependent calcs, bulk-process historical data
npm run batch-price-executor --dates=2024-12-01,2024-12-02 --calcs=momentum-signal
```

**Behavior**:
- Loads price shards once
- Processes multiple dates in a single pass
- Bypasses Pub/Sub overhead
- **10x faster** for historical backfills

**Use Case**: After deploying a new price-dependent calculation, backfill 2 years of history in 1 hour instead of 10.

---

## Advanced Topics

### Historical Continuity Enforcement

For calculations that depend on their own previous results:

```javascript
// Example: cumulative-pnl needs yesterday's cumulative-pnl

static getMetadata() {
  return { isHistorical: true };
}

// Dispatcher logic (pseudocode):
if (calculation.isHistorical) {
  const yesterday = previousDay(date); // the prior calendar day
  const yesterdayStatus = await fetchComputationStatus(yesterday);

  if (!yesterdayStatus[calcName] ||
      yesterdayStatus[calcName].hash !== currentHash) {
    // Yesterday is missing or has the wrong hash
    report.blocked.push({
      reason: "Waiting for historical continuity"
    });
  }
}
```

**Result**: Historical calculations run in **strict chronological order**, never skipping days.
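The dispatcher's "yesterday" lookup and the blocking test can be sketched concretely. `previousDay` and `isBlockedForContinuity` are hypothetical helper names shown with UTC date arithmetic:

```javascript
// Previous calendar day for a YYYY-MM-DD date string (UTC).
function previousDay(dateStr) {
  const d = new Date(dateStr + 'T00:00:00Z');
  d.setUTCDate(d.getUTCDate() - 1);
  return d.toISOString().slice(0, 10);
}

// Blocked when yesterday's result is missing or was produced by a
// different code version (hash mismatch).
function isBlockedForContinuity(yesterdayStatus, calcName, currentHash) {
  const entry = yesterdayStatus[calcName];
  return !entry || entry.hash !== currentHash;
}
```

Using UTC avoids off-by-one days around DST transitions in the host timezone.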

### Category Migration System

If a calculation's category changes:

```javascript
// Before: category: 'signals'
// After:  category: 'risk-management'

// The system detects the change:
manifest.previousCategory = 'signals';

// The worker then:
// 1. Runs the calculation normally
// 2. Stores results in the new category: /results/{date}/risk-management/{calc}
// 3. Deletes the old category:           /results/{date}/signals/{calc}
```

+
**Automation**: Zero manual data migration needed.
|
|
829
|
+
|
|
830
|
+
### Audit Ledger vs Run History
|
|
831
|
+
|
|
832
|
+
**Audit Ledger** (`computation_audit_ledger/{date}/passes/{pass}/tasks/{calc}`):
|
|
833
|
+
- Created **before** dispatch
|
|
834
|
+
- Status: PENDING → COMPLETED
|
|
835
|
+
- Purpose: Track which tasks were dispatched
|
|
836
|
+
|
|
837
|
+
**Run History** (`computation_run_history/{date}/runs/{runId}`):
|
|
838
|
+
- Created **after** execution attempt
|
|
839
|
+
- Status: SUCCESS | FAILURE | CRASH
|
|
840
|
+
- Purpose: Debug failures, track performance
|
|
841
|
+
|
|
842
|
+
**Why Both?**: Audit Ledger answers "What should run?", Run History answers "What actually happened?".
|
|
652
843
|
|
|
653
844
|
---
|
|
654
845
|
|
|
@@ -657,60 +848,78 @@ The manifest builder automatically assigns pass numbers via topological sort.
|
|
|
657
848
|
### For a Standard Calculation

```
1. Root Data Indexer (Daily)
   ├─ Scans all data sources
   └─ Creates availability manifest

2. Dispatcher (Per-Pass)
   ├─ Loads manifest
   ├─ For each date:
   │  ├─ Checks root data availability
   │  ├─ Checks dependency status
   │  ├─ Checks historical continuity
   │  └─ Decides: RUNNABLE | BLOCKED | IMPOSSIBLE
   ├─ Creates Audit Ledger (PENDING)
   └─ Publishes RUNNABLE tasks to Pub/Sub

3. Worker (Per-Task)
   ├─ Receives {date, pass, computation}
   ├─ Loads manifest (cached)
   ├─ Fetches dependencies (auto-reassembles shards)
   ├─ Streams portfolio data in batches
   ├─ For each user:
   │  ├─ Builds context (dependency injection)
   │  └─ Calls calculation.process(context)
   ├─ Validates results (HeuristicValidator)
   ├─ Auto-shards if > 900KB
   ├─ Commits to Firestore
   ├─ Updates status hash
   ├─ Updates Audit Ledger → COMPLETED
   └─ Records Run History → SUCCESS

4. Next Pass
   └─ Depends on results from this pass
```

### For a Meta Calculation

Same flow, except:
- **Step 3**: Loads global data instead of streaming users
- **Context**: No user object; prices/insights instead
- **Result**: One document with all tickers' data

---

## Key Takeaways

1. **Data Availability Gates Everything**: Computations never run when source data is missing
2. **Smart Hashing Enables Incremental Updates**: Only changed calculations re-run
3. **Sharding Is Invisible**: Read and write as if documents had no size limit
4. **Streaming Handles Scale**: Process millions of users without OOM
5. **Quality Checks Prevent Bad Data**: Results are validated before storage
6. **Historical Continuity Is Enforced**: Time-series calculations run in order
7. **Distributed Execution Scales Horizontally**: 1 worker or 1,000 workers, same code

---

## Operational Checklist

**Daily (Automated)**:
- ✅ Root Data Indexer runs at 2 AM UTC
- ✅ Computation Dispatchers run for each pass (3 AM, 4 AM, 5 AM)
- ✅ Workers auto-scale based on Pub/Sub queue depth

**After Code Changes**:
1. Run the Build Reporter to preview impact
2. Review the re-run count (expected vs actual)
3. Deploy to staging, run a single date
4. Validate results in Firestore
5. Deploy to production
6. Monitor Run History for failures

**Debugging a Failure**:
1. Check Run History for the error stage
2. If `QUALITY_CIRCUIT_BREAKER`: data integrity issue; review validator logs
3. If `EXECUTION`: logic bug; reproduce locally in Orchestrator mode
4. If `SYSTEM_CRASH`: infrastructure issue; check Cloud Function logs
5. Fix the bug, redeploy, re-trigger the specific pass