bulltrackers-module 1.0.259 → 1.0.260

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,654 +1,845 @@
- # BullTrackers Computation System - Comprehensive Onboarding Guide
+ # BullTrackers Computation System & Root Data Indexer
 
  ## Table of Contents
- 1. [System Overview](#system-overview)
- 2. [Context Architecture](#context-architecture)
- 3. [Data Loading & Routing](#data-loading--routing)
- 4. [Sharding System](#sharding-system)
- 5. [Computation Types & Execution](#computation-types--execution)
- 6. [Dependency Management](#dependency-management)
- 7. [Versioning & Smart Hashing](#versioning--smart-hashing)
- 8. [Execution Modes](#execution-modes)
+ 1. [System Philosophy](#system-philosophy)
+ 2. [Root Data Indexer](#root-data-indexer)
+ 3. [Computation Architecture](#computation-architecture)
+ 4. [Context & Dependency Injection](#context--dependency-injection)
+ 5. [Execution Pipeline](#execution-pipeline)
+ 6. [Smart Hashing & Versioning](#smart-hashing--versioning)
+ 7. [Data Loading & Streaming](#data-loading--streaming)
+ 8. [Auto-Sharding System](#auto-sharding-system)
+ 9. [Quality Assurance](#quality-assurance)
+ 10. [Operational Modes](#operational-modes)
 
  ---
 
- ## System Overview
+ ## System Philosophy
 
- The BullTrackers Computation System is a **dependency-aware, auto-sharding, distributed calculation engine** designed to process massive datasets across user portfolios, trading histories, market insights, and price data. The system automatically handles:
+ The BullTrackers Computation System is a **dependency-aware, distributed calculation engine** that processes massive financial datasets with strict guarantees:
 
- - **Smart data loading** (only loads what's needed)
- - **Transparent sharding** (handles Firestore's 1MB document limit)
- - **Dependency injection** (calculations receive exactly what they declare)
- - **Historical state management** (access to yesterday's data when needed)
- - **Incremental recomputation** (only reruns when code or dependencies change)
+ - **Incremental Recomputation**: Only re-runs when code or dependencies change
+ - **Historical Continuity**: Ensures chronological execution for time-series calculations
+ - **Transparent Sharding**: Handles Firestore's 1MB document limit automatically
+ - **Data Availability Gating**: Never runs when source data is missing
+ - **Cascading Invalidation**: Upstream changes automatically invalidate downstream results
+
+ ### Key Design Principles
+
+ **1. Source of Truth Paradigm**
+ - The Root Data Indexer creates a daily availability manifest
+ - Computations are gated by data availability checks
+ - Missing data triggers "IMPOSSIBLE" states, not retries
+
+ **2. Merkle Tree Dependency Hashing**
+ - Every calculation has a hash that includes:
+   - Its own source code
+   - Hashes of math layers it uses
+   - Hashes of calculations it depends on
+ - Changes cascade: updating calculation A invalidates all dependents
+
+ **3. Stateless Execution**
+ - Each worker receives complete context for its task
+ - No inter-worker communication
+ - Infinitely horizontally scalable
 
  ---
 
- ## Context Architecture
+ ## Root Data Indexer
+
+ ### Purpose
+
+ The Root Data Indexer runs daily to scan all data sources and create a **centralized availability manifest**. This prevents the computation system from attempting to run calculations when source data doesn't exist.
+
+ ### Architecture
+
+ ```
+ ┌────────────────────────────────────────────────────┐
+ │ Root Data Indexer (Daily Scan)                     │
+ ├────────────────────────────────────────────────────┤
+ │ For each date (2023-01-01 → Tomorrow):             │
+ │   1. Check Normal User Portfolios (Canary: 19M)    │
+ │   2. Check Speculator Portfolios (Canary: 19M)     │
+ │   3. Check Normal Trade History (Canary: 19M)      │
+ │   4. Check Speculator Trade History (Canary: 19M)  │
+ │   5. Check Daily Insights (/{date})                │
+ │   6. Check Social Posts (/{date}/posts)            │
+ │   7. Pre-Load Price Shard (shard_0) → Date Map     │
+ │                                                    │
+ │ Output: /system_root_data_index/{date}             │
+ │   {                                                │
+ │     hasPortfolio: true,                            │
+ │     hasHistory: false,                             │
+ │     hasInsights: true,                             │
+ │     hasSocial: true,                               │
+ │     hasPrices: true,                               │
+ │     details: {                                     │
+ │       normalPortfolio: true,                       │
+ │       speculatorPortfolio: false,                  │
+ │       normalHistory: false,                        │
+ │       speculatorHistory: false                     │
+ │     }                                              │
+ │   }                                                │
+ └────────────────────────────────────────────────────┘
+ ```
+
+ ### Canary Block System
+
+ Instead of scanning every user block (which would be prohibitively expensive), the indexer uses **representative blocks**:
+
+ - **Block 19M**: Statistically verified to always contain data when the system is healthy
+ - **Part 0**: First part of sharded collections
+
+ **Logic**: If Block 19M has data for a date, all blocks are assumed to have data for that date. This is a deliberate architectural assumption that reduces indexing cost by 99%.
+
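+ A minimal sketch of the canary idea (collection paths and helper names here are assumptions for illustration, not the indexer's actual API):
+
+ ```javascript
+ // Hypothetical sketch: check one representative block instead of all blocks.
+ const CANARY_BLOCK = '19000000'; // "Block 19M"
+
+ async function hasPortfolioData(db, dateStr, userType) {
+   // Assumed layout: {collection}/{date}/blocks/{blockId}/parts/part_0
+   const collection = userType === 'speculator'
+     ? 'speculator_portfolios'
+     : 'normal_portfolios';
+   const canary = await db
+     .doc(`${collection}/${dateStr}/blocks/${CANARY_BLOCK}/parts/part_0`)
+     .get();
+   // If the canary block has data, every block is assumed to have data.
+   return canary.exists;
+ }
+ ```
+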
+ ### Granular UserType Tracking
+
+ The indexer distinguishes between:
+ - **Normal Users**: Traditional portfolio tracking (AggregatedPositions)
+ - **Speculators**: Advanced traders (PublicPositions with leverage/SL/TP)
 
- ### The Context Object
+ This enables a calculation to declare `userType: 'speculator'` and be gated only on whether speculator data exists.
 
- Every computation receives a **context object** that contains all the data and tools it needs. The context is built dynamically based on the computation's declared dependencies.
+ ### Price Data Optimization
 
- ### Context Structure by Computation Type
+ Price data is handled differently:
+
+ 1. **Pre-Load Once**: The entire `shard_0` document is loaded into memory
+ 2. **Extract Date Keys**: All dates with price data are extracted into a `Set`
+ 3. **Fast Lookup**: Each date check becomes O(1) instead of a Firestore read
+
+ This reduces price availability checks from **~1000 reads/day** to **1 read total**.
+
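+ The sketch below illustrates that pattern; `db` is a Firestore client and the document shape follows the price schema shown later in this guide, but the shard path is an assumption:
+
+ ```javascript
+ // Hypothetical sketch: pre-load shard_0 once, then answer
+ // "do prices exist for this date?" from an in-memory Set.
+ async function buildPriceDateSet(db) {
+   const shardDoc = await db.doc('asset_prices/shard_0').get(); // assumed path
+   const dateSet = new Set();
+   for (const instrument of Object.values(shardDoc.data() || {})) {
+     for (const date of Object.keys(instrument.prices || {})) {
+       dateSet.add(date); // e.g. "2024-12-07"
+     }
+   }
+   return dateSet; // each availability check is now dateSet.has(date), O(1)
+ }
+ ```
+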
+ ### Index Schema
 
- #### **Standard (Per-User) Context**
  ```javascript
  {
-   user: {
-     id: "user_123",
-     type: "speculator", // or "normal"
-     portfolio: {
-       today: { /* Portfolio snapshot for today */ },
-       yesterday: { /* Portfolio snapshot for yesterday (if isHistorical: true) */ }
-     },
-     history: {
-       today: { /* Trading history for today */ },
-       yesterday: { /* Trading history for yesterday (if needed) */ }
-     }
-   },
-   date: {
-     today: "2024-12-07"
-   },
-   insights: {
-     today: { /* Daily instrument insights */ },
-     yesterday: { /* Yesterday's insights (if needed) */ }
-   },
-   social: {
-     today: { /* Social post insights */ },
-     yesterday: { /* Yesterday's social data (if needed) */ }
-   },
-   mappings: {
-     tickerToInstrument: { "AAPL": 123, ... },
-     instrumentToTicker: { 123: "AAPL", ... }
-   },
-   math: {
-     // All mathematical layers (extractors, primitives, signals, etc.)
-     extract: DataExtractor,
-     compute: MathPrimitives,
-     signals: SignalPrimitives,
-     // ... and more
-   },
-   computed: {
-     // Results from dependency calculations (current day)
-     "risk-metrics": { "AAPL": { volatility: 0.25 }, ... },
-     "sentiment-score": { "AAPL": { score: 0.8 }, ... }
-   },
-   previousComputed: {
-     // Results from dependency calculations (previous day, if isHistorical: true)
-     "risk-metrics": { "AAPL": { volatility: 0.23 }, ... }
-   },
-   meta: { /* Calculation metadata */ },
-   config: { /* System configuration */ },
-   deps: { /* System dependencies (db, logger, etc.) */ }
+   date: "2024-12-07",
+   lastUpdated: Timestamp,
+
+   // Aggregate flags (true if ANY subtype exists)
+   hasPortfolio: true,  // normalPortfolio OR speculatorPortfolio
+   hasHistory: false,   // normalHistory OR speculatorHistory
+   hasInsights: true,   // Insights document exists
+   hasSocial: true,     // At least 1 social post exists
+   hasPrices: true,     // Price data exists for this date
+
+   // Granular breakdown
+   details: {
+     normalPortfolio: true,
+     speculatorPortfolio: false,
+     normalHistory: false,
+     speculatorHistory: false
+   }
  }
  ```
 
- #### **Meta (Once-Per-Day) Context**
+ ### Availability Check Logic (Computation System)
+
  ```javascript
- {
-   date: {
-     today: "2024-12-07"
-   },
-   insights: {
-     today: { /* Daily instrument insights */ },
-     yesterday: { /* If needed */ }
-   },
-   social: {
-     today: { /* Social post insights */ },
-     yesterday: { /* If needed */ }
-   },
-   prices: {
-     history: {
-       // Price data for all instruments (or batched shards)
-       "123": {
-         ticker: "AAPL",
-         prices: {
-           "2024-12-01": 150.25,
-           "2024-12-02": 151.30,
-           // ...
-         }
-       }
-     }
-   },
-   mappings: { /* Same as Standard */ },
-   math: { /* Same as Standard */ },
-   computed: { /* Same as Standard */ },
-   previousComputed: { /* Same as Standard */ },
-   meta: { /* Calculation metadata */ },
-   config: { /* System configuration */ },
-   deps: { /* System dependencies */ }
+ // AvailabilityChecker.js - checkRootDependencies()
+
+ if (calculation.rootDataDependencies.includes('portfolio')) {
+   if (calculation.userType === 'speculator') {
+     if (!rootDataStatus.details.speculatorPortfolio) return 'MISSING';
+   } else if (calculation.userType === 'normal') {
+     if (!rootDataStatus.details.normalPortfolio) return 'MISSING';
+   } else {
+     // userType: 'all' or 'aggregate'
+     if (!rootDataStatus.hasPortfolio) return 'MISSING';
+   }
  }
+
+ // Similar logic applies to the 'history' dependency.
+ // Global data types (insights, social, price) have no subtypes.
  ```
 
- ### How Context is Auto-Populated
+ **Critical Behavior**:
+ - If data is missing for a **historical date** → Mark calculation as `IMPOSSIBLE` (permanent failure)
+ - If data is missing for **today's date** → Mark as `BLOCKED` (retriable, data may arrive later)
+
+ ---
+
+ ## Computation Architecture
 
- The system uses a **declaration-based approach**. When you define a calculation, you declare what data you need:
+ ### Calculation Types
+
+ #### **Standard (Per-User) Computations**
 
  ```javascript
- class MyCalculation {
+ class UserRiskProfile {
    static getMetadata() {
      return {
-       type: 'standard', // or 'meta'
-       isHistorical: true, // Do I need yesterday's data?
-       rootDataDependencies: ['portfolio', 'insights'], // What root data do I need?
-       userType: 'all' // 'all', 'speculator', or 'normal'
+       type: 'standard',            // Runs once per user
+       category: 'risk-management', // Storage category
+       isHistorical: true,          // Needs yesterday's data
+       rootDataDependencies: ['portfolio', 'history'],
+       userType: 'speculator'       // Only for speculator users
      };
    }
 
    static getDependencies() {
-     return ['risk-metrics', 'sentiment-score']; // What other calculations do I depend on?
+     return ['market-volatility']; // Needs this calc to run first
+   }
+
+   async process(context) {
+     const { user, computed, math } = context;
+     // Process individual user
+     this.results[user.id] = { riskScore: /* ... */ };
    }
  }
  ```
 
- The `ContextBuilder` then:
-
- 1. **Checks `rootDataDependencies`** → Loads portfolio, insights, social, history, or price data
- 2. **Checks `isHistorical`** → If true, loads yesterday's portfolio and previous computation results
- 3. **Checks `getDependencies()`** → Fetches results from other calculations
- 4. **Injects math layers** → Automatically includes all extractors, primitives, and utilities
- 5. **Adds mappings** → Provides ticker ↔ instrument ID conversion
-
- **You only get what you ask for.** This keeps memory usage efficient and prevents unnecessary data loading.
+ **Execution Model**:
+ - Portfolio data streams in batches (50 users at a time)
+ - Each user processed independently
+ - Memory-efficient for millions of users
 
- ---
-
- ## Data Loading & Routing
-
- ### The Data Loading Pipeline
-
- ```
- ┌────────────────────────────────────────────────────────────┐
- │ DataLoader (Cached)                                        │
- ├────────────────────────────────────────────────────────────┤
- │ • loadMappings() → Ticker/Instrument maps                  │
- │ • loadInsights(date) → Daily instrument insights           │
- │ • loadSocial(date) → Social post insights                  │
- │ • loadPriceShard(ref) → Asset price data                   │
- │ • getPriceShardRefs() → All price shards                   │
- │ • getSpecificPriceShardReferences(ids) → Targeted shards   │
- └────────────────────────────────────────────────────────────┘
-
- ┌────────────────────────────────────────────────────────────┐
- │ ComputationExecutor                                        │
- ├────────────────────────────────────────────────────────────┤
- │ • executePerUser() → Streams portfolio data                │
- │ • executeOncePerDay() → Loads global/meta data             │
- └────────────────────────────────────────────────────────────┘
-
- ┌────────────────────────────────────────────────────────────┐
- │ ContextBuilder                                             │
- ├────────────────────────────────────────────────────────────┤
- │ Assembles context based on metadata & dependencies         │
- └────────────────────────────────────────────────────────────┘
-
- Your Calculation.process()
- ```
-
- ### Streaming vs Batch Loading
-
- #### **Standard Computations: Streaming**
- Standard (per-user) computations use **streaming** to process users in chunks:
+ #### **Meta (Once-Per-Day) Computations**
 
  ```javascript
- // System streams portfolio data in batches of 50 users
- for await (const userBatch of streamPortfolioData()) {
-   // Each batch is processed in parallel
-   for (const [userId, portfolio] of Object.entries(userBatch)) {
-     const context = buildPerUserContext({ userId, portfolio, ... });
-     await calculation.process(context);
+ class MarketMomentum {
+   static getMetadata() {
+     return {
+       type: 'meta', // Runs once per day globally
+       category: 'market-signals',
+       rootDataDependencies: ['price', 'insights']
+     };
+   }
+
+   async process(context) {
+     const { prices, insights, math } = context;
+     // Process all tickers
+     for (const [instId, data] of Object.entries(prices.history)) {
+       this.results[data.ticker] = { momentum: /* ... */ };
+     }
    }
  }
  ```
 
- **Why streaming?**
- - Portfolio data is sharded across multiple documents
- - Loading all users at once would exceed memory limits
- - Streaming allows processing millions of users efficiently
+ **Execution Model**:
+ - Loads all price data (or processes in shard batches)
+ - Runs once, produces global results
+ - Used for market-wide analytics
 
- #### **Meta Computations: Batch or Shard**
- Meta computations have two modes:
+ ### Manifest Builder
 
- 1. **Standard Meta** (No price dependency):
-    ```javascript
-    const context = buildMetaContext({ insights, social, ... });
-    await calculation.process(context);
-    ```
+ The Manifest Builder automatically:
 
- 2. **Price-Dependent Meta** (Batched Shard Processing):
-    ```javascript
-    // System loads price data in shard batches
-    for (const shardRef of priceShardRefs) {
-      const shardData = await loadPriceShard(shardRef);
-      const context = buildMetaContext({ prices: { history: shardData } });
-      await calculation.process(context);
-      // Memory is cleared between shards
-    }
-    ```
+ 1. **Discovers** all calculation classes in the codebase
+ 2. **Analyzes** their dependencies
+ 3. **Sorts** them topologically (builds a DAG)
+ 4. **Assigns** pass numbers (execution waves)
+ 5. **Generates** smart hashes for each calculation
 
- ---
-
- ## Sharding System
+ ```
+ Pass 1 (No Dependencies):
+   - market-volatility
+   - price-momentum
+
+ Pass 2 (Depends on Pass 1):
+   - user-risk-profile (needs market-volatility)
+   - sentiment-score (needs price-momentum)
+
+ Pass 3 (Depends on Pass 2):
+   - combined-signal (needs sentiment-score + user-risk-profile)
+ ```
 
- ### The Problem: Firestore's 1MB Limit
+ **Circular Dependency Detection**: If `A → B → C → A`, the builder throws a fatal error and refuses to generate a manifest.
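+
+ A compact sketch of how pass numbers can be derived with a level-by-level topological sort (Kahn's algorithm); the manifest builder's real implementation may differ:
+
+ ```javascript
+ // calcs: { name: { dependencies: [names...] } }
+ function assignPasses(calcs) {
+   const passes = {};
+   let remaining = Object.keys(calcs);
+   let pass = 1;
+   while (remaining.length > 0) {
+     // A calc is ready when every dependency already has a pass number.
+     const ready = remaining.filter(name =>
+       calcs[name].dependencies.every(dep => passes[dep] !== undefined));
+     if (ready.length === 0) {
+       // Nothing became ready: the remaining calcs form a cycle.
+       throw new Error(`Circular dependency among: ${remaining.join(', ')}`);
+     }
+     for (const name of ready) passes[name] = pass;
+     remaining = remaining.filter(name => !ready.includes(name));
+     pass++;
+   }
+   return passes; // e.g. { 'market-volatility': 1, 'user-risk-profile': 2 }
+ }
+ ```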
 
- Firestore has a **1MB hard limit** per document. When computation results contain thousands of tickers (e.g., momentum scores for every asset), the document exceeds this limit.
+ ---
 
- ### The Solution: Auto-Sharding
+ ## Context & Dependency Injection
 
- The system **automatically detects** when a result is too large and splits it into a subcollection.
+ ### Context Structure
 
- ### How Auto-Sharding Works
+ Every calculation receives exactly what it declares:
 
  ```javascript
- // When saving results:
- const result = {
-   "AAPL": { score: 0.8, volatility: 0.25 },
-   "GOOGL": { score: 0.7, volatility: 0.22 },
-   // ... 5,000+ tickers
- };
-
- // System calculates size:
- const totalSize = calculateFirestoreBytes(result); // ~1.2 MB
-
- // IF size > 900KB (safety threshold):
- //   1. Splits data into chunks < 900KB each
- //   2. Writes chunks to: /results/{date}/{category}/{calc}/_shards/shard_0
- //                        /results/{date}/{category}/{calc}/_shards/shard_1
- //   3. Writes pointer:   /results/{date}/{category}/{calc}
- //      → { _sharded: true, _shardCount: 2, _completed: true }
-
- // IF size < 900KB:
- //   Writes normally: /results/{date}/{category}/{calc}
- //   { "AAPL": {...}, "GOOGL": {...}, _completed: true, _sharded: false }
+ {
+   // IDENTITY (Standard only)
+   user: {
+     id: "user_123",
+     type: "speculator",
+     portfolio: { today: {...}, yesterday: {...} },
+     history: { today: {...}, yesterday: {...} }
+   },
+
+   // TEMPORAL
+   date: { today: "2024-12-07" },
+
+   // ROOT DATA (if declared)
+   insights: { today: {...}, yesterday: {...} },
+   social: { today: {...}, yesterday: {...} },
+   prices: { history: {...} },
+
+   // MAPPINGS
+   mappings: {
+     tickerToInstrument: { "AAPL": 123 },
+     instrumentToTicker: { 123: "AAPL" }
+   },
+
+   // MATH LAYERS (always injected)
+   math: {
+     extract: DataExtractor,
+     compute: MathPrimitives,
+     signals: SignalPrimitives,
+     history: HistoryExtractor,
+     insights: InsightsExtractor,
+     priceExtractor: priceExtractor,
+     // ... 20+ utility classes
+   },
+
+   // DEPENDENCIES (if declared)
+   computed: {
+     "market-volatility": { "AAPL": { volatility: 0.25 } }
+   },
+
+   // HISTORICAL DEPENDENCIES (if isHistorical: true)
+   previousComputed: {
+     "market-volatility": { "AAPL": { volatility: 0.23 } }
+   }
+ }
  ```
 
- ### Reading Sharded Data
+ ### Lazy Loading Optimization
 
- The system **transparently reassembles** sharded data when loading dependencies:
+ The system **only loads what you declare**:
 
  ```javascript
- // When loading a dependency:
- const result = await fetchExistingResults(dateStr, ['momentum-score']);
-
- // System checks: Is this document sharded?
- if (doc.data()._sharded === true) {
-   // 1. Fetch all docs from _shards subcollection
-   // 2. Merge them back into a single object
-   // 3. Return as if it was never sharded
- }
-
- // Your calculation receives complete data, regardless of storage method
- ```
-
- ### Handling Mixed Storage Scenarios
+ // Calculation A declares:
+ rootDataDependencies: ['portfolio']
 
- **Question:** What if I need data from 2 days, where Day 1 is sharded and Day 2 is not?
+ // Context A receives:
+ { user: { portfolio: {...} } } // No insights, social, or prices loaded
 
- **Answer:** The system handles this automatically:
 
- ```javascript
- // Your calculation declares:
- static getMetadata() {
-   return {
-     isHistorical: true, // I need yesterday's data
-     // ...
-   };
- }
+ // Calculation B declares:
+ rootDataDependencies: ['portfolio', 'insights']
 
+ // Context B receives:
+ {
+   user: { portfolio: {...} },
+   insights: { today: {...} } // Insights fetched on-demand
  }
+ ```
+
+ This prevents unnecessary Firestore reads and keeps memory usage minimal.
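+
+ Conceptually, the ContextBuilder can be pictured like this — a simplified sketch in which the loader methods follow the DataLoader described in this guide, but the exact signatures are assumptions:
+
+ ```javascript
+ // Hypothetical sketch of declaration-driven loading.
+ async function buildContext(calculation, dateStr, loader) {
+   const deps = calculation.getMetadata().rootDataDependencies || [];
+   const context = { date: { today: dateStr } };
+
+   // Each root data type is fetched only if declared.
+   if (deps.includes('insights')) {
+     context.insights = { today: await loader.loadInsights(dateStr) };
+   }
+   if (deps.includes('social')) {
+     context.social = { today: await loader.loadSocial(dateStr) };
+   }
+   // ...portfolio, history, and prices follow the same pattern.
+   return context;
+ }
+ ```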
 
- // System loads BOTH days:
- const computed = await fetchExistingResults(todayDate, ['momentum-score']);
- // → Auto-detects if sharded, reassembles if needed
+ ---
 
- const previousComputed = await fetchExistingResults(yesterdayDate, ['momentum-score']);
- // → Auto-detects if sharded, reassembles if needed
+ ## Execution Pipeline
+
+ ### Phase 1: Analysis & Dispatch
+
+ ```
+ ┌─────────────────────────────────────────────────────┐
+ │ Computation Dispatcher                              │
+ │ (Smart Pre-Flight Checker)                          │
+ ├─────────────────────────────────────────────────────┤
+ │ For each date in range:                             │
+ │   1. Fetch Root Data Index                          │
+ │   2. Fetch Computation Status (stored hashes)       │
+ │   3. Fetch Yesterday's Status (historical check)    │
+ │   4. Run analyzeDateExecution()                     │
+ │                                                     │
+ │ Decision Logic per Calculation:                     │
+ │   ├─ Root Data Missing?                             │
+ │   │   ├─ Historical Date → Mark IMPOSSIBLE          │
+ │   │   └─ Today's Date → Mark BLOCKED (retriable)    │
+ │   │                                                 │
+ │   ├─ Dependency Impossible?                         │
+ │   │   └─ Mark IMPOSSIBLE (cascading failure)        │
+ │   │                                                 │
+ │   ├─ Dependency Missing/Hash Mismatch?              │
+ │   │   └─ Mark BLOCKED (wait for dependency)         │
+ │   │                                                 │
+ │   ├─ Historical Continuity Broken?                  │
+ │   │   └─ Mark BLOCKED (wait for yesterday)          │
+ │   │                                                 │
+ │   ├─ Hash Mismatch?                                 │
+ │   │   └─ Mark RUNNABLE (re-run needed)              │
+ │   │                                                 │
+ │   └─ Hash Match?                                    │
+ │       └─ Mark SKIPPED (up-to-date)                  │
+ │                                                     │
+ │   5. Create Audit Ledger (PENDING state)            │
+ │   6. Publish RUNNABLE tasks to Pub/Sub              │
+ └─────────────────────────────────────────────────────┘
+ ```
+
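+ The decision ladder above can be read as a single function; this is an illustrative reduction, not the dispatcher's literal code:
+
+ ```javascript
+ // Returns one of: 'IMPOSSIBLE' | 'BLOCKED' | 'RUNNABLE' | 'SKIPPED'
+ function decide({ rootDataOk, isHistoricalDate, depImpossible, depMissing,
+                   continuityBroken, storedHash, currentHash }) {
+   if (!rootDataOk) return isHistoricalDate ? 'IMPOSSIBLE' : 'BLOCKED';
+   if (depImpossible) return 'IMPOSSIBLE';    // cascading failure
+   if (depMissing) return 'BLOCKED';          // wait for dependency
+   if (continuityBroken) return 'BLOCKED';    // wait for yesterday
+   if (storedHash !== currentHash) return 'RUNNABLE'; // re-run needed
+   return 'SKIPPED';                          // up-to-date
+ }
+ ```
+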
+ ### Phase 2: Worker Execution
+
+ ```
+ ┌─────────────────────────────────────────────────────┐
+ │ Computation Worker                                  │
+ │ (Processes Single Task)                             │
+ ├─────────────────────────────────────────────────────┤
+ │ 1. Parse Pub/Sub Message                            │
+ │    { date, pass, computation, previousCategory }    │
+ │                                                     │
+ │ 2. Load Manifest (cached in memory)                 │
+ │                                                     │
+ │ 3. Fetch Dependencies                               │
+ │    - Load dependency results (auto-reassemble)      │
+ │    - Load previous day's results (if historical)    │
+ │                                                     │
+ │ 4. Execute Calculation                              │
+ │    ├─ Standard: Stream users in batches             │
+ │    └─ Meta: Load global data / price shards         │
+ │                                                     │
+ │ 5. Validate Results (HeuristicValidator)            │
+ │    - NaN Detection                                  │
+ │    - Flatline Detection (stuck values)              │
+ │    - Null/Empty Analysis                            │
+ │    - Dead Object Detection                          │
+ │                                                     │
+ │ 6. Store Results                                    │
+ │    ├─ Calculate size                                │
+ │    ├─ If > 900KB → Auto-shard                       │
+ │    └─ Write to Firestore                            │
+ │                                                     │
+ │ 7. Update Status & Ledgers                          │
+ │    ├─ computation_status/{date} → New hash          │
+ │    ├─ Audit Ledger → COMPLETED                      │
+ │    └─ Run History → SUCCESS/FAILURE record          │
+ │                                                     │
+ │ 8. Category Migration (if detected)                 │
+ │    └─ Delete old category's data                    │
+ └─────────────────────────────────────────────────────┘
+ ```
+
+ ### Error Handling Stages
+
+ The system tracks **where** failures occur:
 
- // Context now has both:
+ ```javascript
+ // Run History Schema
  {
-   computed: { "momentum-score": { /* today's data, reassembled if sharded */ } },
-   previousComputed: { "momentum-score": { /* yesterday's data, reassembled if sharded */ } }
+   status: "FAILURE" | "SUCCESS" | "CRASH",
+   error: {
+     message: "...",
+     stage: "EXECUTION" | "PREPARE_SHARDS" | "COMMIT_BATCH" |
+            "SHARDING_LIMIT_EXCEEDED" | "QUALITY_CIRCUIT_BREAKER" |
+            "MANIFEST_LOAD" | "SYSTEM_CRASH"
+   }
  }
  ```
 
- You never need to know or care whether data is sharded. The system guarantees you receive complete, reassembled data.
+ **Stage-Specific Handling**:
+ - `QUALITY_CIRCUIT_BREAKER`: Block deployment, data integrity issue
+ - `SHARDING_LIMIT_EXCEEDED`: Firestore hard limit hit, needs redesign
+ - `SYSTEM_CRASH`: Infrastructure issue, retriable
+ - `EXECUTION`: Logic bug in calculation code
 
  ---
 
- ## Computation Types & Execution
-
- ### Standard Computations (`type: 'standard'`)
-
- **Purpose:** Per-user calculations (risk profiles, P&L analysis, behavioral scoring)
+ ## Smart Hashing & Versioning
 
- **Execution:**
- - Runs **once per user** per day
- - Receives individual user portfolio and history
- - Streams data in batches for memory efficiency
+ ### Hash Composition
 
- **Example:**
  ```javascript
- class UserRiskProfile {
-   static getMetadata() {
-     return {
-       type: 'standard',
-       rootDataDependencies: ['portfolio', 'history'],
-       userType: 'speculator'
-     };
-   }
-
-   async process(context) {
-     const { user, math } = context;
-     const portfolio = user.portfolio.today;
-     const positions = math.extract.getPositions(portfolio, user.type);
-
-     // Calculate risk per user
-     this.results[user.id] = { riskScore: /* ... */ };
-   }
- }
+ // Step 1: Intrinsic Hash (Code + System Epoch)
+ const codeString = calculation.toString();
+ const codeHash = SHA256(codeString);
+ const intrinsicHash = SHA256(codeHash + "|EPOCH:v1.0-epoch-2");
+
+ // Step 2: Layer Hashing (Dynamic Detection)
+ let compositeHash = intrinsicHash;
+ for (const [layer, exports] of MATH_LAYERS) {
+   for (const [exportName, triggerPatterns] of exports) {
+     if (codeString.includes(exportName)) {
+       compositeHash += layerHashes[layer][exportName];
+     }
    }
  }
+
+ // Step 3: Dependency Hashing (Merkle Tree)
+ const depHashes = dependencies.map(dep => dep.hash).join('|');
+ const finalHash = SHA256(compositeHash + "|DEPS:" + depHashes);
  ```
 
- **Result Structure:**
- ```javascript
- {
-   "user_123": { riskScore: 0.75 },
-   "user_456": { riskScore: 0.45 },
-   // ... millions of users
- }
- ```
+ ### Cascading Invalidation Example
+
+ ```
+ Initial State:
+   PriceVolatility    → hash: abc123
+   UserRisk (uses PV) → hash: def456 (includes abc123)
+   Signal (uses UR)   → hash: ghi789 (includes def456)
+
+ Developer Updates PriceVolatility:
+   PriceVolatility    → hash: xyz000 (NEW!)
+   UserRisk           → hash: uvw111 (NEW! Dependency changed)
+   Signal             → hash: rst222 (NEW! Cascade)
+
+ Next Dispatch:
+   All 3 calculations marked RUNNABLE (hash mismatch)
+ ```
+
+ ### System Epoch
+
+ `system_epoch.js`:
+ ```javascript
+ module.exports = "v1.0-epoch-2";
+ ```
 
- ### Meta Computations (`type: 'meta'`)
+ **Purpose**: Changing this string forces **global re-computation** of all calculations, even if code hasn't changed. Used for:
+ - Schema migrations
+ - Critical bug fixes requiring historical reprocessing
+ - Firestore structure changes
+
+ ---
 
- **Purpose:** Platform-wide calculations (aggregate metrics, market analysis, global trends)
+ ## Data Loading & Streaming
 
- **Execution:**
- - Runs **once per day** (not per user)
- - Processes all data holistically
- - Can access price history for all instruments
+ ### Streaming Architecture (Standard Computations)
 
- **Example:**
  ```javascript
- class MarketMomentum {
-   static getMetadata() {
-     return {
-       type: 'meta',
-       rootDataDependencies: ['price', 'insights']
-     };
-   }
-
-   async process(context) {
-     const { prices, insights, math } = context;
-
-     // Calculate momentum for every ticker
-     for (const [instId, data] of Object.entries(prices.history)) {
-       const ticker = data.ticker;
-       const priceData = math.priceExtractor.getHistory(prices, ticker);
-
-       this.results[ticker] = { momentum: /* ... */ };
-     }
-   }
- }
- ```
-
- **Result Structure:**
- ```javascript
- {
-   "AAPL": { momentum: 0.65 },
-   "GOOGL": { momentum: 0.82 },
-   // ... all tickers
- }
- ```
+ // Problem: 10M users × 5KB portfolio = 50GB
+ // Solution: Stream in chunks
+
+ async function* streamPortfolioData(dateStr, refs) {
+   const BATCH_SIZE = 50; // 50 users at a time
+
+   for (let i = 0; i < refs.length; i += BATCH_SIZE) {
+     const batchRefs = refs.slice(i, i + BATCH_SIZE);
+     const userData = await loadDataByRefs(batchRefs);
+
+     yield userData; // { user1: {...}, user2: {...}, ... }
+
+     // Memory cleared after each iteration
+   }
+ }
+
+ // Usage in Executor
+ for await (const userBatch of streamPortfolioData(date, refs)) {
+   for (const [userId, portfolio] of Object.entries(userBatch)) {
+     const context = buildContext({ userId, portfolio, ... });
+     await calculation.process(context);
+   }
+ }
+ ```
 
- ### Price-Dependent Meta Computations
-
- When a meta computation declares `rootDataDependencies: ['price']`, it enters **batched shard processing mode**:
+ ### Price Data Batching (Meta Computations)
 
  ```javascript
- // Instead of loading ALL price data at once (would crash):
+ // Problem: 10,000 tickers × 2 years history = 1GB
+ // Solution: Process shards sequentially
+
  for (const shardRef of priceShardRefs) {
-   const shardData = await loadPriceShard(shardRef); // ~50-100 instruments per shard
+   const shardData = await loadPriceShard(shardRef); // ~100 tickers
 
    const context = buildMetaContext({
-     prices: { history: shardData } // Only this shard's data
+     prices: { history: shardData }
    });
 
    await calculation.process(context);
 
    // Results accumulate across shards
-   // Memory is cleared between iterations
+   // Memory cleared between shards
  }
  ```
 
- **Your calculation receives partial data** and processes it incrementally. The system ensures all shards are eventually processed.
-
- ---
-
- ## Dependency Management
+ ### Smart Shard Indexing (Optimization)
 
- ### Declaring Dependencies
+ For targeted price lookups (e.g., "only calculate momentum for AAPL, GOOGL, MSFT"):
 
  ```javascript
- static getDependencies() {
-   return ['risk-metrics', 'sentiment-score', 'momentum-analysis'];
- }
+ // Without Indexing: Load ALL shards, filter after
+ // Cost: 100+ Firestore reads
+
+ // With Indexing: Pre-map which shard contains each instrument
+ const index = {
+   "123": "shard_0", // AAPL
+   "456": "shard_2", // GOOGL
+   "789": "shard_0"  // MSFT
+ };
+
+ const relevantShards = ["shard_0", "shard_2"];
+ // Cost: 2 Firestore reads
  ```
 
- This tells the system: "Before you run me, make sure these 3 calculations have completed."
+ **Index Building**: Runs once, cached in `/system_metadata/price_shard_index`.
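+
+ Building that index is a one-time scan over the shards; a sketch under assumed document shapes (the shard collection path is an assumption):
+
+ ```javascript
+ // Hypothetical sketch: map instrumentId → shard name, then cache it.
+ async function buildPriceShardIndex(db) {
+   const index = {};
+   const shards = await db.collection('asset_prices').get(); // assumed path
+   shards.forEach(shardDoc => {
+     for (const instrumentId of Object.keys(shardDoc.data())) {
+       index[instrumentId] = shardDoc.id; // e.g. "123" → "shard_0"
+     }
+   });
+   await db.doc('system_metadata/price_shard_index').set(index);
+   return index;
+ }
+ ```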
 
- ### How Dependencies Are Loaded
+ ---
 
- When your calculation runs:
+ ## Auto-Sharding System
 
- 1. System fetches results from all declared dependencies
- 2. Checks if data is sharded → reassembles if needed
- 3. Injects into `context.computed`:
+ ### The 1MB Problem
 
- ```javascript
- {
-   computed: {
-     "risk-metrics": { "AAPL": { volatility: 0.25 }, ... },
-     "sentiment-score": { "AAPL": { score: 0.8 }, ... },
-     "momentum-analysis": { "AAPL": { momentum: 0.65 }, ... }
-   }
- }
- ```
+ Firestore's hard limit: **1MB per document**. A calculation producing results for 5,000 tickers easily exceeds this.
 
- ### Accessing Dependency Results
+ ### Transparent Sharding Solution
 
  ```javascript
- async process(context) {
-   const { computed, math } = context;
+ // ResultCommitter.js - prepareAutoShardedWrites()
+
+ const MAX_DOC_BYTES = 900 * 1024; // safety threshold under the 1MB limit
+ const totalSize = calculateFirestoreBytes(result);
+
+ if (totalSize < MAX_DOC_BYTES) {
+   // Write normally:
+   //   /results/{date}/{category}/{calc}
+   //   → { "AAPL": {...}, "GOOGL": {...}, _completed: true }
 
-   // Access results from dependencies
-   const volatility = math.signals.getMetric(
-     computed,
-     'risk-metrics',
-     'AAPL',
-     'volatility'
-   );
+ } else {
+   // Auto-shard.
+   // Step 1: Split into chunks < 900KB each
+   const chunks = splitIntoChunks(result);
 
-   const sentiment = math.signals.getMetric(
-     computed,
-     'sentiment-score',
-     'AAPL',
-     'score'
-   );
+   // Step 2: Write shards:
+   //   /results/{date}/{category}/{calc}/_shards/shard_0 → chunk 1
+   //   /results/{date}/{category}/{calc}/_shards/shard_1 → chunk 2
+   //   /results/{date}/{category}/{calc}/_shards/shard_N → chunk N
 
-   // Use them in your calculation
-   const combinedScore = volatility * sentiment;
+   // Step 3: Write pointer:
+   //   /results/{date}/{category}/{calc}
+   //   → { _sharded: true, _shardCount: N, _completed: true }
  }
  ```
 
- ### Historical Dependencies
-
- If your calculation needs **yesterday's dependency results**:
+ ### Transparent Reassembly
 
  ```javascript
- static getMetadata() {
-   return {
-     isHistorical: true, // ← Enable historical mode
-     // ...
-   };
- }
+ // DependencyFetcher.js - fetchExistingResults()
 
- async process(context) {
-   const { computed, previousComputed } = context;
-
-   // Today's risk
-   const todayRisk = computed['risk-metrics']['AAPL'].volatility;
+ const doc = await docRef.get();
+ const data = doc.data();
+
+ if (data._sharded === true) {
+   // 1. Fetch all shards
+   const shardsCol = docRef.collection('_shards');
+   const snapshot = await shardsCol.get();
 
-   // Yesterday's risk
-   const yesterdayRisk = previousComputed['risk-metrics']['AAPL'].volatility;
+   // 2. Merge back into single object
+   const assembled = {};
+   snapshot.forEach(shard => {
+     Object.assign(assembled, shard.data());
+   });
 
-   // Calculate change
-   const riskChange = todayRisk - yesterdayRisk;
+   // 3. Return as if never sharded
+   return assembled;
  }
+
+ // Normal path: return as-is
+ return data;
  ```
 
- ---
+ **Developer Experience**: You **never** know or care whether data is sharded. Read and write as if documents have no size limit.
+
+ ### Sharding Limits
 
- ## Versioning & Smart Hashing
+ **Maximum Calculation Size**: ~450MB (500 shards × 900KB)
+
+ If a calculation exceeds this, the system throws:
+ ```
+ error: {
+   stage: "SHARDING_LIMIT_EXCEEDED",
+   message: "Firestore subcollection limit reached"
+ }
+ ```
 
- ### The Problem: When to Recompute?
+ **Solution**: Refactor the calculation to produce less data, or split it into multiple calculations.
 
- If you fix a bug in a calculation, how does the system know to re-run it for all past dates?
+ ---
 
- ### The Solution: Merkle Tree Dependency Hashing
+ ## Quality Assurance
 
- Every calculation gets a **smart hash** that includes:
+ ### HeuristicValidator (Grey Box Testing)
 
- 1. **Its own code** (SHA-256 of the class definition)
- 2. **Layer dependencies** (Hashes of math layers it uses)
- 3. **Calculation dependencies** (Hashes of calculations it depends on)
+ Runs statistical analysis on results **before storage**:
 
  ```javascript
- // Example hash composition:
- const intrinsicHash = hash(calculation.toString() + layerHashes);
- const dependencyHashes = dependencies.map(dep => dep.hash).join('|');
- const finalHash = hash(intrinsicHash + '|DEPS:' + dependencyHashes);
+ // ResultsValidator.js
 
- // Result: "a3f9c2e1..." (SHA-256)
- ```
+ 1. NaN Detection
+    - Scans sample of results for NaN/Infinity
+    - Threshold: 0% (strict, NaN is always a bug)
 
- ### Cascading Invalidation
+ 2. Flatline Detection
+    - Checks if >95% of values are identical
+    - Catches stuck loops or broken RNG
 
- If **Calculation A** changes, **Calculation B** (which depends on A) automatically gets a new hash:
+ 3. Null/Empty Analysis
+    - Threshold: 90% of results are null/0
+    - Indicates data pipeline failure
 
+ 4. Dead Object Detection
+    - Finds objects where all properties are null/0
+    - Example: { profile: [], score: 0, signal: null }
+
+ 5. Vector Emptiness (Distribution Calcs)
+    - Checks if histogram/profile arrays are empty
+    - Threshold: 90% empty → FAIL
  ```
- Risk Metrics (v1) → hash: abc123
-
- Sentiment Score   → hash: def456 (includes abc123)
- (depends on Risk)
- ```
 
- If you update Risk Metrics:
+ **Circuit Breaker**: If validation fails, the calculation **does not store results** and is marked as `FAILURE` with stage: `QUALITY_CIRCUIT_BREAKER`.
+
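+ For intuition, the first two checks can be sketched like this (thresholds follow the list above; the real validator is more elaborate):
+
+ ```javascript
+ // Simplified sketch of two HeuristicValidator checks.
+ function validate(results) {
+   const values = Object.values(results)
+     .flatMap(r => Object.values(r))
+     .filter(v => typeof v === 'number');
+
+   // 1. NaN detection: strict, any NaN/Infinity fails.
+   if (values.some(v => !Number.isFinite(v))) {
+     return { ok: false, reason: 'NAN_DETECTED' };
+   }
+
+   // 2. Flatline detection: >95% identical values fails.
+   const counts = new Map();
+   for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
+   const maxCount = Math.max(0, ...counts.values());
+   if (values.length > 0 && maxCount / values.length > 0.95) {
+     return { ok: false, reason: 'FLATLINE' };
+   }
+   return { ok: true };
+ }
+ ```
+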
+ ### Validation Overrides
+
+ For legitimately sparse datasets:
 
+ ```javascript
+ // validation_overrides.js
+ module.exports = {
+   "bankruptcy-detector": {
+     maxZeroPct: 100 // Rare event, 100% zeros is expected
+   },
+   "earnings-surprise": {
+     maxNullPct: 99 // Only runs on earnings days
+   }
+ };
  ```
- Risk Metrics (v2) → hash: xyz789 (NEW!)
-
- Sentiment Score   → hash: ghi012 (NEW! Because dependency changed)
+
+ ### Build Reporter (Pre-Deployment Analysis)
+
+ ```bash
+ npm run build-reporter
  ```
 
- ### Recomputation Logic
+ Generates a **simulation report** without running calculations:
 
- For each date, the system checks:
+ ```
+ Build Report: v1.2.5_2024-12-07
+ ================================
 
- ```javascript
- // Stored in Firestore:
- computationStatus['2024-12-07'] = {
-   'risk-metrics': 'abc123', // Last run hash
-   'sentiment-score': 'def456'
- };
+ Summary:
+ - 1,245 Re-Runs (hash mismatch)
+ - 23 New Calculations
+ - 0 Impossible
+ - 45 Blocked (waiting for data)
 
- // Current manifest:
- manifest['risk-metrics'].hash = 'xyz789'; // NEW HASH!
- manifest['sentiment-score'].hash = 'ghi012';
+ Detailed Breakdown:
+
+ 2024-12-01:
+   Will Re-Run:
+   - user-risk-profile (Hash: abc123 → xyz789)
+   - sentiment-score (Hash: def456 → uvw012)
+
+   Blocked:
+   - social-sentiment (Missing Root Data: social)
 
- // Decision:
- // - Risk Metrics: Hash mismatch → RERUN
- // - Sentiment Score: Hash mismatch → RERUN (cascaded)
+ 2024-12-02:
+   Will Run:
+   - new-momentum-signal (New calculation)
  ```
 
- This ensures **incremental recomputation**: only changed calculations (and their dependents) re-run.
+ **Use Case**: Review before deploying to production. If 10,000 re-runs are detected, investigate whether the code change was intentional.
 
  ---
 
- ## Execution Modes
+ ## Operational Modes
 
- ### Mode 1: Legacy (Orchestrator)
-
- **Single-process execution** for all dates and calculations.
+ ### Mode 1: Local Orchestrator (Development)
 
  ```bash
+ # Run all calculations for Pass 1 sequentially
  COMPUTATION_PASS_TO_RUN=1 npm run computation-orchestrator
  ```
 
+ **Behavior**:
+ - Single-process execution
  - Loads manifest
  - Iterates through all dates
- - Runs all calculations in Pass 1 sequentially
- - Good for: Development, debugging
+ - Runs calculations in order
+ - Good for: Debugging, local testing
 
  ### Mode 2: Dispatcher + Workers (Production)
 
- **Distributed execution** using Pub/Sub.
-
- #### Step 1: Dispatch Tasks
  ```bash
+ # Step 1: Dispatch tasks to Pub/Sub
  COMPUTATION_PASS_TO_RUN=1 npm run computation-dispatcher
- ```
 
- Publishes messages to Pub/Sub:
- ```json
- {
-   "action": "RUN_COMPUTATION_DATE",
-   "date": "2024-12-07",
-   "pass": "1"
- }
+ # Step 2: Cloud Function workers consume tasks
+ # (Auto-scaled by GCP, 0 to 1000+ workers)
  ```
 
- #### Step 2: Workers Consume Tasks
+ **Behavior**:
+ - Dispatcher analyzes all dates
+ - Publishes ~10,000 messages to Pub/Sub
+ - Workers process in parallel
+ - Each worker handles 1 date
+ - Auto-retries on failure (Pub/Sub built-in)
+
+ **Scaling**: 1,000 dates × 3 calcs = 3,000 tasks. With 100 workers, this completes in ~5 minutes.
+
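+ Dispatch itself is a plain Pub/Sub publish per task; a minimal sketch using `@google-cloud/pubsub` (the topic name is an assumption, the message shape follows the worker section above):
+
+ ```javascript
+ const { PubSub } = require('@google-cloud/pubsub');
+
+ async function dispatchTasks(tasks) {
+   const topic = new PubSub().topic('computation-tasks'); // assumed topic name
+   for (const task of tasks) {
+     // task: { date: '2024-12-07', pass: 1, computation: 'user-risk-profile' }
+     await topic.publishMessage({ json: task });
+   }
+ }
+ ```
+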
+ ### Mode 3: Batch Price Executor (Optimization)
+
  ```bash
- # Cloud Function triggered by Pub/Sub
- # Or: Local consumer for testing
- npm run computation-worker
+ # For price-dependent calcs, bulk-process historical data
+ npm run batch-price-executor --dates=2024-12-01,2024-12-02 --calcs=momentum-signal
  ```
 
- Each worker:
- 1. Receives a date + pass
- 2. Loads manifest
- 3. Runs calculations for that date only
- 4. Updates status document
+ **Behavior**:
+ - Loads price shards once
+ - Processes multiple dates in a single pass
+ - Bypasses Pub/Sub overhead
+ - **10x faster** for historical backfills
 
- **Benefits:**
- - Parallel execution (100+ workers)
- - Fault tolerance (failed dates retry automatically)
- - Scales to millions of dates
+ **Use Case**: After deploying a new price-dependent calculation, backfill 2 years of history in 1 hour instead of 10.
 
- ### Pass System
+ ---
 
- Calculations are grouped into **passes** based on dependencies:
+ ## Advanced Topics
 
- ```
- Pass 1: Base calculations (no dependencies)
-   - risk-metrics
-   - price-momentum
+ ### Historical Continuity Enforcement
 
- Pass 2: Depends on Pass 1
-   - sentiment-score (needs risk-metrics)
-   - trend-analysis (needs price-momentum)
+ For calculations that depend on their own previous results:
 
- Pass 3: Depends on Pass 2
-   - combined-signal (needs sentiment-score + trend-analysis)
+ ```javascript
+ // Example: cumulative-pnl needs yesterday's cumulative-pnl
+
+ static getMetadata() {
+   return { isHistorical: true };
+ }
+
+ // Dispatcher Logic:
+ if (calculation.isHistorical) {
+   const yesterday = getPreviousDay(date); // helper: date minus 1 calendar day
+   const yesterdayStatus = await fetchComputationStatus(yesterday);
+
+   if (!yesterdayStatus[calcName] ||
+       yesterdayStatus[calcName].hash !== currentHash) {
+     // Yesterday is missing or has wrong hash
+     report.blocked.push({
+       reason: "Waiting for historical continuity"
+     });
+   }
+ }
  ```
 
- **You run passes sequentially:**
- ```bash
- COMPUTATION_PASS_TO_RUN=1 npm run computation-dispatcher # Wait for completion
- COMPUTATION_PASS_TO_RUN=2 npm run computation-dispatcher # Wait for completion
- COMPUTATION_PASS_TO_RUN=3 npm run computation-dispatcher
- ```
+ **Result**: Historical calculations run in **strict chronological order**, never skipping days.
+
+ ### Category Migration System
+
+ If a calculation's category changes:
+
+ ```javascript
+ // Before: category: 'signals'
+ // After:  category: 'risk-management'
+
+ // System detects change:
+ manifest.previousCategory = 'signals';
+
+ // Worker executes:
+ // 1. Runs calculation normally
+ // 2. Stores in new category: /results/{date}/risk-management/{calc}
+ // 3. Deletes old category:   /results/{date}/signals/{calc}
  ```
 
- The manifest builder automatically assigns pass numbers via topological sort.
+ **Automation**: Zero manual data migration needed.
+
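+ The cleanup step must also remove any `_shards` subcollection under the old document; a hypothetical sketch, with result paths taken from this guide:
+
+ ```javascript
+ // Sketch of old-category cleanup after a successful run.
+ async function deleteOldCategory(db, date, oldCategory, calcName) {
+   const oldDoc = db.doc(`results/${date}/${oldCategory}/${calcName}`);
+   const shards = await oldDoc.collection('_shards').get();
+   const batch = db.batch();
+   shards.forEach(shard => batch.delete(shard.ref)); // drop orphaned shards
+   batch.delete(oldDoc);                             // drop the pointer doc
+   await batch.commit();
+ }
+ ```
+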
+ ### Audit Ledger vs Run History
+
+ **Audit Ledger** (`computation_audit_ledger/{date}/passes/{pass}/tasks/{calc}`):
+ - Created **before** dispatch
+ - Status: PENDING → COMPLETED
+ - Purpose: Track which tasks were dispatched
+
+ **Run History** (`computation_run_history/{date}/runs/{runId}`):
+ - Created **after** execution attempt
+ - Status: SUCCESS | FAILURE | CRASH
+ - Purpose: Debug failures, track performance
+
+ **Why Both?**: The Audit Ledger answers "What should run?"; Run History answers "What actually happened?".
 
  ---
 
@@ -657,60 +848,78 @@ The manifest builder automatically assigns pass numbers via topological sort.
  ### For a Standard Calculation
 
  ```
- 1. Manifest Builder
-    ├─ Scans your calculation class
-    ├─ Generates smart hash (code + layers + dependencies)
-    ├─ Assigns to a pass based on dependency graph
-    └─ Validates all dependencies exist
+ 1. Root Data Indexer (Daily)
+    └─ Scans all data sources
+       └─ Creates availability manifest
 
- 2. Dispatcher/Orchestrator
+ 2. Dispatcher (Per-Pass)
     ├─ Loads manifest
-    ├─ Iterates through all dates
-    └─ For each date:
-       ├─ Checks if calculation needs to run (hash mismatch?)
-       ├─ Checks if root data exists (portfolio, history, etc.)
-       └─ Dispatches task (or runs directly)
-
- 3. Worker/Executor
-    ├─ Receives task for specific date
-    ├─ Loads dependency results (auto-reassembles if sharded)
+    ├─ For each date:
+    │   ├─ Checks root data availability
+    │   ├─ Checks dependency status
+    │   ├─ Checks historical continuity
+    │   └─ Decides: RUNNABLE | BLOCKED | IMPOSSIBLE
+    ├─ Creates Audit Ledger (PENDING)
+    └─ Publishes RUNNABLE tasks to Pub/Sub
+
+ 3. Worker (Per-Task)
+    ├─ Receives {date, pass, computation}
+    ├─ Loads manifest (cached)
+    ├─ Fetches dependencies (auto-reassembles shards)
     ├─ Streams portfolio data in batches
-    └─ For each user batch:
-       ├─ Builds per-user context
-       ├─ Injects math layers, mappings, computed dependencies
-       ├─ Calls your calculation.process(context)
-       └─ Accumulates results
-
- 4. Result Committer
-    ├─ Calculates total result size
-    ├─ IF size > 900KB:
-    │   ├─ Splits into chunks
-    │   ├─ Writes to _shards subcollection
-    │   └─ Writes pointer document
-    └─ ELSE:
-       └─ Writes single document
-
- 5. Status Updater
-    └─ Updates computation_status/{date} with new hash
+    ├─ For each user:
+    │   ├─ Builds context (dependency injection)
+    │   └─ Calls calculation.process(context)
+    ├─ Validates results (HeuristicValidator)
+    ├─ Auto-shards if > 900KB
+    ├─ Commits to Firestore
+    ├─ Updates status hash
+    ├─ Updates Audit Ledger → COMPLETED
+    └─ Records Run History → SUCCESS
+
+ 4. Next Pass
+    └─ Depends on results from this pass
  ```
 
  ### For a Meta Calculation
 
- Same as above, except:
-
- - **Step 3**: Loads all data once (or iterates through price shards)
- - **Context**: Global data, not per-user
- - **Result**: One document per date (e.g., all tickers' momentum scores)
+ Same flow, except:
+ - **Step 3**: Loads global data instead of streaming users
+ - **Context**: No user object; prices/insights instead
+ - **Result**: One document with all tickers' data
 
  ---
 
  ## Key Takeaways
 
- 1. **Context is Auto-Built**: Declare what you need in metadata; the system handles the rest
- 2. **Sharding is Transparent**: Read and write as if documents have no size limit
- 3. **Dependencies Just Work**: Results are automatically fetched and reassembled
- 4. **Versioning is Smart**: Change code → system knows what to rerun
- 5. **Streaming is Automatic**: Standard computations stream data; you don't manage batches
- 6. **Execution is Flexible**: Run locally for dev, distributed for production
+ 1. **Data Availability Gates Everything**: Computations never run when source data is missing
+ 2. **Smart Hashing Enables Incremental Updates**: Only changed calculations re-run
+ 3. **Sharding is Invisible**: Read/write as if documents have no size limit
+ 4. **Streaming Handles Scale**: Process millions of users without OOM
+ 5. **Quality Checks Prevent Bad Data**: Results validated before storage
+ 6. **Historical Continuity is Enforced**: Time-series calculations run in order
+ 7. **Distributed Execution Scales Infinitely**: 1 worker or 1,000 workers, same code
 
  ---
+
+ ## Operational Checklist
+
+ **Daily (Automated)**:
+ - ✅ Root Data Indexer runs at 2 AM UTC
+ - ✅ Computation Dispatchers run for each pass (3 AM, 4 AM, 5 AM)
+ - ✅ Workers auto-scale based on Pub/Sub queue depth
+
+ **After Code Changes**:
+ 1. Run Build Reporter to preview impact
+ 2. Review re-run count (expected vs actual)
+ 3. Deploy to staging, run a single date
+ 4. Validate results in Firestore
+ 5. Deploy to production
+ 6. Monitor Run History for failures
+
+ **Debugging a Failure**:
+ 1. Check Run History for the error stage
+ 2. If `QUALITY_CIRCUIT_BREAKER`: Data integrity issue, review validator logs
+ 3. If `EXECUTION`: Logic bug, reproduce locally with Orchestrator mode
+ 4. If `SYSTEM_CRASH`: Infrastructure issue, check Cloud Function logs
+ 5. Fix bug, redeploy, re-trigger the specific pass