bulltrackers-module 1.0.259 → 1.0.260

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,654 +1,845 @@
- # BullTrackers Computation System - Comprehensive Onboarding Guide
+ # BullTrackers Computation System & Root Data Indexer
 
  ## Table of Contents
- 1. [System Overview](#system-overview)
- 2. [Context Architecture](#context-architecture)
- 3. [Data Loading & Routing](#data-loading--routing)
- 4. [Sharding System](#sharding-system)
- 5. [Computation Types & Execution](#computation-types--execution)
- 6. [Dependency Management](#dependency-management)
- 7. [Versioning & Smart Hashing](#versioning--smart-hashing)
- 8. [Execution Modes](#execution-modes)
+ 1. [System Philosophy](#system-philosophy)
+ 2. [Root Data Indexer](#root-data-indexer)
+ 3. [Computation Architecture](#computation-architecture)
+ 4. [Context & Dependency Injection](#context--dependency-injection)
+ 5. [Execution Pipeline](#execution-pipeline)
+ 6. [Smart Hashing & Versioning](#smart-hashing--versioning)
+ 7. [Data Loading & Streaming](#data-loading--streaming)
+ 8. [Auto-Sharding System](#auto-sharding-system)
+ 9. [Quality Assurance](#quality-assurance)
+ 10. [Operational Modes](#operational-modes)
 
  ---
 
- ## System Overview
+ ## System Philosophy
 
- The BullTrackers Computation System is a **dependency-aware, auto-sharding, distributed calculation engine** designed to process massive datasets across user portfolios, trading histories, market insights, and price data. The system automatically handles:
+ The BullTrackers Computation System is a **dependency-aware, distributed calculation engine** that processes massive financial datasets with strict guarantees:
 
- - **Smart data loading** (only loads what's needed)
- - **Transparent sharding** (handles Firestore's 1MB document limit)
- - **Dependency injection** (calculations receive exactly what they declare)
- - **Historical state management** (access to yesterday's data when needed)
- - **Incremental recomputation** (only reruns when code or dependencies change)
+ - **Incremental Recomputation**: Only re-runs when code or dependencies change
+ - **Historical Continuity**: Ensures chronological execution for time-series calculations
+ - **Transparent Sharding**: Handles Firestore's 1MB document limit automatically
+ - **Data Availability Gating**: Never runs when source data is missing
+ - **Cascading Invalidation**: Upstream changes automatically invalidate downstream results
+
+ ### Key Design Principles
+
+ **1. Source of Truth Paradigm**
+ - The Root Data Indexer creates a daily availability manifest
+ - Computations are gated by data availability checks
+ - Missing data triggers "IMPOSSIBLE" states, not retries
+
+ **2. Merkle Tree Dependency Hashing**
+ - Every calculation has a hash that includes:
+   - Its own source code
+   - Hashes of math layers it uses
+   - Hashes of calculations it depends on
+ - Changes cascade: updating calculation A invalidates all dependents
+
+ **3. Stateless Execution**
+ - Each worker receives complete context for its task
+ - No inter-worker communication
+ - Infinitely horizontally scalable
 
  ---
 
- ## Context Architecture
+ ## Root Data Indexer
+
+ ### Purpose
+
+ The Root Data Indexer runs daily to scan all data sources and create a **centralized availability manifest**. This prevents the computation system from attempting to run calculations when source data doesn't exist.
+
+ ### Architecture
+
+ ```
+ ┌────────────────────────────────────────────────────┐
+ │ Root Data Indexer (Daily Scan)                     │
+ ├────────────────────────────────────────────────────┤
+ │ For each date (2023-01-01 → Tomorrow):             │
+ │   1. Check Normal User Portfolios (Canary: 19M)    │
+ │   2. Check Speculator Portfolios (Canary: 19M)     │
+ │   3. Check Normal Trade History (Canary: 19M)      │
+ │   4. Check Speculator Trade History (Canary: 19M)  │
+ │   5. Check Daily Insights (/{date})                │
+ │   6. Check Social Posts (/{date}/posts)            │
+ │   7. Pre-Load Price Shard (shard_0) → Date Map     │
+ │                                                    │
+ │ Output: /system_root_data_index/{date}             │
+ │   {                                                │
+ │     hasPortfolio: true,                            │
+ │     hasHistory: false,                             │
+ │     hasInsights: true,                             │
+ │     hasSocial: true,                               │
+ │     hasPrices: true,                               │
+ │     details: {                                     │
+ │       normalPortfolio: true,                       │
+ │       speculatorPortfolio: false,                  │
+ │       normalHistory: false,                        │
+ │       speculatorHistory: false                     │
+ │     }                                              │
+ │   }                                                │
+ └────────────────────────────────────────────────────┘
+ ```
+
+ ### Canary Block System
+
+ Instead of scanning every user block (which would be prohibitively expensive), the indexer uses **representative blocks**:
+
+ - **Block 19M**: Statistically verified to always contain data when the system is healthy
+ - **Part 0**: First part of sharded collections
+
+ **Logic**: If Block 19M has data for a date, all blocks are assumed to have data for that date. This is a deliberate architectural assumption that reduces indexing cost by 99%.
+
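+ A minimal sketch of the canary idea (collection paths and helper names here are assumptions for illustration, not the indexer's actual API):
+
+ ```javascript
+ // Hypothetical sketch: check one representative block instead of all blocks.
+ const CANARY_BLOCK = '19000000'; // "Block 19M"
+
+ async function hasPortfolioData(db, dateStr, userType) {
+   // Assumed layout: {collection}/{date}/blocks/{blockId}/parts/part_0
+   const collection = userType === 'speculator'
+     ? 'speculator_portfolios'
+     : 'normal_portfolios';
+   const canary = await db
+     .doc(`${collection}/${dateStr}/blocks/${CANARY_BLOCK}/parts/part_0`)
+     .get();
+   // If the canary block has data, every block is assumed to have data.
+   return canary.exists;
+ }
+ ```
+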
+ ### Granular UserType Tracking
+
+ The indexer distinguishes between:
+ - **Normal Users**: Traditional portfolio tracking (AggregatedPositions)
+ - **Speculators**: Advanced traders (PublicPositions with leverage/SL/TP)
 
- ### The Context Object
+ This enables a calculation to declare `userType: 'speculator'` and be gated only on whether speculator data exists.
 
- Every computation receives a **context object** that contains all the data and tools it needs. The context is built dynamically based on the computation's declared dependencies.
+ ### Price Data Optimization
 
- ### Context Structure by Computation Type
+ Price data is handled differently:
+
+ 1. **Pre-Load Once**: The entire `shard_0` document is loaded into memory
+ 2. **Extract Date Keys**: All dates with price data are extracted into a `Set`
+ 3. **Fast Lookup**: Each date check becomes O(1) instead of a Firestore read
+
+ This reduces price availability checks from **~1000 reads/day** to **1 read total**.
+
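+ The sketch below illustrates that pattern; `db` is a Firestore client and the document shape follows the price schema shown later in this guide, but the shard path is an assumption:
+
+ ```javascript
+ // Hypothetical sketch: pre-load shard_0 once, then answer
+ // "do prices exist for this date?" from an in-memory Set.
+ async function buildPriceDateSet(db) {
+   const shardDoc = await db.doc('asset_prices/shard_0').get(); // assumed path
+   const dateSet = new Set();
+   for (const instrument of Object.values(shardDoc.data() || {})) {
+     for (const date of Object.keys(instrument.prices || {})) {
+       dateSet.add(date); // e.g. "2024-12-07"
+     }
+   }
+   return dateSet; // each availability check is now dateSet.has(date), O(1)
+ }
+ ```
+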
+ ### Index Schema
 
- #### **Standard (Per-User) Context**
  ```javascript
  {
-   user: {
-     id: "user_123",
-     type: "speculator", // or "normal"
-     portfolio: {
-       today: { /* Portfolio snapshot for today */ },
-       yesterday: { /* Portfolio snapshot for yesterday (if isHistorical: true) */ }
-     },
-     history: {
-       today: { /* Trading history for today */ },
-       yesterday: { /* Trading history for yesterday (if needed) */ }
-     }
-   },
-   date: {
-     today: "2024-12-07"
-   },
-   insights: {
-     today: { /* Daily instrument insights */ },
-     yesterday: { /* Yesterday's insights (if needed) */ }
-   },
-   social: {
-     today: { /* Social post insights */ },
-     yesterday: { /* Yesterday's social data (if needed) */ }
-   },
-   mappings: {
-     tickerToInstrument: { "AAPL": 123, ... },
-     instrumentToTicker: { 123: "AAPL", ... }
-   },
-   math: {
-     // All mathematical layers (extractors, primitives, signals, etc.)
-     extract: DataExtractor,
-     compute: MathPrimitives,
-     signals: SignalPrimitives,
-     // ... and more
-   },
-   computed: {
-     // Results from dependency calculations (current day)
-     "risk-metrics": { "AAPL": { volatility: 0.25 }, ... },
-     "sentiment-score": { "AAPL": { score: 0.8 }, ... }
-   },
-   previousComputed: {
-     // Results from dependency calculations (previous day, if isHistorical: true)
-     "risk-metrics": { "AAPL": { volatility: 0.23 }, ... }
-   },
-   meta: { /* Calculation metadata */ },
-   config: { /* System configuration */ },
-   deps: { /* System dependencies (db, logger, etc.) */ }
+   date: "2024-12-07",
+   lastUpdated: Timestamp,
+
+   // Aggregate flags (true if ANY subtype exists)
+   hasPortfolio: true,  // normalPortfolio OR speculatorPortfolio
+   hasHistory: false,   // normalHistory OR speculatorHistory
+   hasInsights: true,   // Insights document exists
+   hasSocial: true,     // At least 1 social post exists
+   hasPrices: true,     // Price data exists for this date
+
+   // Granular breakdown
+   details: {
+     normalPortfolio: true,
+     speculatorPortfolio: false,
+     normalHistory: false,
+     speculatorHistory: false
+   }
  }
  ```
 
- #### **Meta (Once-Per-Day) Context**
+ ### Availability Check Logic (Computation System)
+
  ```javascript
- {
-   date: {
-     today: "2024-12-07"
-   },
-   insights: {
-     today: { /* Daily instrument insights */ },
-     yesterday: { /* If needed */ }
-   },
-   social: {
-     today: { /* Social post insights */ },
-     yesterday: { /* If needed */ }
-   },
-   prices: {
-     history: {
-       // Price data for all instruments (or batched shards)
-       "123": {
-         ticker: "AAPL",
-         prices: {
-           "2024-12-01": 150.25,
-           "2024-12-02": 151.30,
-           // ...
-         }
-       }
-     }
-   },
-   mappings: { /* Same as Standard */ },
-   math: { /* Same as Standard */ },
-   computed: { /* Same as Standard */ },
-   previousComputed: { /* Same as Standard */ },
-   meta: { /* Calculation metadata */ },
-   config: { /* System configuration */ },
-   deps: { /* System dependencies */ }
+ // AvailabilityChecker.js - checkRootDependencies()
+
+ if (calculation.rootDataDependencies.includes('portfolio')) {
+   if (calculation.userType === 'speculator') {
+     if (!rootDataStatus.details.speculatorPortfolio) return 'MISSING';
+   } else if (calculation.userType === 'normal') {
+     if (!rootDataStatus.details.normalPortfolio) return 'MISSING';
+   } else {
+     // userType: 'all' or 'aggregate'
+     if (!rootDataStatus.hasPortfolio) return 'MISSING';
+   }
  }
+
+ // Similar logic applies to the 'history' dependency.
+ // Global data types (insights, social, price) have no subtypes.
  ```
 
- ### How Context is Auto-Populated
+ **Critical Behavior**:
+ - If data is missing for a **historical date** → Mark calculation as `IMPOSSIBLE` (permanent failure)
+ - If data is missing for **today's date** → Mark as `BLOCKED` (retriable, data may arrive later)
+
+ ---
+
+ ## Computation Architecture
 
- The system uses a **declaration-based approach**. When you define a calculation, you declare what data you need:
+ ### Calculation Types
+
+ #### **Standard (Per-User) Computations**
 
  ```javascript
- class MyCalculation {
+ class UserRiskProfile {
    static getMetadata() {
      return {
-       type: 'standard', // or 'meta'
-       isHistorical: true, // Do I need yesterday's data?
-       rootDataDependencies: ['portfolio', 'insights'], // What root data do I need?
-       userType: 'all' // 'all', 'speculator', or 'normal'
+       type: 'standard',            // Runs once per user
+       category: 'risk-management', // Storage category
+       isHistorical: true,          // Needs yesterday's data
+       rootDataDependencies: ['portfolio', 'history'],
+       userType: 'speculator'       // Only for speculator users
      };
    }
 
    static getDependencies() {
-     return ['risk-metrics', 'sentiment-score']; // What other calculations do I depend on?
+     return ['market-volatility']; // Needs this calc to run first
+   }
+
+   async process(context) {
+     const { user, computed, math } = context;
+     // Process individual user
+     this.results[user.id] = { riskScore: /* ... */ };
    }
  }
  ```
 
- The `ContextBuilder` then:
-
- 1. **Checks `rootDataDependencies`** → Loads portfolio, insights, social, history, or price data
- 2. **Checks `isHistorical`** → If true, loads yesterday's portfolio and previous computation results
- 3. **Checks `getDependencies()`** → Fetches results from other calculations
- 4. **Injects math layers** → Automatically includes all extractors, primitives, and utilities
- 5. **Adds mappings** → Provides ticker ↔ instrument ID conversion
-
- **You only get what you ask for.** This keeps memory usage efficient and prevents unnecessary data loading.
+ **Execution Model**:
+ - Portfolio data streams in batches (50 users at a time)
+ - Each user processed independently
+ - Memory-efficient for millions of users
 
- ---
-
- ## Data Loading & Routing
-
- ### The Data Loading Pipeline
-
- ```
- ┌────────────────────────────────────────────────────────────┐
- │ DataLoader (Cached)                                        │
- ├────────────────────────────────────────────────────────────┤
- │ • loadMappings() → Ticker/Instrument maps                  │
- │ • loadInsights(date) → Daily instrument insights           │
- │ • loadSocial(date) → Social post insights                  │
- │ • loadPriceShard(ref) → Asset price data                   │
- │ • getPriceShardRefs() → All price shards                   │
- │ • getSpecificPriceShardReferences(ids) → Targeted shards   │
- └────────────────────────────────────────────────────────────┘
-
- ┌────────────────────────────────────────────────────────────┐
- │ ComputationExecutor                                        │
- ├────────────────────────────────────────────────────────────┤
- │ • executePerUser() → Streams portfolio data                │
- │ • executeOncePerDay() → Loads global/meta data             │
- └────────────────────────────────────────────────────────────┘
-
- ┌────────────────────────────────────────────────────────────┐
- │ ContextBuilder                                             │
- ├────────────────────────────────────────────────────────────┤
- │ Assembles context based on metadata & dependencies         │
- └────────────────────────────────────────────────────────────┘
-
- Your Calculation.process()
- ```
-
- ### Streaming vs Batch Loading
-
- #### **Standard Computations: Streaming**
- Standard (per-user) computations use **streaming** to process users in chunks:
+ #### **Meta (Once-Per-Day) Computations**
 
  ```javascript
- // System streams portfolio data in batches of 50 users
- for await (const userBatch of streamPortfolioData()) {
-   // Each batch is processed in parallel
-   for (const [userId, portfolio] of Object.entries(userBatch)) {
-     const context = buildPerUserContext({ userId, portfolio, ... });
-     await calculation.process(context);
+ class MarketMomentum {
+   static getMetadata() {
+     return {
+       type: 'meta', // Runs once per day globally
+       category: 'market-signals',
+       rootDataDependencies: ['price', 'insights']
+     };
+   }
+
+   async process(context) {
+     const { prices, insights, math } = context;
+     // Process all tickers
+     for (const [instId, data] of Object.entries(prices.history)) {
+       this.results[data.ticker] = { momentum: /* ... */ };
+     }
    }
  }
  ```
 
- **Why streaming?**
- - Portfolio data is sharded across multiple documents
- - Loading all users at once would exceed memory limits
- - Streaming allows processing millions of users efficiently
+ **Execution Model**:
+ - Loads all price data (or processes in shard batches)
+ - Runs once, produces global results
+ - Used for market-wide analytics
 
- #### **Meta Computations: Batch or Shard**
- Meta computations have two modes:
+ ### Manifest Builder
 
- 1. **Standard Meta** (No price dependency):
-    ```javascript
-    const context = buildMetaContext({ insights, social, ... });
-    await calculation.process(context);
-    ```
+ The Manifest Builder automatically:
 
- 2. **Price-Dependent Meta** (Batched Shard Processing):
-    ```javascript
-    // System loads price data in shard batches
-    for (const shardRef of priceShardRefs) {
-      const shardData = await loadPriceShard(shardRef);
-      const context = buildMetaContext({ prices: { history: shardData } });
-      await calculation.process(context);
-      // Memory is cleared between shards
-    }
-    ```
+ 1. **Discovers** all calculation classes in the codebase
+ 2. **Analyzes** their dependencies
+ 3. **Sorts** them topologically (builds a DAG)
+ 4. **Assigns** pass numbers (execution waves)
+ 5. **Generates** smart hashes for each calculation
 
- ---
-
- ## Sharding System
+ ```
+ Pass 1 (No Dependencies):
+   - market-volatility
+   - price-momentum
+
+ Pass 2 (Depends on Pass 1):
+   - user-risk-profile (needs market-volatility)
+   - sentiment-score (needs price-momentum)
+
+ Pass 3 (Depends on Pass 2):
+   - combined-signal (needs sentiment-score + user-risk-profile)
+ ```
 
- ### The Problem: Firestore's 1MB Limit
+ **Circular Dependency Detection**: If `A → B → C → A`, the builder throws a fatal error and refuses to generate a manifest.
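+
+ A compact sketch of how pass numbers can be derived with a level-by-level topological sort (Kahn's algorithm); the manifest builder's real implementation may differ:
+
+ ```javascript
+ // calcs: { name: { dependencies: [names...] } }
+ function assignPasses(calcs) {
+   const passes = {};
+   let remaining = Object.keys(calcs);
+   let pass = 1;
+   while (remaining.length > 0) {
+     // A calc is ready when every dependency already has a pass number.
+     const ready = remaining.filter(name =>
+       calcs[name].dependencies.every(dep => passes[dep] !== undefined));
+     if (ready.length === 0) {
+       // Nothing became ready: the remaining calcs form a cycle.
+       throw new Error(`Circular dependency among: ${remaining.join(', ')}`);
+     }
+     for (const name of ready) passes[name] = pass;
+     remaining = remaining.filter(name => !ready.includes(name));
+     pass++;
+   }
+   return passes; // e.g. { 'market-volatility': 1, 'user-risk-profile': 2 }
+ }
+ ```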
 
- Firestore has a **1MB hard limit** per document. When computation results contain thousands of tickers (e.g., momentum scores for every asset), the document exceeds this limit.
+ ---
 
- ### The Solution: Auto-Sharding
+ ## Context & Dependency Injection
 
- The system **automatically detects** when a result is too large and splits it into a subcollection.
+ ### Context Structure
 
- ### How Auto-Sharding Works
+ Every calculation receives exactly what it declares:
 
  ```javascript
- // When saving results:
- const result = {
-   "AAPL": { score: 0.8, volatility: 0.25 },
-   "GOOGL": { score: 0.7, volatility: 0.22 },
-   // ... 5,000+ tickers
- };
-
- // System calculates size:
- const totalSize = calculateFirestoreBytes(result); // ~1.2 MB
-
- // IF size > 900KB (safety threshold):
- //   1. Splits data into chunks < 900KB each
- //   2. Writes chunks to: /results/{date}/{category}/{calc}/_shards/shard_0
- //                        /results/{date}/{category}/{calc}/_shards/shard_1
- //   3. Writes pointer:   /results/{date}/{category}/{calc}
- //      → { _sharded: true, _shardCount: 2, _completed: true }
-
- // IF size < 900KB:
- //   Writes normally: /results/{date}/{category}/{calc}
- //   { "AAPL": {...}, "GOOGL": {...}, _completed: true, _sharded: false }
+ {
+   // IDENTITY (Standard only)
+   user: {
+     id: "user_123",
+     type: "speculator",
+     portfolio: { today: {...}, yesterday: {...} },
+     history: { today: {...}, yesterday: {...} }
+   },
+
+   // TEMPORAL
+   date: { today: "2024-12-07" },
+
+   // ROOT DATA (if declared)
+   insights: { today: {...}, yesterday: {...} },
+   social: { today: {...}, yesterday: {...} },
+   prices: { history: {...} },
+
+   // MAPPINGS
+   mappings: {
+     tickerToInstrument: { "AAPL": 123 },
+     instrumentToTicker: { 123: "AAPL" }
+   },
+
+   // MATH LAYERS (always injected)
+   math: {
+     extract: DataExtractor,
+     compute: MathPrimitives,
+     signals: SignalPrimitives,
+     history: HistoryExtractor,
+     insights: InsightsExtractor,
+     priceExtractor: priceExtractor,
+     // ... 20+ utility classes
+   },
+
+   // DEPENDENCIES (if declared)
+   computed: {
+     "market-volatility": { "AAPL": { volatility: 0.25 } }
+   },
+
+   // HISTORICAL DEPENDENCIES (if isHistorical: true)
+   previousComputed: {
+     "market-volatility": { "AAPL": { volatility: 0.23 } }
+   }
+ }
  ```
 
- ### Reading Sharded Data
+ ### Lazy Loading Optimization
 
- The system **transparently reassembles** sharded data when loading dependencies:
+ The system **only loads what you declare**:
 
  ```javascript
- // When loading a dependency:
- const result = await fetchExistingResults(dateStr, ['momentum-score']);
-
- // System checks: Is this document sharded?
- if (doc.data()._sharded === true) {
-   // 1. Fetch all docs from _shards subcollection
-   // 2. Merge them back into a single object
-   // 3. Return as if it was never sharded
- }
-
- // Your calculation receives complete data, regardless of storage method
- ```
-
- ### Handling Mixed Storage Scenarios
+ // Calculation A declares:
+ rootDataDependencies: ['portfolio']
 
- **Question:** What if I need data from 2 days, where Day 1 is sharded and Day 2 is not?
+ // Context A receives:
+ { user: { portfolio: {...} } } // No insights, social, or prices loaded
 
- **Answer:** The system handles this automatically:
 
- ```javascript
- // Your calculation declares:
- static getMetadata() {
-   return {
-     isHistorical: true, // I need yesterday's data
-     // ...
-   };
- }
+ // Calculation B declares:
+ rootDataDependencies: ['portfolio', 'insights']
 
+ // Context B receives:
+ {
+   user: { portfolio: {...} },
+   insights: { today: {...} } // Insights fetched on-demand
  }
+ ```
+
+ This prevents unnecessary Firestore reads and keeps memory usage minimal.
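+
+ Conceptually, the ContextBuilder can be pictured like this — a simplified sketch in which the loader methods follow the DataLoader described in this guide, but the exact signatures are assumptions:
+
+ ```javascript
+ // Hypothetical sketch of declaration-driven loading.
+ async function buildContext(calculation, dateStr, loader) {
+   const deps = calculation.getMetadata().rootDataDependencies || [];
+   const context = { date: { today: dateStr } };
+
+   // Each root data type is fetched only if declared.
+   if (deps.includes('insights')) {
+     context.insights = { today: await loader.loadInsights(dateStr) };
+   }
+   if (deps.includes('social')) {
+     context.social = { today: await loader.loadSocial(dateStr) };
+   }
+   // ...portfolio, history, and prices follow the same pattern.
+   return context;
+ }
+ ```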
 
- // System loads BOTH days:
- const computed = await fetchExistingResults(todayDate, ['momentum-score']);
- // → Auto-detects if sharded, reassembles if needed
+ ---
 
- const previousComputed = await fetchExistingResults(yesterdayDate, ['momentum-score']);
- // → Auto-detects if sharded, reassembles if needed
+ ## Execution Pipeline
+
+ ### Phase 1: Analysis & Dispatch
+
+ ```
+ ┌─────────────────────────────────────────────────────┐
+ │ Computation Dispatcher                              │
+ │ (Smart Pre-Flight Checker)                          │
+ ├─────────────────────────────────────────────────────┤
+ │ For each date in range:                             │
+ │   1. Fetch Root Data Index                          │
+ │   2. Fetch Computation Status (stored hashes)       │
+ │   3. Fetch Yesterday's Status (historical check)    │
+ │   4. Run analyzeDateExecution()                     │
+ │                                                     │
+ │ Decision Logic per Calculation:                     │
+ │   ├─ Root Data Missing?                             │
+ │   │   ├─ Historical Date → Mark IMPOSSIBLE          │
+ │   │   └─ Today's Date → Mark BLOCKED (retriable)    │
+ │   │                                                 │
+ │   ├─ Dependency Impossible?                         │
+ │   │   └─ Mark IMPOSSIBLE (cascading failure)        │
+ │   │                                                 │
+ │   ├─ Dependency Missing/Hash Mismatch?              │
+ │   │   └─ Mark BLOCKED (wait for dependency)         │
+ │   │                                                 │
+ │   ├─ Historical Continuity Broken?                  │
+ │   │   └─ Mark BLOCKED (wait for yesterday)          │
+ │   │                                                 │
+ │   ├─ Hash Mismatch?                                 │
+ │   │   └─ Mark RUNNABLE (re-run needed)              │
+ │   │                                                 │
+ │   └─ Hash Match?                                    │
+ │       └─ Mark SKIPPED (up-to-date)                  │
+ │                                                     │
+ │   5. Create Audit Ledger (PENDING state)            │
+ │   6. Publish RUNNABLE tasks to Pub/Sub              │
+ └─────────────────────────────────────────────────────┘
+ ```
+
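+ The decision ladder above can be read as a single function; this is an illustrative reduction, not the dispatcher's literal code:
+
+ ```javascript
+ // Returns one of: 'IMPOSSIBLE' | 'BLOCKED' | 'RUNNABLE' | 'SKIPPED'
+ function decide({ rootDataOk, isHistoricalDate, depImpossible, depMissing,
+                   continuityBroken, storedHash, currentHash }) {
+   if (!rootDataOk) return isHistoricalDate ? 'IMPOSSIBLE' : 'BLOCKED';
+   if (depImpossible) return 'IMPOSSIBLE';    // cascading failure
+   if (depMissing) return 'BLOCKED';          // wait for dependency
+   if (continuityBroken) return 'BLOCKED';    // wait for yesterday
+   if (storedHash !== currentHash) return 'RUNNABLE'; // re-run needed
+   return 'SKIPPED';                          // up-to-date
+ }
+ ```
+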
+ ### Phase 2: Worker Execution
+
+ ```
+ ┌─────────────────────────────────────────────────────┐
+ │ Computation Worker                                  │
+ │ (Processes Single Task)                             │
+ ├─────────────────────────────────────────────────────┤
+ │ 1. Parse Pub/Sub Message                            │
+ │    { date, pass, computation, previousCategory }    │
+ │                                                     │
+ │ 2. Load Manifest (cached in memory)                 │
+ │                                                     │
+ │ 3. Fetch Dependencies                               │
+ │    - Load dependency results (auto-reassemble)      │
+ │    - Load previous day's results (if historical)    │
+ │                                                     │
+ │ 4. Execute Calculation                              │
+ │    ├─ Standard: Stream users in batches             │
+ │    └─ Meta: Load global data / price shards         │
+ │                                                     │
+ │ 5. Validate Results (HeuristicValidator)            │
+ │    - NaN Detection                                  │
+ │    - Flatline Detection (stuck values)              │
+ │    - Null/Empty Analysis                            │
+ │    - Dead Object Detection                          │
+ │                                                     │
+ │ 6. Store Results                                    │
+ │    ├─ Calculate size                                │
+ │    ├─ If > 900KB → Auto-shard                       │
+ │    └─ Write to Firestore                            │
+ │                                                     │
+ │ 7. Update Status & Ledgers                          │
+ │    ├─ computation_status/{date} → New hash          │
+ │    ├─ Audit Ledger → COMPLETED                      │
+ │    └─ Run History → SUCCESS/FAILURE record          │
+ │                                                     │
+ │ 8. Category Migration (if detected)                 │
+ │    └─ Delete old category's data                    │
+ └─────────────────────────────────────────────────────┘
+ ```
+
+ ### Error Handling Stages
+
+ The system tracks **where** failures occur:
 
- // Context now has both:
+ ```javascript
+ // Run History Schema
  {
-   computed: { "momentum-score": { /* today's data, reassembled if sharded */ } },
-   previousComputed: { "momentum-score": { /* yesterday's data, reassembled if sharded */ } }
+   status: "FAILURE" | "SUCCESS" | "CRASH",
+   error: {
+     message: "...",
+     stage: "EXECUTION" | "PREPARE_SHARDS" | "COMMIT_BATCH" |
+            "SHARDING_LIMIT_EXCEEDED" | "QUALITY_CIRCUIT_BREAKER" |
+            "MANIFEST_LOAD" | "SYSTEM_CRASH"
+   }
  }
  ```
 
- You never need to know or care whether data is sharded. The system guarantees you receive complete, reassembled data.
+ **Stage-Specific Handling**:
+ - `QUALITY_CIRCUIT_BREAKER`: Block deployment, data integrity issue
+ - `SHARDING_LIMIT_EXCEEDED`: Firestore hard limit hit, needs redesign
+ - `SYSTEM_CRASH`: Infrastructure issue, retriable
+ - `EXECUTION`: Logic bug in calculation code
 
  ---
 
- ## Computation Types & Execution
-
- ### Standard Computations (`type: 'standard'`)
-
- **Purpose:** Per-user calculations (risk profiles, P&L analysis, behavioral scoring)
+ ## Smart Hashing & Versioning
 
- **Execution:**
- - Runs **once per user** per day
- - Receives individual user portfolio and history
- - Streams data in batches for memory efficiency
+ ### Hash Composition
 
- **Example:**
  ```javascript
- class UserRiskProfile {
-   static getMetadata() {
-     return {
-       type: 'standard',
-       rootDataDependencies: ['portfolio', 'history'],
-       userType: 'speculator'
-     };
-   }
-
-   async process(context) {
-     const { user, math } = context;
-     const portfolio = user.portfolio.today;
-     const positions = math.extract.getPositions(portfolio, user.type);
-
-     // Calculate risk per user
-     this.results[user.id] = { riskScore: /* ... */ };
-   }
- }
+ // Step 1: Intrinsic Hash (Code + System Epoch)
+ const codeString = calculation.toString();
+ const codeHash = SHA256(codeString);
+ const intrinsicHash = SHA256(codeHash + "|EPOCH:v1.0-epoch-2");
+
+ // Step 2: Layer Hashing (Dynamic Detection)
+ let compositeHash = intrinsicHash;
+ for (const [layer, exports] of MATH_LAYERS) {
+   for (const [exportName, triggerPatterns] of exports) {
+     if (codeString.includes(exportName)) {
+       compositeHash += layerHashes[layer][exportName];
+     }
    }
  }
+
+ // Step 3: Dependency Hashing (Merkle Tree)
+ const depHashes = dependencies.map(dep => dep.hash).join('|');
+ const finalHash = SHA256(compositeHash + "|DEPS:" + depHashes);
  ```
 
- **Result Structure:**
- ```javascript
- {
-   "user_123": { riskScore: 0.75 },
-   "user_456": { riskScore: 0.45 },
-   // ... millions of users
- }
- ```
+ ### Cascading Invalidation Example
+
+ ```
+ Initial State:
+   PriceVolatility    → hash: abc123
+   UserRisk (uses PV) → hash: def456 (includes abc123)
+   Signal (uses UR)   → hash: ghi789 (includes def456)
+
+ Developer Updates PriceVolatility:
+   PriceVolatility    → hash: xyz000 (NEW!)
+   UserRisk           → hash: uvw111 (NEW! Dependency changed)
+   Signal             → hash: rst222 (NEW! Cascade)
+
+ Next Dispatch:
+   All 3 calculations marked RUNNABLE (hash mismatch)
+ ```
+
+ ### System Epoch
+
+ `system_epoch.js`:
+ ```javascript
+ module.exports = "v1.0-epoch-2";
+ ```
 
- ### Meta Computations (`type: 'meta'`)
+ **Purpose**: Changing this string forces **global re-computation** of all calculations, even if code hasn't changed. Used for:
+ - Schema migrations
+ - Critical bug fixes requiring historical reprocessing
+ - Firestore structure changes
+
+ ---
 
- **Purpose:** Platform-wide calculations (aggregate metrics, market analysis, global trends)
+ ## Data Loading & Streaming
 
- **Execution:**
- - Runs **once per day** (not per user)
- - Processes all data holistically
- - Can access price history for all instruments
+ ### Streaming Architecture (Standard Computations)
 
- **Example:**
  ```javascript
- class MarketMomentum {
-   static getMetadata() {
-     return {
-       type: 'meta',
-       rootDataDependencies: ['price', 'insights']
-     };
-   }
-
-   async process(context) {
-     const { prices, insights, math } = context;
-
-     // Calculate momentum for every ticker
-     for (const [instId, data] of Object.entries(prices.history)) {
-       const ticker = data.ticker;
-       const priceData = math.priceExtractor.getHistory(prices, ticker);
-
-       this.results[ticker] = { momentum: /* ... */ };
-     }
-   }
- }
- ```
-
- **Result Structure:**
- ```javascript
- {
-   "AAPL": { momentum: 0.65 },
-   "GOOGL": { momentum: 0.82 },
-   // ... all tickers
- }
- ```
+ // Problem: 10M users × 5KB portfolio = 50GB
+ // Solution: Stream in chunks
+
+ async function* streamPortfolioData(dateStr, refs) {
+   const BATCH_SIZE = 50; // 50 users at a time
+
+   for (let i = 0; i < refs.length; i += BATCH_SIZE) {
+     const batchRefs = refs.slice(i, i + BATCH_SIZE);
+     const userData = await loadDataByRefs(batchRefs);
+
+     yield userData; // { user1: {...}, user2: {...}, ... }
+
+     // Memory cleared after each iteration
+   }
+ }
+
+ // Usage in Executor
+ for await (const userBatch of streamPortfolioData(date, refs)) {
+   for (const [userId, portfolio] of Object.entries(userBatch)) {
+     const context = buildContext({ userId, portfolio, ... });
+     await calculation.process(context);
+   }
+ }
+ ```
 
- ### Price-Dependent Meta Computations
-
- When a meta computation declares `rootDataDependencies: ['price']`, it enters **batched shard processing mode**:
+ ### Price Data Batching (Meta Computations)
 
  ```javascript
- // Instead of loading ALL price data at once (would crash):
+ // Problem: 10,000 tickers × 2 years history = 1GB
+ // Solution: Process shards sequentially
+
  for (const shardRef of priceShardRefs) {
-   const shardData = await loadPriceShard(shardRef); // ~50-100 instruments per shard
+   const shardData = await loadPriceShard(shardRef); // ~100 tickers
 
    const context = buildMetaContext({
-     prices: { history: shardData } // Only this shard's data
+     prices: { history: shardData }
    });
 
    await calculation.process(context);
 
    // Results accumulate across shards
-   // Memory is cleared between iterations
+   // Memory cleared between shards
  }
  ```
 
- **Your calculation receives partial data** and processes it incrementally. The system ensures all shards are eventually processed.
-
- ---
-
- ## Dependency Management
+ ### Smart Shard Indexing (Optimization)
 
- ### Declaring Dependencies
+ For targeted price lookups (e.g., "only calculate momentum for AAPL, GOOGL, MSFT"):
 
  ```javascript
- static getDependencies() {
-   return ['risk-metrics', 'sentiment-score', 'momentum-analysis'];
- }
+ // Without Indexing: Load ALL shards, filter after
+ // Cost: 100+ Firestore reads
+
+ // With Indexing: Pre-map which shard contains each instrument
+ const index = {
+   "123": "shard_0", // AAPL
+   "456": "shard_2", // GOOGL
+   "789": "shard_0"  // MSFT
+ };
+
+ const relevantShards = ["shard_0", "shard_2"];
+ // Cost: 2 Firestore reads
  ```
 
- This tells the system: "Before you run me, make sure these 3 calculations have completed."
+ **Index Building**: Runs once, cached in `/system_metadata/price_shard_index`.
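+
+ Building that index is a one-time scan over the shards; a sketch under assumed document shapes (the shard collection path is an assumption):
+
+ ```javascript
+ // Hypothetical sketch: map instrumentId → shard name, then cache it.
+ async function buildPriceShardIndex(db) {
+   const index = {};
+   const shards = await db.collection('asset_prices').get(); // assumed path
+   shards.forEach(shardDoc => {
+     for (const instrumentId of Object.keys(shardDoc.data())) {
+       index[instrumentId] = shardDoc.id; // e.g. "123" → "shard_0"
+     }
+   });
+   await db.doc('system_metadata/price_shard_index').set(index);
+   return index;
+ }
+ ```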
 
- ### How Dependencies Are Loaded
+ ---
 
- When your calculation runs:
+ ## Auto-Sharding System
 
- 1. System fetches results from all declared dependencies
- 2. Checks if data is sharded → reassembles if needed
- 3. Injects into `context.computed`:
+ ### The 1MB Problem
 
- ```javascript
- {
-   computed: {
-     "risk-metrics": { "AAPL": { volatility: 0.25 }, ... },
-     "sentiment-score": { "AAPL": { score: 0.8 }, ... },
-     "momentum-analysis": { "AAPL": { momentum: 0.65 }, ... }
-   }
- }
- ```
+ Firestore's hard limit: **1MB per document**. A calculation producing results for 5,000 tickers easily exceeds this.
 
- ### Accessing Dependency Results
+ ### Transparent Sharding Solution
 
  ```javascript
- async process(context) {
-   const { computed, math } = context;
+ // ResultCommitter.js - prepareAutoShardedWrites()
+
+ const MAX_DOC_BYTES = 900 * 1024; // safety threshold under the 1MB limit
+ const totalSize = calculateFirestoreBytes(result);
+
+ if (totalSize < MAX_DOC_BYTES) {
+   // Write normally:
+   //   /results/{date}/{category}/{calc}
+   //   → { "AAPL": {...}, "GOOGL": {...}, _completed: true }
 
-   // Access results from dependencies
-   const volatility = math.signals.getMetric(
-     computed,
-     'risk-metrics',
-     'AAPL',
-     'volatility'
-   );
+ } else {
+   // Auto-shard.
+   // Step 1: Split into chunks < 900KB each
+   const chunks = splitIntoChunks(result);
 
-   const sentiment = math.signals.getMetric(
-     computed,
-     'sentiment-score',
-     'AAPL',
-     'score'
-   );
+   // Step 2: Write shards:
+   //   /results/{date}/{category}/{calc}/_shards/shard_0 → chunk 1
+   //   /results/{date}/{category}/{calc}/_shards/shard_1 → chunk 2
+   //   /results/{date}/{category}/{calc}/_shards/shard_N → chunk N
 
-   // Use them in your calculation
-   const combinedScore = volatility * sentiment;
+   // Step 3: Write pointer:
+   //   /results/{date}/{category}/{calc}
+   //   → { _sharded: true, _shardCount: N, _completed: true }
  }
  ```
 
- ### Historical Dependencies
-
- If your calculation needs **yesterday's dependency results**:
+ ### Transparent Reassembly
 
  ```javascript
- static getMetadata() {
-   return {
-     isHistorical: true, // ← Enable historical mode
-     // ...
-   };
- }
+ // DependencyFetcher.js - fetchExistingResults()
 
- async process(context) {
-   const { computed, previousComputed } = context;
-
-   // Today's risk
-   const todayRisk = computed['risk-metrics']['AAPL'].volatility;
+ const doc = await docRef.get();
+ const data = doc.data();
+
+ if (data._sharded === true) {
+   // 1. Fetch all shards
+   const shardsCol = docRef.collection('_shards');
+   const snapshot = await shardsCol.get();
 
-   // Yesterday's risk
-   const yesterdayRisk = previousComputed['risk-metrics']['AAPL'].volatility;
+   // 2. Merge back into single object
+   const assembled = {};
+   snapshot.forEach(shard => {
+     Object.assign(assembled, shard.data());
+   });
 
-   // Calculate change
-   const riskChange = todayRisk - yesterdayRisk;
+   // 3. Return as if never sharded
+   return assembled;
  }
+
+ // Normal path: return as-is
+ return data;
  ```
 
- ---
+ **Developer Experience**: You **never** know or care whether data is sharded. Read and write as if documents have no size limit.
+
+ ### Sharding Limits
 
- ## Versioning & Smart Hashing
+ **Maximum Calculation Size**: ~450MB (500 shards × 900KB)
+
+ If a calculation exceeds this, the system throws:
+ ```
+ error: {
+   stage: "SHARDING_LIMIT_EXCEEDED",
+   message: "Firestore subcollection limit reached"
+ }
+ ```
 
- ### The Problem: When to Recompute?
+ **Solution**: Refactor the calculation to produce less data, or split it into multiple calculations.
 
- If you fix a bug in a calculation, how does the system know to re-run it for all past dates?
+ ---
 
- ### The Solution: Merkle Tree Dependency Hashing
+ ## Quality Assurance
 
- Every calculation gets a **smart hash** that includes:
+ ### HeuristicValidator (Grey Box Testing)
 
- 1. **Its own code** (SHA-256 of the class definition)
- 2. **Layer dependencies** (Hashes of math layers it uses)
- 3. **Calculation dependencies** (Hashes of calculations it depends on)
+ Runs statistical analysis on results **before storage**:
 
  ```javascript
- // Example hash composition:
- const intrinsicHash = hash(calculation.toString() + layerHashes);
- const dependencyHashes = dependencies.map(dep => dep.hash).join('|');
- const finalHash = hash(intrinsicHash + '|DEPS:' + dependencyHashes);
+ // ResultsValidator.js
 
- // Result: "a3f9c2e1..." (SHA-256)
- ```
+ 1. NaN Detection
+    - Scans sample of results for NaN/Infinity
+    - Threshold: 0% (strict, NaN is always a bug)
 
- ### Cascading Invalidation
+ 2. Flatline Detection
+    - Checks if >95% of values are identical
+    - Catches stuck loops or broken RNG
 
- If **Calculation A** changes, **Calculation B** (which depends on A) automatically gets a new hash:
+ 3. Null/Empty Analysis
+    - Threshold: 90% of results are null/0
+    - Indicates data pipeline failure
 
+ 4. Dead Object Detection
+    - Finds objects where all properties are null/0
+    - Example: { profile: [], score: 0, signal: null }
+
+ 5. Vector Emptiness (Distribution Calcs)
+    - Checks if histogram/profile arrays are empty
+    - Threshold: 90% empty → FAIL
  ```
- Risk Metrics (v1) → hash: abc123
-
- Sentiment Score   → hash: def456 (includes abc123)
- (depends on Risk)
- ```
 
- If you update Risk Metrics:
+ **Circuit Breaker**: If validation fails, the calculation **does not store results** and is marked as `FAILURE` with stage: `QUALITY_CIRCUIT_BREAKER`.
+
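+ For intuition, the first two checks can be sketched like this (thresholds follow the list above; the real validator is more elaborate):
+
+ ```javascript
+ // Simplified sketch of two HeuristicValidator checks.
+ function validate(results) {
+   const values = Object.values(results)
+     .flatMap(r => Object.values(r))
+     .filter(v => typeof v === 'number');
+
+   // 1. NaN detection: strict, any NaN/Infinity fails.
+   if (values.some(v => !Number.isFinite(v))) {
+     return { ok: false, reason: 'NAN_DETECTED' };
+   }
+
+   // 2. Flatline detection: >95% identical values fails.
+   const counts = new Map();
+   for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
+   const maxCount = Math.max(0, ...counts.values());
+   if (values.length > 0 && maxCount / values.length > 0.95) {
+     return { ok: false, reason: 'FLATLINE' };
+   }
+   return { ok: true };
+ }
+ ```
+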
+ ### Validation Overrides
+
+ For legitimately sparse datasets:
 
+ ```javascript
+ // validation_overrides.js
+ module.exports = {
+   "bankruptcy-detector": {
+     maxZeroPct: 100 // Rare event, 100% zeros is expected
+   },
+   "earnings-surprise": {
+     maxNullPct: 99 // Only runs on earnings days
+   }
+ };
  ```
- Risk Metrics (v2) → hash: xyz789 (NEW!)
-
- Sentiment Score   → hash: ghi012 (NEW! Because dependency changed)
+
+ ### Build Reporter (Pre-Deployment Analysis)
+
+ ```bash
+ npm run build-reporter
  ```
 
- ### Recomputation Logic
+ Generates a **simulation report** without running calculations:
 
- For each date, the system checks:
+ ```
+ Build Report: v1.2.5_2024-12-07
+ ================================
 
- ```javascript
- // Stored in Firestore:
- computationStatus['2024-12-07'] = {
-   'risk-metrics': 'abc123', // Last run hash
-   'sentiment-score': 'def456'
- };
+ Summary:
+ - 1,245 Re-Runs (hash mismatch)
+ - 23 New Calculations
+ - 0 Impossible
+ - 45 Blocked (waiting for data)
 
- // Current manifest:
- manifest['risk-metrics'].hash = 'xyz789'; // NEW HASH!
- manifest['sentiment-score'].hash = 'ghi012';
+ Detailed Breakdown:
+
+ 2024-12-01:
+   Will Re-Run:
+   - user-risk-profile (Hash: abc123 → xyz789)
+   - sentiment-score (Hash: def456 → uvw012)
+
+   Blocked:
+   - social-sentiment (Missing Root Data: social)
 
- // Decision:
- // - Risk Metrics: Hash mismatch → RERUN
- // - Sentiment Score: Hash mismatch → RERUN (cascaded)
+ 2024-12-02:
+   Will Run:
+   - new-momentum-signal (New calculation)
  ```
 
- This ensures **incremental recomputation**: only changed calculations (and their dependents) re-run.
+ **Use Case**: Review before deploying to production. If 10,000 re-runs are detected, investigate whether the code change was intentional.
 
  ---
 
- ## Execution Modes
+ ## Operational Modes
 
- ### Mode 1: Legacy (Orchestrator)
-
- **Single-process execution** for all dates and calculations.
+ ### Mode 1: Local Orchestrator (Development)
 
  ```bash
+ # Run all calculations for Pass 1 sequentially
  COMPUTATION_PASS_TO_RUN=1 npm run computation-orchestrator
  ```
 
+ **Behavior**:
+ - Single-process execution
  - Loads manifest
  - Iterates through all dates
- - Runs all calculations in Pass 1 sequentially
- - Good for: Development, debugging
+ - Runs calculations in order
+ - Good for: Debugging, local testing
 
  ### Mode 2: Dispatcher + Workers (Production)
 
- **Distributed execution** using Pub/Sub.
-
- #### Step 1: Dispatch Tasks
  ```bash
+ # Step 1: Dispatch tasks to Pub/Sub
  COMPUTATION_PASS_TO_RUN=1 npm run computation-dispatcher
- ```
 
- Publishes messages to Pub/Sub:
- ```json
- {
-   "action": "RUN_COMPUTATION_DATE",
-   "date": "2024-12-07",
-   "pass": "1"
- }
+ # Step 2: Cloud Function workers consume tasks
+ # (Auto-scaled by GCP, 0 to 1000+ workers)
  ```
 
- #### Step 2: Workers Consume Tasks
+ **Behavior**:
+ - Dispatcher analyzes all dates
+ - Publishes ~10,000 messages to Pub/Sub
+ - Workers process in parallel
+ - Each worker handles 1 date
+ - Auto-retries on failure (Pub/Sub built-in)
+
+ **Scaling**: 1,000 dates × 3 calcs = 3,000 tasks. With 100 workers, this completes in ~5 minutes.
+
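+ Dispatch itself is a plain Pub/Sub publish per task; a minimal sketch using `@google-cloud/pubsub` (the topic name is an assumption, the message shape follows the worker section above):
+
+ ```javascript
+ const { PubSub } = require('@google-cloud/pubsub');
+
+ async function dispatchTasks(tasks) {
+   const topic = new PubSub().topic('computation-tasks'); // assumed topic name
+   for (const task of tasks) {
+     // task: { date: '2024-12-07', pass: 1, computation: 'user-risk-profile' }
+     await topic.publishMessage({ json: task });
+   }
+ }
+ ```
+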
+ ### Mode 3: Batch Price Executor (Optimization)
+
  ```bash
- # Cloud Function triggered by Pub/Sub
- # Or: Local consumer for testing
- npm run computation-worker
+ # For price-dependent calcs, bulk-process historical data
+ npm run batch-price-executor --dates=2024-12-01,2024-12-02 --calcs=momentum-signal
  ```
 
- Each worker:
- 1. Receives a date + pass
- 2. Loads manifest
- 3. Runs calculations for that date only
- 4. Updates status document
+ **Behavior**:
+ - Loads price shards once
+ - Processes multiple dates in a single pass
+ - Bypasses Pub/Sub overhead
+ - **10x faster** for historical backfills
 
- **Benefits:**
- - Parallel execution (100+ workers)
- - Fault tolerance (failed dates retry automatically)
- - Scales to millions of dates
+ **Use Case**: After deploying a new price-dependent calculation, backfill 2 years of history in 1 hour instead of 10.
 
- ### Pass System
+ ---
 
- Calculations are grouped into **passes** based on dependencies:
+ ## Advanced Topics
 
- ```
- Pass 1: Base calculations (no dependencies)
-   - risk-metrics
-   - price-momentum
+ ### Historical Continuity Enforcement
 
- Pass 2: Depends on Pass 1
-   - sentiment-score (needs risk-metrics)
-   - trend-analysis (needs price-momentum)
+ For calculations that depend on their own previous results:
 
- Pass 3: Depends on Pass 2
-   - combined-signal (needs sentiment-score + trend-analysis)
+ ```javascript
+ // Example: cumulative-pnl needs yesterday's cumulative-pnl
+
+ static getMetadata() {
+   return { isHistorical: true };
+ }
+
+ // Dispatcher Logic:
+ if (calculation.isHistorical) {
+   const yesterday = getPreviousDay(date); // helper: date minus 1 calendar day
+   const yesterdayStatus = await fetchComputationStatus(yesterday);
+
+   if (!yesterdayStatus[calcName] ||
+       yesterdayStatus[calcName].hash !== currentHash) {
+     // Yesterday is missing or has wrong hash
+     report.blocked.push({
+       reason: "Waiting for historical continuity"
+     });
+   }
+ }
  ```
 
- **You run passes sequentially:**
- ```bash
- COMPUTATION_PASS_TO_RUN=1 npm run computation-dispatcher # Wait for completion
- COMPUTATION_PASS_TO_RUN=2 npm run computation-dispatcher # Wait for completion
- COMPUTATION_PASS_TO_RUN=3 npm run computation-dispatcher
- ```
+ **Result**: Historical calculations run in **strict chronological order**, never skipping days.
+
+ ### Category Migration System
+
+ If a calculation's category changes:
+
+ ```javascript
+ // Before: category: 'signals'
+ // After:  category: 'risk-management'
+
+ // System detects change:
+ manifest.previousCategory = 'signals';
+
+ // Worker executes:
+ // 1. Runs calculation normally
+ // 2. Stores in new category: /results/{date}/risk-management/{calc}
+ // 3. Deletes old category:   /results/{date}/signals/{calc}
  ```
 
- The manifest builder automatically assigns pass numbers via topological sort.
+ **Automation**: Zero manual data migration needed.
+
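+ The cleanup step must also remove any `_shards` subcollection under the old document; a hypothetical sketch, with result paths taken from this guide:
+
+ ```javascript
+ // Sketch of old-category cleanup after a successful run.
+ async function deleteOldCategory(db, date, oldCategory, calcName) {
+   const oldDoc = db.doc(`results/${date}/${oldCategory}/${calcName}`);
+   const shards = await oldDoc.collection('_shards').get();
+   const batch = db.batch();
+   shards.forEach(shard => batch.delete(shard.ref)); // drop orphaned shards
+   batch.delete(oldDoc);                             // drop the pointer doc
+   await batch.commit();
+ }
+ ```
+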
+ ### Audit Ledger vs Run History
+
+ **Audit Ledger** (`computation_audit_ledger/{date}/passes/{pass}/tasks/{calc}`):
+ - Created **before** dispatch
+ - Status: PENDING → COMPLETED
+ - Purpose: Track which tasks were dispatched
+
+ **Run History** (`computation_run_history/{date}/runs/{runId}`):
+ - Created **after** execution attempt
+ - Status: SUCCESS | FAILURE | CRASH
+ - Purpose: Debug failures, track performance
+
+ **Why Both?**: The Audit Ledger answers "What should run?"; Run History answers "What actually happened?".
 
  ---
 
@@ -657,60 +848,78 @@ The manifest builder automatically assigns pass numbers via topological sort.
  ### For a Standard Calculation
 
  ```
- 1. Manifest Builder
-    ├─ Scans your calculation class
-    ├─ Generates smart hash (code + layers + dependencies)
-    ├─ Assigns to a pass based on dependency graph
-    └─ Validates all dependencies exist
+ 1. Root Data Indexer (Daily)
+    └─ Scans all data sources
+       └─ Creates availability manifest
 
- 2. Dispatcher/Orchestrator
+ 2. Dispatcher (Per-Pass)
     ├─ Loads manifest
-    ├─ Iterates through all dates
-    └─ For each date:
-       ├─ Checks if calculation needs to run (hash mismatch?)
-       ├─ Checks if root data exists (portfolio, history, etc.)
-       └─ Dispatches task (or runs directly)
-
- 3. Worker/Executor
-    ├─ Receives task for specific date
-    ├─ Loads dependency results (auto-reassembles if sharded)
+    ├─ For each date:
+    │   ├─ Checks root data availability
+    │   ├─ Checks dependency status
+    │   ├─ Checks historical continuity
+    │   └─ Decides: RUNNABLE | BLOCKED | IMPOSSIBLE
+    ├─ Creates Audit Ledger (PENDING)
+    └─ Publishes RUNNABLE tasks to Pub/Sub
+
+ 3. Worker (Per-Task)
+    ├─ Receives {date, pass, computation}
+    ├─ Loads manifest (cached)
+    ├─ Fetches dependencies (auto-reassembles shards)
     ├─ Streams portfolio data in batches
-    └─ For each user batch:
-       ├─ Builds per-user context
-       ├─ Injects math layers, mappings, computed dependencies
-       ├─ Calls your calculation.process(context)
-       └─ Accumulates results
-
- 4. Result Committer
-    ├─ Calculates total result size
-    ├─ IF size > 900KB:
-    │   ├─ Splits into chunks
-    │   ├─ Writes to _shards subcollection
-    │   └─ Writes pointer document
-    └─ ELSE:
-       └─ Writes single document
-
- 5. Status Updater
-    └─ Updates computation_status/{date} with new hash
+    ├─ For each user:
+    │   ├─ Builds context (dependency injection)
+    │   └─ Calls calculation.process(context)
+    ├─ Validates results (HeuristicValidator)
+    ├─ Auto-shards if > 900KB
+    ├─ Commits to Firestore
+    ├─ Updates status hash
+    ├─ Updates Audit Ledger → COMPLETED
+    └─ Records Run History → SUCCESS
+
+ 4. Next Pass
+    └─ Depends on results from this pass
  ```
 
  ### For a Meta Calculation
 
- Same as above, except:
-
- - **Step 3**: Loads all data once (or iterates through price shards)
- - **Context**: Global data, not per-user
- - **Result**: One document per date (e.g., all tickers' momentum scores)
+ Same flow, except:
+ - **Step 3**: Loads global data instead of streaming users
+ - **Context**: No user object; prices/insights instead
+ - **Result**: One document with all tickers' data
 
  ---
 
  ## Key Takeaways
 
- 1. **Context is Auto-Built**: Declare what you need in metadata; the system handles the rest
- 2. **Sharding is Transparent**: Read and write as if documents have no size limit
- 3. **Dependencies Just Work**: Results are automatically fetched and reassembled
- 4. **Versioning is Smart**: Change code → system knows what to rerun
- 5. **Streaming is Automatic**: Standard computations stream data; you don't manage batches
- 6. **Execution is Flexible**: Run locally for dev, distributed for production
+ 1. **Data Availability Gates Everything**: Computations never run when source data is missing
+ 2. **Smart Hashing Enables Incremental Updates**: Only changed calculations re-run
+ 3. **Sharding is Invisible**: Read/write as if documents have no size limit
+ 4. **Streaming Handles Scale**: Process millions of users without OOM
+ 5. **Quality Checks Prevent Bad Data**: Results validated before storage
+ 6. **Historical Continuity is Enforced**: Time-series calculations run in order
+ 7. **Distributed Execution Scales Infinitely**: 1 worker or 1,000 workers, same code
 
  ---
+
+ ## Operational Checklist
+
+ **Daily (Automated)**:
+ - ✅ Root Data Indexer runs at 2 AM UTC
+ - ✅ Computation Dispatchers run for each pass (3 AM, 4 AM, 5 AM)
+ - ✅ Workers auto-scale based on Pub/Sub queue depth
+
+ **After Code Changes**:
+ 1. Run Build Reporter to preview impact
+ 2. Review re-run count (expected vs actual)
+ 3. Deploy to staging, run a single date
+ 4. Validate results in Firestore
+ 5. Deploy to production
+ 6. Monitor Run History for failures
+
+ **Debugging a Failure**:
+ 1. Check Run History for the error stage
+ 2. If `QUALITY_CIRCUIT_BREAKER`: Data integrity issue, review validator logs
+ 3. If `EXECUTION`: Logic bug, reproduce locally with Orchestrator mode
+ 4. If `SYSTEM_CRASH`: Infrastructure issue, check Cloud Function logs
+ 5. Fix bug, redeploy, re-trigger the specific pass