bulltrackers-module 1.0.215 → 1.0.217
# BullTrackers Computation System - Comprehensive Onboarding Guide

## Table of Contents

1. [System Overview](#system-overview)
2. [Context Architecture](#context-architecture)
3. [Data Loading & Routing](#data-loading--routing)
4. [Sharding System](#sharding-system)
5. [Computation Types & Execution](#computation-types--execution)
6. [Dependency Management](#dependency-management)
7. [Versioning & Smart Hashing](#versioning--smart-hashing)
8. [Execution Modes](#execution-modes)

---

## System Overview

The BullTrackers Computation System is a **dependency-aware, auto-sharding, distributed calculation engine** designed to process massive datasets across user portfolios, trading histories, market insights, and price data. The system automatically handles:

- **Smart data loading** (only loads what's needed)
- **Transparent sharding** (handles Firestore's 1MB document limit)
- **Dependency injection** (calculations receive exactly what they declare)
- **Historical state management** (access to yesterday's data when needed)
- **Incremental recomputation** (only reruns when code or dependencies change)

---

## Context Architecture

### The Context Object

Every computation receives a **context object** that contains all the data and tools it needs. The context is built dynamically from the computation's declared dependencies.

### Context Structure by Computation Type

#### **Standard (Per-User) Context**

```javascript
{
  user: {
    id: "user_123",
    type: "speculator", // or "normal"
    portfolio: {
      today: { /* Portfolio snapshot for today */ },
      yesterday: { /* Portfolio snapshot for yesterday (if isHistorical: true) */ }
    },
    history: {
      today: { /* Trading history for today */ },
      yesterday: { /* Trading history for yesterday (if needed) */ }
    }
  },
  date: {
    today: "2024-12-07"
  },
  insights: {
    today: { /* Daily instrument insights */ },
    yesterday: { /* Yesterday's insights (if needed) */ }
  },
  social: {
    today: { /* Social post insights */ },
    yesterday: { /* Yesterday's social data (if needed) */ }
  },
  mappings: {
    tickerToInstrument: { "AAPL": 123 /* ... */ },
    instrumentToTicker: { 123: "AAPL" /* ... */ }
  },
  math: {
    // All mathematical layers (extractors, primitives, signals, etc.)
    extract: DataExtractor,
    compute: MathPrimitives,
    signals: SignalPrimitives
    // ... and more
  },
  computed: {
    // Results from dependency calculations (current day)
    "risk-metrics": { "AAPL": { volatility: 0.25 } /* ... */ },
    "sentiment-score": { "AAPL": { score: 0.8 } /* ... */ }
  },
  previousComputed: {
    // Results from dependency calculations (previous day, if isHistorical: true)
    "risk-metrics": { "AAPL": { volatility: 0.23 } /* ... */ }
  },
  meta: { /* Calculation metadata */ },
  config: { /* System configuration */ },
  deps: { /* System dependencies (db, logger, etc.) */ }
}
```

#### **Meta (Once-Per-Day) Context**

```javascript
{
  date: {
    today: "2024-12-07"
  },
  insights: {
    today: { /* Daily instrument insights */ },
    yesterday: { /* If needed */ }
  },
  social: {
    today: { /* Social post insights */ },
    yesterday: { /* If needed */ }
  },
  prices: {
    history: {
      // Price data for all instruments (or batched shards)
      "123": {
        ticker: "AAPL",
        prices: {
          "2024-12-01": 150.25,
          "2024-12-02": 151.30
          // ...
        }
      }
    }
  },
  mappings: { /* Same as Standard */ },
  math: { /* Same as Standard */ },
  computed: { /* Same as Standard */ },
  previousComputed: { /* Same as Standard */ },
  meta: { /* Calculation metadata */ },
  config: { /* System configuration */ },
  deps: { /* System dependencies */ }
}
```

### How Context is Auto-Populated

The system uses a **declaration-based approach**. When you define a calculation, you declare the data you need:

```javascript
class MyCalculation {
  static getMetadata() {
    return {
      type: 'standard',                                // or 'meta'
      isHistorical: true,                              // Do I need yesterday's data?
      rootDataDependencies: ['portfolio', 'insights'], // What root data do I need?
      userType: 'all'                                  // 'all', 'speculator', or 'normal'
    };
  }

  static getDependencies() {
    return ['risk-metrics', 'sentiment-score'];        // What other calculations do I depend on?
  }
}
```

The `ContextBuilder` then:

1. **Checks `rootDataDependencies`** → Loads portfolio, insights, social, history, or price data
2. **Checks `isHistorical`** → If true, loads yesterday's portfolio and previous computation results
3. **Checks `getDependencies()`** → Fetches results from other calculations
4. **Injects math layers** → Automatically includes all extractors, primitives, and utilities
5. **Adds mappings** → Provides ticker ↔ instrument ID conversion

**You only get what you ask for.** This keeps memory usage low and prevents unnecessary data loading.

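The assembly steps above can be condensed into a small function. This is a minimal sketch under assumed names (`buildContext` and the `sources` shape are hypothetical); the real `ContextBuilder` is more involved:

```javascript
// Sketch of declaration-driven context assembly (hypothetical helper
// names; the real ContextBuilder differs).
function buildContext(CalcClass, sources) {
  const meta = CalcClass.getMetadata();
  const ctx = { date: { today: sources.today }, computed: {} };

  // Only attach the root data the calculation declared.
  for (const root of meta.rootDataDependencies || []) {
    ctx[root] = { today: sources[root]?.today };
    if (meta.isHistorical) ctx[root].yesterday = sources[root]?.yesterday;
  }

  // Inject results from declared upstream calculations.
  for (const dep of CalcClass.getDependencies?.() || []) {
    ctx.computed[dep] = sources.results?.[dep];
  }
  if (meta.isHistorical) ctx.previousComputed = sources.previousResults || {};
  return ctx;
}
```

A calculation that declares only `insights` would get `ctx.insights` and its `computed` dependencies, and nothing else — which is the point of the declaration-based approach.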
---

## Data Loading & Routing

### The Data Loading Pipeline

```
┌────────────────────────────────────────────────────────────────────┐
│ DataLoader (Cached)                                                │
├────────────────────────────────────────────────────────────────────┤
│ • loadMappings()                        → Ticker/Instrument maps   │
│ • loadInsights(date)                    → Daily instrument insights│
│ • loadSocial(date)                      → Social post insights     │
│ • loadPriceShard(ref)                   → Asset price data         │
│ • getPriceShardRefs()                   → All price shards         │
│ • getSpecificPriceShardReferences(ids)  → Targeted shards          │
└────────────────────────────────────────────────────────────────────┘
                                  ↓
┌────────────────────────────────────────────────────────────────────┐
│ ComputationExecutor                                                │
├────────────────────────────────────────────────────────────────────┤
│ • executePerUser()                      → Streams portfolio data   │
│ • executeOncePerDay()                   → Loads global/meta data   │
└────────────────────────────────────────────────────────────────────┘
                                  ↓
┌────────────────────────────────────────────────────────────────────┐
│ ContextBuilder                                                     │
├────────────────────────────────────────────────────────────────────┤
│ Assembles context based on metadata & dependencies                 │
└────────────────────────────────────────────────────────────────────┘
                                  ↓
                 Your Calculation.process(context)
```

### Streaming vs Batch Loading

#### **Standard Computations: Streaming**

Standard (per-user) computations use **streaming** to process users in chunks:

```javascript
// The system streams portfolio data in batches of 50 users
for await (const userBatch of streamPortfolioData()) {
  // Each batch is processed in parallel
  for (const [userId, portfolio] of Object.entries(userBatch)) {
    const context = buildPerUserContext({ userId, portfolio /* ... */ });
    await calculation.process(context);
  }
}
```

**Why streaming?**

- Portfolio data is sharded across multiple documents
- Loading all users at once would exceed memory limits
- Streaming allows processing millions of users efficiently
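The streaming loop above relies on an async generator. A minimal sketch of that pattern, assuming shard documents keyed by user ID (all names here are hypothetical, not the real API):

```javascript
// Hypothetical sketch: an async generator that yields fixed-size user
// batches, so callers never hold every portfolio in memory at once.
async function* streamUserBatches(fetchShard, shardIds, batchSize = 50) {
  let batch = {};
  for (const shardId of shardIds) {
    const users = await fetchShard(shardId); // { userId: portfolio, ... }
    for (const [userId, portfolio] of Object.entries(users)) {
      batch[userId] = portfolio;
      if (Object.keys(batch).length === batchSize) {
        yield batch;  // hand a full batch to the caller
        batch = {};   // drop references so memory can be reclaimed
      }
    }
  }
  if (Object.keys(batch).length > 0) yield batch; // trailing partial batch
}
```

The generator pulls one shard at a time, which is what keeps peak memory proportional to the batch size rather than the user count.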

#### **Meta Computations: Batch or Shard**

Meta computations have two modes:

1. **Standard Meta** (no price dependency):
   ```javascript
   const context = buildMetaContext({ insights, social /* ... */ });
   await calculation.process(context);
   ```

2. **Price-Dependent Meta** (batched shard processing):
   ```javascript
   // The system loads price data in shard batches
   for (const shardRef of priceShardRefs) {
     const shardData = await loadPriceShard(shardRef);
     const context = buildMetaContext({ prices: { history: shardData } });
     await calculation.process(context);
     // Memory is cleared between shards
   }
   ```

---

## Sharding System

### The Problem: Firestore's 1MB Limit

Firestore enforces a **1MB hard limit** per document. When computation results contain thousands of tickers (e.g., momentum scores for every asset), the document exceeds this limit.

### The Solution: Auto-Sharding

The system **automatically detects** when a result is too large and splits it into a subcollection.

### How Auto-Sharding Works

```javascript
// When saving results:
const result = {
  "AAPL": { score: 0.8, volatility: 0.25 },
  "GOOGL": { score: 0.7, volatility: 0.22 },
  // ... 5,000+ tickers
};

// The system calculates the serialized size:
const totalSize = calculateFirestoreBytes(result); // e.g. ~1.2 MB

// IF size > 900KB (safety threshold):
//   1. Splits data into chunks < 900KB each
//   2. Writes chunks to: /results/{date}/{category}/{calc}/_shards/shard_0
//                        /results/{date}/{category}/{calc}/_shards/shard_1
//   3. Writes pointer:   /results/{date}/{category}/{calc}
//      → { _sharded: true, _shardCount: 2, _completed: true }

// IF size < 900KB:
//   Writes normally: /results/{date}/{category}/{calc}
//   → { "AAPL": {...}, "GOOGL": {...}, _completed: true, _sharded: false }
```
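The chunk-splitting step can be illustrated with a greedy packer. This is a sketch under assumed names (`splitIntoChunks` is not the actual committer API, and the real size accounting uses Firestore's byte rules rather than plain JSON length):

```javascript
// Illustrative sketch: greedily pack entries until adding one more
// would cross the byte budget. An entry larger than the budget ends up
// alone in its own chunk (it cannot be split further at this level).
function splitIntoChunks(result, maxBytes = 900 * 1024) {
  const sizeOf = (obj) => Buffer.byteLength(JSON.stringify(obj), 'utf8');
  const chunks = [];
  let current = {};
  for (const [key, value] of Object.entries(result)) {
    const candidate = { ...current, [key]: value };
    if (sizeOf(candidate) > maxBytes && Object.keys(current).length > 0) {
      chunks.push(current); // close the full chunk
      current = { [key]: value };
    } else {
      current = candidate;
    }
  }
  if (Object.keys(current).length > 0) chunks.push(current);
  return chunks;
}
```

Merging the chunks back together (as the reader does) reproduces the original object exactly, which is why sharding can stay invisible to calculations.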

### Reading Sharded Data

The system **transparently reassembles** sharded data when loading dependencies:

```javascript
// When loading a dependency:
const result = await fetchExistingResults(dateStr, ['momentum-score']);

// Internally, the system checks: is this document sharded?
if (doc.data()._sharded === true) {
  // 1. Fetch all docs from the _shards subcollection
  // 2. Merge them back into a single object
  // 3. Return the result as if it had never been sharded
}

// Your calculation receives complete data, regardless of storage method
```
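The reassembly path can be sketched as follows (the pointer-document shape follows the example above; the shard loader and its signature are hypothetical, and the real Firestore reader differs in detail):

```javascript
// Sketch: given a pointer document and a function that loads its shard
// documents, merge the shards back into one object.
async function reassemble(pointerDoc, loadShards) {
  if (!pointerDoc._sharded) return pointerDoc;             // plain document
  const shards = await loadShards(pointerDoc._shardCount); // [{...}, {...}]
  const merged = Object.assign({}, ...shards);
  merged._sharded = false; // callers always see a "plain" document
  return merged;
}
```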

### Handling Mixed Storage Scenarios

**Question:** What if I need data from two days, where Day 1 is sharded and Day 2 is not?

**Answer:** The system handles this automatically:

```javascript
// Your calculation declares:
static getMetadata() {
  return {
    isHistorical: true, // I need yesterday's data
    // ...
  };
}

// The system loads BOTH days:
const computed = await fetchExistingResults(todayDate, ['momentum-score']);
// → Auto-detects sharding, reassembles if needed

const previousComputed = await fetchExistingResults(yesterdayDate, ['momentum-score']);
// → Auto-detects sharding, reassembles if needed

// The context now has both:
{
  computed: { "momentum-score": { /* today's data, reassembled if sharded */ } },
  previousComputed: { "momentum-score": { /* yesterday's data, reassembled if sharded */ } }
}
```

You never need to know or care whether data is sharded. The system guarantees that you receive complete, reassembled data.

---

## Computation Types & Execution

### Standard Computations (`type: 'standard'`)

**Purpose:** Per-user calculations (risk profiles, P&L analysis, behavioral scoring)

**Execution:**

- Runs **once per user** per day
- Receives the individual user's portfolio and history
- Streams data in batches for memory efficiency

**Example:**
```javascript
class UserRiskProfile {
  static getMetadata() {
    return {
      type: 'standard',
      rootDataDependencies: ['portfolio', 'history'],
      userType: 'speculator'
    };
  }

  async process(context) {
    const { user, math } = context;
    const portfolio = user.portfolio.today;
    const positions = math.extract.getPositions(portfolio, user.type);

    // Calculate risk for this user
    this.results[user.id] = { riskScore: /* ... */ };
  }
}
```

**Result Structure:**
```javascript
{
  "user_123": { riskScore: 0.75 },
  "user_456": { riskScore: 0.45 },
  // ... millions of users
}
```

### Meta Computations (`type: 'meta'`)

**Purpose:** Platform-wide calculations (aggregate metrics, market analysis, global trends)

**Execution:**

- Runs **once per day** (not per user)
- Processes all data holistically
- Can access price history for all instruments

**Example:**
```javascript
class MarketMomentum {
  static getMetadata() {
    return {
      type: 'meta',
      rootDataDependencies: ['price', 'insights']
    };
  }

  async process(context) {
    const { prices, insights, math } = context;

    // Calculate momentum for every ticker
    for (const [instId, data] of Object.entries(prices.history)) {
      const ticker = data.ticker;
      const priceData = math.priceExtractor.getHistory(prices, ticker);

      this.results[ticker] = { momentum: /* ... */ };
    }
  }
}
```

**Result Structure:**
```javascript
{
  "AAPL": { momentum: 0.65 },
  "GOOGL": { momentum: 0.82 },
  // ... all tickers
}
```

### Price-Dependent Meta Computations

When a meta computation declares `rootDataDependencies: ['price']`, it enters **batched shard processing mode**:

```javascript
// Instead of loading ALL price data at once (which would exhaust memory):
for (const shardRef of priceShardRefs) {
  const shardData = await loadPriceShard(shardRef); // ~50-100 instruments per shard

  const context = buildMetaContext({
    prices: { history: shardData } // Only this shard's data
  });

  await calculation.process(context);

  // Results accumulate across shards
  // Memory is cleared between iterations
}
```

**Your calculation receives partial data** and processes it incrementally. The system ensures that every shard is eventually processed.
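One way to realize the accumulate-across-shards pattern is to keep a single results object on the calculation instance, so each shard pass contributes only its own instruments. This is an illustrative sketch, not a real calculation from the system:

```javascript
// Sketch: a shard-friendly meta calculation. process() is called once
// per price shard; this.results survives across calls and ends up
// covering every instrument.
class ShardAccumulatingCalc {
  constructor() { this.results = {}; }

  async process(context) {
    for (const [, data] of Object.entries(context.prices.history)) {
      // Each shard contributes only the instruments it contains.
      this.results[data.ticker] = {
        observations: Object.keys(data.prices).length
      };
    }
  }
}
```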

---

## Dependency Management

### Declaring Dependencies

```javascript
static getDependencies() {
  return ['risk-metrics', 'sentiment-score', 'momentum-analysis'];
}
```

This tells the system: "Before you run me, make sure these three calculations have completed."

### How Dependencies Are Loaded

When your calculation runs, the system:

1. Fetches results from all declared dependencies
2. Checks whether the data is sharded → reassembles it if needed
3. Injects the results into `context.computed`:

```javascript
{
  computed: {
    "risk-metrics": { "AAPL": { volatility: 0.25 } /* ... */ },
    "sentiment-score": { "AAPL": { score: 0.8 } /* ... */ },
    "momentum-analysis": { "AAPL": { momentum: 0.65 } /* ... */ }
  }
}
```

### Accessing Dependency Results

```javascript
async process(context) {
  const { computed, math } = context;

  // Access results from dependencies
  const volatility = math.signals.getMetric(computed, 'risk-metrics', 'AAPL', 'volatility');
  const sentiment = math.signals.getMetric(computed, 'sentiment-score', 'AAPL', 'score');

  // Use them in your calculation
  const combinedScore = volatility * sentiment;
}
```
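A defensive accessor in the spirit of `math.signals.getMetric` might look like this (a sketch only; the real helper's exact behavior and return value for missing data are not documented here):

```javascript
// Sketch: returns null instead of throwing when a dependency, ticker,
// or field is missing, so calculations can skip incomplete inputs.
function getMetric(computed, calcName, ticker, field) {
  const value = computed?.[calcName]?.[ticker]?.[field];
  return value === undefined ? null : value;
}
```

Guarding like this matters because a dependency may legitimately have no entry for a given ticker on a given day.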

### Historical Dependencies

If your calculation needs **yesterday's dependency results**:

```javascript
static getMetadata() {
  return {
    isHistorical: true, // ← Enable historical mode
    // ...
  };
}

async process(context) {
  const { computed, previousComputed } = context;

  // Today's risk
  const todayRisk = computed['risk-metrics']['AAPL'].volatility;

  // Yesterday's risk
  const yesterdayRisk = previousComputed['risk-metrics']['AAPL'].volatility;

  // Calculate the day-over-day change
  const riskChange = todayRisk - yesterdayRisk;
}
```

---

## Versioning & Smart Hashing

### The Problem: When to Recompute?

If you fix a bug in a calculation, how does the system know to re-run it for all past dates?

### The Solution: Merkle Tree Dependency Hashing

Every calculation gets a **smart hash** that includes:

1. **Its own code** (SHA-256 of the class definition)
2. **Layer dependencies** (hashes of the math layers it uses)
3. **Calculation dependencies** (hashes of the calculations it depends on)

```javascript
// Example hash composition:
const intrinsicHash = hash(calculation.toString() + layerHashes);
const dependencyHashes = dependencies.map(dep => dep.hash).join('|');
const finalHash = hash(intrinsicHash + '|DEPS:' + dependencyHashes);

// Result: "a3f9c2e1..." (SHA-256)
```

### Cascading Invalidation

If **Calculation A** changes, **Calculation B** (which depends on A) automatically gets a new hash:

```
Risk Metrics (v1) → hash: abc123
        ↓
Sentiment Score   → hash: def456 (includes abc123)
(depends on Risk)
```

If you update Risk Metrics:

```
Risk Metrics (v2) → hash: xyz789 (NEW!)
        ↓
Sentiment Score   → hash: ghi012 (NEW! Because its dependency changed)
```

### Recomputation Logic

For each date, the system checks:

```javascript
// Stored in Firestore:
computationStatus['2024-12-07'] = {
  'risk-metrics': 'abc123',   // Hash at last run
  'sentiment-score': 'def456'
};

// Current manifest:
manifest['risk-metrics'].hash = 'xyz789';    // NEW HASH!
manifest['sentiment-score'].hash = 'ghi012';

// Decision:
// - Risk Metrics:    hash mismatch → RERUN
// - Sentiment Score: hash mismatch → RERUN (cascaded)
```

This ensures **incremental recomputation**: only changed calculations (and their dependents) re-run.
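The per-date decision reduces to a hash comparison. A sketch using the field names from the example above (not the actual scheduler code):

```javascript
// Sketch: a calculation must rerun for a date when its stored hash is
// missing or differs from the current manifest hash.
function calculationsToRerun(statusForDate, manifest) {
  return Object.keys(manifest).filter(
    (name) => statusForDate[name] !== manifest[name].hash
  );
}
```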

---

## Execution Modes

### Mode 1: Legacy (Orchestrator)

**Single-process execution** for all dates and calculations.

```bash
COMPUTATION_PASS_TO_RUN=1 npm run computation-orchestrator
```

- Loads the manifest
- Iterates through all dates
- Runs all Pass 1 calculations sequentially
- Good for: development and debugging

### Mode 2: Dispatcher + Workers (Production)

**Distributed execution** using Pub/Sub.

#### Step 1: Dispatch Tasks
```bash
COMPUTATION_PASS_TO_RUN=1 npm run computation-dispatcher
```

Publishes messages to Pub/Sub:
```json
{
  "action": "RUN_COMPUTATION_DATE",
  "date": "2024-12-07",
  "pass": "1"
}
```

#### Step 2: Workers Consume Tasks
```bash
# Cloud Function triggered by Pub/Sub
# Or: local consumer for testing
npm run computation-worker
```

Each worker:

1. Receives a date + pass
2. Loads the manifest
3. Runs calculations for that date only
4. Updates the status document

**Benefits:**

- Parallel execution (100+ workers)
- Fault tolerance (failed dates retry automatically)
- Scales to millions of dates
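A worker's handler for these messages can be sketched as follows (assuming base64-encoded Pub/Sub message data, as in push-style delivery; `runComputationsForDate` is a hypothetical name, not the real entry point):

```javascript
// Sketch: decode the task message shown above and run the matching
// date + pass. Unknown actions are ignored rather than failed, so
// unrelated messages are not retried forever.
async function handleMessage(message, runners) {
  const task = JSON.parse(Buffer.from(message.data, 'base64').toString());
  if (task.action !== 'RUN_COMPUTATION_DATE') return 'ignored';
  await runners.runComputationsForDate(task.date, task.pass);
  return `ran pass ${task.pass} for ${task.date}`;
}
```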

### Pass System

Calculations are grouped into **passes** based on dependencies:

```
Pass 1: Base calculations (no dependencies)
  - risk-metrics
  - price-momentum

Pass 2: Depends on Pass 1
  - sentiment-score (needs risk-metrics)
  - trend-analysis (needs price-momentum)

Pass 3: Depends on Pass 2
  - combined-signal (needs sentiment-score + trend-analysis)
```

**You run passes sequentially:**
```bash
COMPUTATION_PASS_TO_RUN=1 npm run computation-dispatcher  # Wait for completion
COMPUTATION_PASS_TO_RUN=2 npm run computation-dispatcher  # Wait for completion
COMPUTATION_PASS_TO_RUN=3 npm run computation-dispatcher
```

The manifest builder automatically assigns pass numbers via topological sort.
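The pass assignment can be sketched as longest-path layering over the dependency graph: a calculation's pass is 1 plus the maximum pass of its dependencies. This is illustrative; the real manifest builder may differ:

```javascript
// Sketch: assign pass numbers by recursive longest-path layering.
// deps maps each calculation name to the names it depends on.
function assignPasses(deps) {
  const pass = {};
  const visit = (name, trail = new Set()) => {
    if (pass[name]) return pass[name]; // memoized
    if (trail.has(name)) throw new Error(`Circular dependency at ${name}`);
    trail.add(name);
    const parents = deps[name] || [];
    pass[name] = parents.length === 0
      ? 1
      : 1 + Math.max(...parents.map((p) => visit(p, trail)));
    return pass[name];
  };
  Object.keys(deps).forEach((n) => visit(n));
  return pass;
}
```

Running it on the example graph above yields pass 1 for the base calculations, pass 2 for their direct dependents, and pass 3 for `combined-signal`.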

---

## Summary: The Complete Flow

### For a Standard Calculation

```
1. Manifest Builder
   ├─ Scans your calculation class
   ├─ Generates the smart hash (code + layers + dependencies)
   ├─ Assigns the calculation to a pass based on the dependency graph
   └─ Validates that all dependencies exist

2. Dispatcher/Orchestrator
   ├─ Loads the manifest
   ├─ Iterates through all dates
   └─ For each date:
      ├─ Checks if the calculation needs to run (hash mismatch?)
      ├─ Checks if the root data exists (portfolio, history, etc.)
      └─ Dispatches a task (or runs directly)

3. Worker/Executor
   ├─ Receives a task for a specific date
   ├─ Loads dependency results (auto-reassembles if sharded)
   ├─ Streams portfolio data in batches
   └─ For each user batch:
      ├─ Builds the per-user context
      ├─ Injects math layers, mappings, and computed dependencies
      ├─ Calls your calculation.process(context)
      └─ Accumulates results

4. Result Committer
   ├─ Calculates the total result size
   ├─ IF size > 900KB:
   │  ├─ Splits the result into chunks
   │  ├─ Writes them to the _shards subcollection
   │  └─ Writes the pointer document
   └─ ELSE:
      └─ Writes a single document

5. Status Updater
   └─ Updates computation_status/{date} with the new hash
```

### For a Meta Calculation

Same as above, except:

- **Step 3**: Loads all data once (or iterates through price shards)
- **Context**: Global data, not per-user
- **Result**: One document per date (e.g., all tickers' momentum scores)

---

## Key Takeaways

1. **Context is Auto-Built**: Declare what you need in metadata; the system handles the rest
2. **Sharding is Transparent**: Read and write as if documents had no size limit
3. **Dependencies Just Work**: Results are automatically fetched and reassembled
4. **Versioning is Smart**: Change code → the system knows what to rerun
5. **Streaming is Automatic**: Standard computations stream data; you don't manage batches
6. **Execution is Flexible**: Run locally for dev, distributed for production

---