bulltrackers-module 1.0.762 → 1.0.764

# System Architecture

## 1. System Philosophy

The Computation System v2 is designed to be a generic, configuration-driven Directed Acyclic Graph (DAG) executor that operates directly on BigQuery data. It departs from traditional ETL pipelines by adhering to five core principles:

1. **Zero Hardcoded Schemas**: The system never defines schemas in code. It dynamically discovers them from BigQuery's `INFORMATION_SCHEMA` at runtime. This eliminates the "schema drift" problem where code falls out of sync with the database.
2. **Pre-Query Validation**: Every SQL query is validated against the cached schema *before* it is sent to BigQuery. This prevents expensive runtime failures and SQL errors.
3. **Pass-Based Execution**: Computations are automatically organized into "passes" (waves) based on their dependencies using Kahn's algorithm.
4. **Hash-Based Versioning**: The system tracks the "Identity" of every computation result using a cryptographic hash of:
   * The Computation Code
   * Shared Utility "Layers"
   * Business Logic "Rules"
   * Dependency Result Hashes
5. **Hybrid Execution**: Light computations run globally in-memory, while heavy per-entity computations are offloaded to a serverless worker pool (Remote Task Runner).

---

## 2. Core Components

### 2.1. The Manifest Builder (`framework/core/Manifest.js`)
The Manifest is the blueprint of the system. At startup, it:
1. **Loads Configs**: Reads the static `getConfig()` from all Computation classes.
2. **Builds Graph**: Constructs the dependency graph.
3. **Detects Cycles**: Throws an error if A -> B -> A.
4. **Calculates Passes**: Uses Topological Sort to assign a `pass` number (0, 1, 2...) to each computation.
5. **Generates Hashes**: Computes the intrinsic version hash for change detection.

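The pass calculation in step 4 can be sketched with Kahn's algorithm (a minimal standalone sketch, not the actual `Manifest.js` code):

```javascript
// Sketch: assign each computation a "pass" (wave) number.
// Nodes in pass N depend only on nodes in earlier passes; a cycle is an error.
function calculatePasses(graph) { // graph: { name: [dependencyNames] }
  const passes = {};
  const remaining = new Set(Object.keys(graph));
  let pass = 0;
  while (remaining.size > 0) {
    // Ready = every dependency already has a pass number assigned.
    const ready = [...remaining].filter((name) =>
      graph[name].every((dep) => passes[dep] !== undefined)
    );
    if (ready.length === 0) throw new Error('Dependency cycle detected');
    for (const name of ready) {
      passes[name] = pass;
      remaining.delete(name);
    }
    pass++;
  }
  return passes;
}

// A diamond-shaped graph: A feeds B and C, which both feed D.
const passes = calculatePasses({ A: [], B: ['A'], C: ['A'], D: ['B', 'C'] });
// passes → { A: 0, B: 1, C: 1, D: 2 }
```
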
### 2.2. Schema Registry (`framework/data/SchemaRegistry.js`)
The bridge between code and data.
* **Dynamic Discovery**: Fetches column definitions, types, and nullability from BigQuery.
* **Caching**: Caches schemas in memory (default TTL: 1 hour) to reduce latency.
* **Request Coalescing**: Prevents "Thundering Herd" issues by merging simultaneous requests for the same table schema.

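The request-coalescing behaviour can be sketched as a map of in-flight promises (an illustrative pattern sketch, not the real `SchemaRegistry` implementation; the constructor and method names here are assumptions):

```javascript
// Sketch: coalesce concurrent schema lookups for the same table into one fetch.
class CoalescingRegistry {
  constructor(fetchSchemaFn) {
    this.fetchSchemaFn = fetchSchemaFn; // e.g. queries INFORMATION_SCHEMA
    this.cache = new Map();     // tableName -> schema
    this.inFlight = new Map();  // tableName -> pending Promise
  }

  async getSchema(tableName) {
    if (this.cache.has(tableName)) return this.cache.get(tableName);
    // If a fetch for this table is already running, piggyback on it.
    if (this.inFlight.has(tableName)) return this.inFlight.get(tableName);
    const promise = this.fetchSchemaFn(tableName)
      .then((schema) => {
        this.cache.set(tableName, schema);
        return schema;
      })
      .finally(() => this.inFlight.delete(tableName));
    this.inFlight.set(tableName, promise);
    return promise;
  }
}
```

Because `inFlight` is populated synchronously before any `await`, a burst of simultaneous requests for one table triggers exactly one backend fetch.
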
### 2.3. The Orchestrator (`framework/execution/Orchestrator.js`)
The central nervous system. It creates the execution plan for a given date.
* **Dependency Resolution**: Ensures pre-requisite data is loaded.
* **Execution Strategy**: Decides whether to run a computation locally or offload it to the Remote Task Runner.
* **Streaming**: For large datasets, it streams data in batches (default: 1000 entities) to avoid Out-Of-Memory (OOM) crashes.

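The batching behaviour can be sketched as a chunking loop (a simplified sketch; the real Orchestrator streams from BigQuery rather than slicing an in-memory array):

```javascript
// Sketch: process entities in fixed-size batches to bound memory usage.
async function processInBatches(entityIds, handler, batchSize = 1000) {
  const results = [];
  for (let i = 0; i < entityIds.length; i += batchSize) {
    const batch = entityIds.slice(i, i + batchSize);
    // Each batch is fetched, processed, and released before the next one starts.
    results.push(await handler(batch));
  }
  return results;
}
```
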
---

## 3. Data Flow

```mermaid
graph TD
    User[User/Scheduler] -->|Trigger| Dispatcher
    Dispatcher -->|POST| Orchestrator

    subgraph "Phase 1: Preparation"
        Orchestrator -->|Build| Manifest
        Orchestrator -->|Check Status| StateDB[(State Repository)]
        StateDB -->|Run/Skip/ReRun?| Orchestrator
    end

    subgraph "Phase 2: Execution"
        Orchestrator -->|Fetch Data| DataFetcher
        DataFetcher -->|Validate| SchemaRegistry
        SchemaRegistry -->|Meta| BigQuery[(BigQuery)]
        DataFetcher -->|Query| BigQuery

        Orchestrator -->|Pass 0| C1[Computation A]
        Orchestrator -->|Pass 1| C2[Computation B]

        C1 -->|Result| Storage[Storage Manager]
        Storage -->|Persist| BigQuery
    end
```

---

## 4. Execution Modes

### 4.1. Global Mode (Local Isolation)
Used for: aggregations, summaries, or light computations.
* **Flow**: Fetches *all* required data into memory -> runs logic -> writes a single result.
* **Concurrency**: Parallelized locally via `p-limit`.

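A dependency-free sketch of the pattern `p-limit` provides (the limiter below is purely illustrative; the production code uses the `p-limit` package itself):

```javascript
// Sketch: a concurrency limiter — queue tasks, run at most `concurrency` at once.
function createLimit(concurrency) {
  let activeCount = 0;
  const queue = [];
  const next = () => {
    if (activeCount >= concurrency || queue.length === 0) return;
    activeCount++;
    const { fn, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(fn)
      .then(resolve, reject)
      .finally(() => { activeCount--; next(); });
  };
  // Returns a wrapper: limit(fn) schedules fn and resolves with its result.
  return (fn) =>
    new Promise((resolve, reject) => {
      queue.push({ fn, resolve, reject });
      next();
    });
}
```
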
### 4.2. Per-Entity Mode (Remote Offload)
Used for: complex logic (e.g., Portfolio Calculations) requiring isolation.
* **Orchestrator**: Fetches data for a batch (e.g., 100 users).
* **Storage**: Uploads the context (Data + Rules + Config) to Cloud Storage (GCS).
* **Worker Pool**: Invokes stateless Cloud Functions to process the batch.
* **Circuit Breaker**: Stops execution if the failure rate exceeds a threshold (default: 30%).

---

## 5. Versioning & Hashing

How does the system know when to re-run a computation?

The **Composite Hash** is calculated as:
```
Hash = SHA256(
  Code Body +
  Epoch Version +
  Shared Layer Hashes +
  Rule Module Hashes +
  Dependency Result Hashes
)
```

If this hash matches the `resultHash` stored in the State Repository for a given date, the computation is **SKIPPED** (unless forced). This ensures idempotency and avoids unnecessary processing costs.

# Developer Guide

This guide explains how to create, configure, and test new Computations in the v2 framework.

## 1. Creating a Computation

All computations must extend the base `Computation` class.

### Basic Template

```javascript
const { Computation } = require('../../framework/core/Computation');

class UserDailyActive extends Computation {
  static getConfig() {
    return {
      name: 'UserDailyActive',
      type: 'per-entity', // or 'global'
      requires: {
        // Table Dependencies (Data)
        'user_logins': {
          lookback: 0, // 0 = today only
          mandatory: true
        }
      },
      dependencies: [
        // Computation Dependencies (Prerequisites)
        // 'UserSessionSummary'
      ]
    };
  }

  async process(context) {
    const { date, entityId, data } = context;

    // Access pre-fetched data
    const logins = data['user_logins'];

    if (!logins || logins.length === 0) {
      return null; // Return null to skip saving a result
    }

    return {
      date,
      userId: entityId,
      loginCount: logins.length,
      lastLoginTime: logins[logins.length - 1].timestamp
    };
  }
}

module.exports = UserDailyActive;
```

---

## 2. Configuration Options (`getConfig`)

| Field | Type | Description |
| :--- | :--- | :--- |
| `name` | String | **Required**. Unique name of the computation. |
| `type` | String | `'global'` (runs once) or `'per-entity'` (runs for each entity). |
| `requires` | Object | Map of table names to requirements. |
| `dependencies` | Array | List of other computation names that must run *before* this one. |
| `schedule` | Object | `{ frequency: 'daily', time: '02:00' }`. Defaults to daily at 02:00. |
| `ttlDays` | Number | How long to keep results in the State DB (default: 365). |
| `isHistorical` | Boolean | If `true`, `context.previousResult` will contain yesterday's output. |

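Putting the optional fields together, a fuller configuration might look like this (a hypothetical sketch; the computation and table names are invented, and the class is shown standalone rather than extending `Computation`):

```javascript
// Sketch: a hypothetical rolling-revenue computation using the optional fields.
class WeeklyRevenueRollup {
  static getConfig() {
    return {
      name: 'WeeklyRevenueRollup',
      type: 'global',                               // runs once per date
      requires: {
        'orders': { lookback: 7, mandatory: true }  // last 7 days of orders
      },
      dependencies: ['UserDailyActive'],            // runs after UserDailyActive
      schedule: { frequency: 'daily', time: '03:30' },
      ttlDays: 90,        // keep results for 90 days instead of the default 365
      isHistorical: true  // expose yesterday's output as context.previousResult
    };
  }
}
```
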
### The `requires` Object

```javascript
requires: {
  'orders': {
    lookback: 7, // Fetch the last 7 days of data
    mandatory: false, // If true, the run fails when the table is empty/missing
    dateField: 'created_at' // Optional override
  }
}
```

---

## 3. The `process(context)` Method

The `process` method is the heart of your logic. It receives a `context` object:

### `context.data`
Contains the raw data fetched from BigQuery.
* **Per-Entity**: `data['table_name']` is the row (or array of rows) *specifically for that entity*.
* **Global**: `data['table_name']` is the entire dataset fetched.

### `context.getDependency(name, [entityId])`
Access results from previous computations.
```javascript
// In a per-entity computation, getting its own dependency:
const sessionStats = context.getDependency('UserSessionSummary');

// Getting a global dependency result from a per-entity computation:
const globalSettings = context.getDependency('GlobalSettings', '_global');
```

### `context.rules`
Access to shared business logic modules (if configured).

---

## 4. Testing & Validation

### Local Dry Run
You can run your computation locally, without writing to BigQuery/Firestore, by using the dry-run flag.

```bash
# Run a specific computation for a specific date
node index.js execute \
  --computation UserDailyActive \
  --date 2026-01-26 \
  --dry-run \
  --entityId user_12345
```

### Logging
Use `this.log(level, message)` instead of `console.log`. This ensures logs are properly tagged with the computation name and entity ID in Cloud Logging.

```javascript
this.log('INFO', `Processing user ${entityId} with ${logins.length} logins`);
```

# Operations Manual

This guide covers deployment, daily operations, monitoring, and troubleshooting.

## 1. Deployment

The system is deployed as a Google Cloud Function (Gen 2).

### Deploy Command
```bash
# Navigate to the function directory
cd functions/computation-system-v2

# Deploy using the included script
node deploy.mjs ComputeSystem
```

**Note**: Ensure you have the `gcloud` CLI installed and authenticated.

---

## 2. Management API

The system exposes an HTTP endpoint for manual administration.

### Setup (One-time)
```bash
# Get the Function URL
FUNCTION_URL=$(gcloud functions describe compute-system --region=europe-west1 --format="value(serviceConfig.uri)")

# Get an Auth Token
TOKEN=$(gcloud auth print-identity-token)
```

### Common Actions

#### Check Status
Lists all registered computations and their hash status.
```bash
curl -X POST "$FUNCTION_URL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"action": "status"}'
```

#### Analyze a Date
Simulates a run for a given date, showing what would run, skip, or be blocked.
```bash
curl -X POST "$FUNCTION_URL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"action": "analyze", "date": "2026-01-26"}'
```

#### Manually Trigger (Force Run)
Forces a computation to run, ignoring hash checks.
```bash
curl -X POST "$FUNCTION_URL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"action": "run", "computation": "UserDailyActive", "date": "2026-01-26", "force": true}'
```

---

## 3. Monitoring

The Orchestrator logs a structured **Execution Summary** at the end of every run.

### Understanding Statuses

| Status | Icon | Description | Action Required |
| :--- | :--- | :--- | :--- |
| `completed` | ✅ | Computation finished successfully. | None |
| `skipped` | ⏭️ | Result hash matches the previous run (up-to-date). | None |
| `blocked` | ⛔ | A dependency failed or hasn't run yet. | Fix the dependency, then re-run. |
| `impossible` | 🚫 | A mandatory requirement (table) is empty. | Check the upstream ETL. |
| `failed` | ❌ | A runtime error threw an exception. | Check the logs for a stack trace. |

### Circuit Breaker (Remote Mode)
If you see `CIRCUIT BREAKER TRIPPED` in the logs:
1. **Stop**: The system automatically aborted to prevent wasting money on failing workers.
2. **Investigate**: Look for "Worker failed" logs to find the root cause (e.g., a syntax error in a calculation).
3. **Fix**: Deploy the fix.
4. **Retry**: The circuit resets automatically on the next run.

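The trip condition can be sketched as a failure-rate check (illustrative only; the real breaker lives in the worker pool, whose default threshold is the 30% noted earlier):

```javascript
// Sketch: trip once the observed failure rate exceeds a threshold.
class CircuitBreaker {
  constructor(threshold = 0.3, minSamples = 10) {
    this.threshold = threshold;   // e.g. 0.3 = 30% failures
    this.minSamples = minSamples; // don't trip on tiny samples
    this.successes = 0;
    this.failures = 0;
  }

  record(ok) { ok ? this.successes++ : this.failures++; }

  get tripped() {
    const total = this.successes + this.failures;
    if (total < this.minSamples) return false;
    return this.failures / total > this.threshold;
  }
}
```
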
---

## 4. Troubleshooting

### "Thundering Herd" on Startup
**Symptom**: Massive latency on the first few requests.
**Cause**: Cold start plus filling the Schema Cache.
**Fix**: The system now uses **Request Coalescing**. If the problem persists, increase `concurrency` in the `workerPool` config.

### "Zombie" Computations
**Symptom**: A computation stuck in the "Running" state for hours.
**Cause**: Function timeout or a crash before the status was written.
**Fix**: Run the `status` command. The system automatically detects stale locks (> 1 hour) and allows re-execution.
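The stale-lock check can be sketched as a timestamp comparison (a sketch only; the record field names are assumptions, and the one-hour cutoff comes from the note above):

```javascript
// Sketch: a "Running" record older than maxAgeMs counts as a stale lock.
const STALE_LOCK_MS = 60 * 60 * 1000; // 1 hour, per the note above

function isStaleLock(stateRecord, now = Date.now(), maxAgeMs = STALE_LOCK_MS) {
  if (stateRecord.status !== 'running') return false;
  return now - stateRecord.startedAt > maxAgeMs;
}
```
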