bulltrackers-module 1.0.763 → 1.0.765
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/functions/computation-system-v2/computations/BehavioralAnomaly.js +281 -0
- package/functions/computation-system-v2/computations/NewSectorExposure.js +135 -0
- package/functions/computation-system-v2/computations/NewSocialPost.js +99 -0
- package/functions/computation-system-v2/computations/PositionInvestedIncrease.js +148 -0
- package/functions/computation-system-v2/computations/RiskScoreIncrease.js +140 -0
- package/functions/computation-system-v2/config/bulltrackers.config.js +1 -1
- package/functions/computation-system-v2/docs/Agents.MD +964 -0
- package/functions/computation-system-v2/framework/execution/RemoteTaskRunner.js +115 -135
- package/functions/computation-system-v2/handlers/scheduler.js +8 -2
- package/functions/computation-system-v2/handlers/worker.js +110 -181
- package/package.json +1 -1
- package/functions/computation-system-v2/docs/admin.md +0 -117
- package/functions/computation-system-v2/docs/api_reference.md +0 -118
- package/functions/computation-system-v2/docs/architecture.md +0 -103
- package/functions/computation-system-v2/docs/developer_guide.md +0 -125
- package/functions/computation-system-v2/docs/operations.md +0 -99
- package/functions/computation-system-v2/docs/plans.md +0 -588
@@ -1,103 +0,0 @@
# System Architecture

## 1. System Philosophy

The Computation System v2 is designed to be a generic, configuration-driven Directed Acyclic Graph (DAG) executor that operates directly on BigQuery data. It departs from traditional ETL pipelines by adhering to five core principles:

1. **Zero Hardcoded Schemas**: The system never defines schemas in code. It dynamically discovers them from BigQuery's `INFORMATION_SCHEMA` at runtime. This eliminates the "schema drift" problem where code falls out of sync with the database.
2. **Pre-Query Validation**: Every SQL query is validated against the cached schema *before* it is sent to BigQuery. This prevents expensive runtime failures and SQL errors.
3. **Pass-Based Execution**: Computations are automatically organized into "passes" (waves) based on their dependencies using Kahn's algorithm.
4. **Hash-Based Versioning**: The system tracks the "Identity" of every computation result using a cryptographic hash of:
   * The Computation Code
   * Shared Utility "Layers"
   * Business Logic "Rules"
   * Dependency Result Hashes
5. **Hybrid Execution**: Light computations run globally in-memory, while heavy per-entity computations are offloaded to a serverless worker pool (Remote Task Runner).

---
## 2. Core Components

### 2.1. The Manifest Builder (`framework/core/Manifest.js`)

The Manifest is the blueprint of the system. At startup, it:

1. **Loads Configs**: Reads static `getConfig()` from all Computation classes.
2. **Builds Graph**: Constructs the dependency graph.
3. **Detects Cycles**: Throws an error if A -> B -> A.
4. **Calculates Passes**: Uses Topological Sort to assign a `pass` number (0, 1, 2...) to each computation.
5. **Generates Hashes**: Computes the intrinsic version hash for change detection.
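The pass calculation in step 4 can be sketched as a level assignment with Kahn's algorithm. This is a minimal illustration, not the actual `Manifest.js` code; the `graph` shape (name → list of dependency names) is an assumption:

```javascript
// Assign a "pass" number to each computation so every node runs strictly
// after all of its dependencies (Kahn's algorithm by BFS levels).
function calculatePasses(graph) {
  // graph: { computationName: [dependencyNames] }
  const inDegree = {};
  const dependents = {};
  for (const name of Object.keys(graph)) {
    inDegree[name] = graph[name].length;
    dependents[name] = dependents[name] || [];
    for (const dep of graph[name]) {
      (dependents[dep] = dependents[dep] || []).push(name);
    }
  }

  const passes = {};
  let current = Object.keys(graph).filter((n) => inDegree[n] === 0);
  let pass = 0;
  let visited = 0;
  while (current.length > 0) {
    const next = [];
    for (const node of current) {
      passes[node] = pass;
      visited++;
      // A node becomes ready once all of its dependencies have a pass.
      for (const dependent of dependents[node] || []) {
        if (--inDegree[dependent] === 0) next.push(dependent);
      }
    }
    current = next;
    pass++;
  }

  // Unvisited nodes mean a cycle (step 3's error condition).
  if (visited !== Object.keys(graph).length) {
    throw new Error('Cycle detected in computation graph');
  }
  return passes;
}
```

Note that the same traversal doubles as the cycle check from step 3: a cycle leaves some nodes with a nonzero in-degree forever.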
### 2.2. Schema Registry (`framework/data/SchemaRegistry.js`)

The bridge between code and data.

* **Dynamic Discovery**: Fetches column definitions, types, and nullability from BigQuery.
* **Caching**: Caches schemas in memory (default TTL: 1 hour) to reduce latency.
* **Request Coalescing**: Prevents "Thundering Herd" issues by merging simultaneous requests for the same table schema.
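The coalescing idea fits in a few lines: concurrent callers asking for the same table share one in-flight promise, so only a single metadata query reaches BigQuery. A sketch under assumed names (`fetchSchemaFromBigQuery` is a placeholder, not the registry's real API):

```javascript
// Concurrent requests for the same table's schema share one promise.
const inFlight = new Map();

async function getSchema(table, fetchSchemaFromBigQuery) {
  if (inFlight.has(table)) return inFlight.get(table); // coalesce
  const promise = fetchSchemaFromBigQuery(table)
    .finally(() => inFlight.delete(table)); // clear once settled
  inFlight.set(table, promise);
  return promise;
}
```

A production version would layer the 1-hour TTL cache on top, keeping resolved schemas around instead of deleting the entry as soon as the fetch settles.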
### 2.3. The Orchestrator (`framework/execution/Orchestrator.js`)

The central nervous system. It creates the execution plan for a given date.

* **Dependency Resolution**: Ensures pre-requisite data is loaded.
* **Execution Strategy**: Decides whether to run a computation locally or offload it to the Remote Task Runner.
* **Streaming**: For large datasets, it streams data in batches (default: 1000 entities) to avoid Out-Of-Memory (OOM) crashes.
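The streaming behaviour can be sketched as a batched loop: only one slice of entities is in flight at a time, so memory is bounded by the batch size rather than the dataset. The 1000-entity default mirrors the text; `processEntity` is an illustrative callback, not the Orchestrator's real API:

```javascript
// Process entities in fixed-size slices so memory stays bounded.
async function streamEntities(entityIds, processEntity, batchSize = 1000) {
  const results = [];
  for (let i = 0; i < entityIds.length; i += batchSize) {
    const batch = entityIds.slice(i, i + batchSize);
    // Each batch completes (and its rows become collectable) before the next.
    results.push(...(await Promise.all(batch.map(processEntity))));
  }
  return results;
}
```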
---
## 3. Data Flow

```mermaid
graph TD
    User[User/Scheduler] -->|Trigger| Dispatcher
    Dispatcher -->|POST| Orchestrator

    subgraph "Phase 1: Preparation"
        Orchestrator -->|Build| Manifest
        Orchestrator -->|Check Status| StateDB[(State Repository)]
        StateDB -->|Run/Skip/ReRun?| Orchestrator
    end

    subgraph "Phase 2: Execution"
        Orchestrator -->|Fetch Data| DataFetcher
        DataFetcher -->|Validate| SchemaRegistry
        SchemaRegistry -->|Meta| BigQuery[(BigQuery)]
        DataFetcher -->|Query| BigQuery

        Orchestrator -->|Pass 0| C1[Computation A]
        Orchestrator -->|Pass 1| C2[Computation B]

        C1 -->|Result| Storage[Storage Manager]
        Storage -->|Persist| BigQuery
    end
```

---
## 4. Execution Modes

### 4.1. Global Mode (Local Isolation)
Used for: Aggregations, summaries, or light computations.
* **Flow**: Fetches *all* required data into memory -> Runs logic -> Writes single result.
* **Concurrency**: Parallelized by `p-limit` locally.

### 4.2. Per-Entity Mode (Remote Offload)
Used for: Complex logic (e.g., Portfolio Calculations) requiring isolation.
* **Orchestrator**: Fetches data for a batch (e.g., 100 users).
* **Storage**: Uploads context (Data + Rules + Config) to Cloud Storage (GCS).
* **Worker Pool**: Invokes stateless Cloud Functions to process the batch.
* **Circuit Breaker**: Stops execution if failure rate exceeds threshold (default: 30%).
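The circuit-breaker rule in the last bullet can be sketched as a simple failure-rate counter. This illustrates the described behaviour (30% default threshold); the class name, the `minSamples` guard, and the API are assumptions, not the actual worker-pool code:

```javascript
// Trip once the observed failure rate exceeds the threshold.
class CircuitBreaker {
  constructor(threshold = 0.3, minSamples = 10) {
    this.threshold = threshold;
    this.minSamples = minSamples; // avoid tripping on the very first failure
    this.successes = 0;
    this.failures = 0;
  }
  record(ok) {
    ok ? this.successes++ : this.failures++;
  }
  get tripped() {
    const total = this.successes + this.failures;
    return total >= this.minSamples && this.failures / total > this.threshold;
  }
}
```

Between batches, the dispatcher would check `breaker.tripped` and abort the run rather than invoke more workers.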
---
## 5. Versioning & Hashing

How does the system know when to re-run a computation?

The **Composite Hash** is calculated as:
```
Hash = SHA256(
  Code Body +
  Epoch Version +
  Shared Layer Hashes +
  Rule Module Hashes +
  Dependency Result Hashes
)
```

If this hash matches the `resultHash` stored in the State Repository for a given date, the computation is **SKIPPED** (unless forced). This ensures idempotency and avoids unnecessary processing costs.
@@ -1,125 +0,0 @@
# Developer Guide

This guide explains how to create, configure, and test new Computations in the v2 framework.

## 1. Creating a Computation

All computations must extend the base `Computation` class.

### Basic Template

```javascript
const { Computation } = require('../../framework/core/Computation');

class UserDailyActive extends Computation {
  static getConfig() {
    return {
      name: 'UserDailyActive',
      type: 'per-entity', // or 'global'
      requires: {
        // Table Dependencies (Data)
        'user_logins': {
          lookback: 0, // 0 = today only
          mandatory: true
        }
      },
      dependencies: [
        // Computation Dependencies (Prerequisites)
        // 'UserSessionSummary'
      ]
    };
  }

  async process(context) {
    const { date, entityId, data } = context;

    // Access pre-fetched data
    const logins = data['user_logins'];

    if (!logins || logins.length === 0) {
      return null; // Return null to skip saving result
    }

    return {
      date,
      userId: entityId,
      loginCount: logins.length,
      lastLoginTime: logins[logins.length - 1].timestamp
    };
  }
}

module.exports = UserDailyActive;
```

---
## 2. Configuration Options (`getConfig`)

| Field | Type | Description |
| :--- | :--- | :--- |
| `name` | String | **Required**. Unique name of the computation. |
| `type` | String | `'global'` (runs once) or `'per-entity'` (runs for each entity). |
| `requires` | Object | Map of table names to requirements. |
| `dependencies` | Array | List of other computation names that must run *before* this one. |
| `schedule` | Object | `{ frequency: 'daily', time: '02:00' }`. Defaults to daily/02:00. |
| `ttlDays` | Number | How long to keep results in the State DB (default: 365). |
| `isHistorical` | Boolean | If `true`, `context.previousResult` will contain yesterday's output. |

### The `requires` Object

```javascript
requires: {
  'orders': {
    lookback: 7,             // Fetch last 7 days of data
    mandatory: false,        // If true, the run fails when the table is empty/missing
    dateField: 'created_at'  // Optional override
  }
}
```
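Combining the optional fields from the table above, a config for a hypothetical global, historical computation might look like this (the computation name and all values are illustrative, not part of the framework):

```javascript
// Hypothetical config: a global computation that runs nightly at 03:30,
// keeps results for 90 days, and receives yesterday's output via
// context.previousResult because isHistorical is true.
const config = {
  name: 'PlatformDailySummary',
  type: 'global',                    // runs once per date, not per entity
  requires: {
    'user_logins': { lookback: 1, mandatory: true }
  },
  dependencies: ['UserDailyActive'], // must complete in an earlier pass
  schedule: { frequency: 'daily', time: '03:30' },
  ttlDays: 90,
  isHistorical: true
};
```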
---
## 3. The `process(context)` Method

The `process` method is the heart of your logic. It receives a `context` object:

### `context.data`
Contains the raw data fetched from BigQuery.
* **Per-Entity**: `data['table_name']` is the row (or array of rows) *specifically for that entity*.
* **Global**: `data['table_name']` is the entire dataset fetched.

### `context.getDependency(name, [entityId])`
Access results from previous computations.
```javascript
// In a per-entity computation, getting its own dependency:
const sessionStats = context.getDependency('UserSessionSummary');

// Getting a global dependency result from a per-entity computation:
const globalSettings = context.getDependency('GlobalSettings', '_global');
```

### `context.rules`
Access to shared business logic modules (if configured).

---
## 4. Testing & Validation

### Local Dry Run
You can run your computation locally without writing to BigQuery/Firestore by using the dry-run flag.

```bash
# Run a specific computation for a specific date
node index.js execute \
  --computation UserDailyActive \
  --date 2026-01-26 \
  --dry-run \
  --entityId user_12345
```

### Logging
Use `this.log(level, message)` instead of `console.log`. This ensures logs are properly tagged with the computation name and entity ID in Cloud Logging.

```javascript
this.log('INFO', `Processing user ${entityId} with ${logins.length} logins`);
```
@@ -1,99 +0,0 @@
# Operations Manual

This guide covers deployment, daily operations, monitoring, and troubleshooting.

## 1. Deployment

The system is deployed as a Google Cloud Function (Gen 2).

### Deploy Command
```bash
# Navigate to the function directory
cd functions/computation-system-v2

# Deploy using the included script
node deploy.mjs ComputeSystem
```

**Note**: Ensure you have the `gcloud` CLI installed and authenticated.

---
## 2. Management API

The system exposes an HTTP endpoint for manual administration.

### Setup (One-time)
```bash
# Get Function URL
FUNCTION_URL=$(gcloud functions describe compute-system --region=europe-west1 --format="value(serviceConfig.uri)")

# Get Auth Token
TOKEN=$(gcloud auth print-identity-token)
```

### Common Actions

#### Check Status
Lists all registered computations and their hash status.
```bash
curl -X POST "$FUNCTION_URL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"action": "status"}'
```

#### Analyze a Date
Simulates a run for a given date, showing what would run, skip, or be blocked.
```bash
curl -X POST "$FUNCTION_URL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"action": "analyze", "date": "2026-01-26"}'
```

#### Manually Trigger (Force Run)
Forces a computation to run, ignoring hash checks.
```bash
curl -X POST "$FUNCTION_URL" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"action": "run", "computation": "UserDailyActive", "date": "2026-01-26", "force": true}'
```

---
## 3. Monitoring

The Orchestrator logs a structured **Execution Summary** at the end of every run.

### Understanding Statuses

| Status | Icon | Description | Action Required |
| :--- | :--- | :--- | :--- |
| `completed` | ✅ | Computation finished successfully. | None |
| `skipped` | ⏭️ | Result hash matches previous run (up-to-date). | None |
| `blocked` | ⛔ | A dependency failed or hasn't run yet. | Fix dependency, then re-run. |
| `impossible` | 🚫 | A mandatory requirement (table) is empty. | Check upstream ETL. |
| `failed` | ❌ | A runtime error threw an exception. | Check logs for stack trace. |

### Circuit Breaker (Remote Mode)
If you see `CIRCUIT BREAKER TRIPPED` in the logs:
1. **Stop**: The system automatically aborted to prevent wasting money on failing workers.
2. **Investigate**: Look for "Worker failed" logs to see the root cause (e.g., a syntax error in a calculation).
3. **Fix**: Deploy the fix.
4. **Retry**: The circuit resets automatically on the next run.

---
## 4. Troubleshooting

### "Thundering Herd" on Startup
**Symptom**: Massive latency on the first few requests.
**Cause**: Cold start plus an empty Schema Cache being filled.
**Fix**: The system now uses **Request Coalescing**. If the problem persists, increase `concurrency` in the `workerPool` config.

### "Zombie" Computations
**Symptom**: A computation stuck in the "Running" state for hours.
**Cause**: Function timeout or a crash before the status was written.
**Fix**: Run the `status` command. The system automatically detects stale locks (> 1 hour) and allows re-execution.
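The stale-lock detection behind that fix can be sketched as a timestamp check. The one-hour TTL matches the text; the state-record shape and function name are assumptions, not the actual State Repository schema:

```javascript
// A run stuck in "running" longer than the lock TTL is treated as dead
// and becomes eligible for re-execution.
const LOCK_TTL_MS = 60 * 60 * 1000; // 1 hour

function isStaleLock(stateRecord, now = Date.now()) {
  return (
    stateRecord.status === 'running' &&
    now - new Date(stateRecord.startedAt).getTime() > LOCK_TTL_MS
  );
}
```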