@pgflow/core 0.0.0-array-map-steps-302d00a8-20250922101336 → 0.0.0-test-snapshot-releases-8d5d9bc1-20250922101013
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +1 -7
- package/package.json +2 -2
- package/dist/ATLAS.md +0 -32
- package/dist/CHANGELOG.md +0 -645
- package/dist/PLAN_race_condition_testing.md +0 -176
- package/dist/PgflowSqlClient.d.ts +0 -17
- package/dist/PgflowSqlClient.d.ts.map +0 -1
- package/dist/PgflowSqlClient.js +0 -70
- package/dist/README.md +0 -399
- package/dist/database-types.d.ts +0 -832
- package/dist/database-types.d.ts.map +0 -1
- package/dist/database-types.js +0 -8
- package/dist/index.d.ts +0 -4
- package/dist/index.d.ts.map +0 -1
- package/dist/index.js +0 -2
- package/dist/package.json +0 -32
- package/dist/supabase/migrations/20250429164909_pgflow_initial.sql +0 -579
- package/dist/supabase/migrations/20250517072017_pgflow_fix_poll_for_tasks_to_use_separate_statement_for_polling.sql +0 -101
- package/dist/supabase/migrations/20250609105135_pgflow_add_start_tasks_and_started_status.sql +0 -371
- package/dist/supabase/migrations/20250610180554_pgflow_add_set_vt_batch_and_use_it_in_start_tasks.sql +0 -127
- package/dist/supabase/migrations/20250614124241_pgflow_add_realtime.sql +0 -501
- package/dist/supabase/migrations/20250619195327_pgflow_fix_fail_task_missing_realtime_event.sql +0 -185
- package/dist/supabase/migrations/20250627090700_pgflow_fix_function_search_paths.sql +0 -6
- package/dist/supabase/migrations/20250707210212_pgflow_add_opt_start_delay.sql +0 -103
- package/dist/supabase/migrations/20250719205006_pgflow_worker_deprecation.sql +0 -2
- package/dist/supabase/migrations/20250912075001_pgflow_temp_pr1_schema.sql +0 -185
- package/dist/supabase/migrations/20250912080800_pgflow_temp_pr2_root_maps.sql +0 -95
- package/dist/supabase/migrations/20250912125339_pgflow_TEMP_task_spawning_optimization.sql +0 -146
- package/dist/supabase/migrations/20250916093518_pgflow_temp_add_cascade_complete.sql +0 -321
- package/dist/supabase/migrations/20250916142327_pgflow_temp_make_initial_tasks_nullable.sql +0 -624
- package/dist/supabase/migrations/20250916203905_pgflow_temp_handle_arrays_in_start_tasks.sql +0 -157
- package/dist/supabase/migrations/20250918042753_pgflow_temp_handle_map_output_aggregation.sql +0 -489
- package/dist/supabase/migrations/20250919101802_pgflow_temp_orphaned_messages_index.sql +0 -688
- package/dist/supabase/migrations/20250919135211_pgflow_temp_return_task_index_in_start_tasks.sql +0 -178
- package/dist/tsconfig.lib.tsbuildinfo +0 -1
- package/dist/types.d.ts +0 -95
- package/dist/types.d.ts.map +0 -1
- package/dist/types.js +0 -1
|
@@ -1,176 +0,0 @@
|
|
|
1
|
-
# PLAN: Race Condition Testing for Type Violations
|
|
2
|
-
|
|
3
|
-
## Background
|
|
4
|
-
|
|
5
|
-
When a type violation occurs (e.g., a single step produces a non-array output for a dependent map step), the system must archive ALL active messages to prevent orphaned messages that cycle through workers indefinitely.
|
|
6
|
-
|
|
7
|
-
## Current Issue
|
|
8
|
-
|
|
9
|
-
The fix archives both `'queued'` AND `'started'` tasks, but existing tests don't properly validate the race condition scenarios.
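A minimal sketch of the selection rule described here, assuming an in-memory list of task rows. The `TaskRow` shape and the `messageIdsToArchive` helper are hypothetical names for illustration (loosely modeled on the `run_id`, `status`, and `message_id` columns mentioned in this plan); the actual fix lives in SQL.

```ts
// Hypothetical row shape, for illustration only.
type TaskStatus = 'queued' | 'started' | 'completed' | 'failed';

interface TaskRow {
  runId: string;
  status: TaskStatus;
  messageId: number | null;
}

// Collect the message ids of every still-active task of the failed run.
// Both 'queued' AND 'started' tasks count as active.
function messageIdsToArchive(tasks: TaskRow[], failedRunId: string): number[] {
  return tasks
    .filter(
      (t) =>
        t.runId === failedRunId &&
        (t.status === 'queued' || t.status === 'started')
    )
    .map((t) => t.messageId)
    .filter((id): id is number => id !== null);
}
```

Selecting only `'queued'` rows here would leave the messages of `'started'` tasks behind, which is the orphaned-message case the scenarios in this plan aim to cover.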
|
|
10
|
-
|
|
11
|
-
## Test Scenarios Needed
|
|
12
|
-
|
|
13
|
-
### 1. Basic Type Violation (✅ Already Covered)
|
|
14
|
-
**Scenario**: Single task causes type violation
|
|
15
|
-
```
|
|
16
|
-
step1 (single) → step2 (single) → map_step
|
|
17
|
-
```
|
|
18
|
-
- Worker completes step2 with non-array
|
|
19
|
-
- Verify run fails and current task's message is archived
|
|
20
|
-
- **Coverage**: `non_array_to_map_should_fail.test.sql`
|
|
21
|
-
|
|
22
|
-
### 2. Concurrent Started Tasks (❌ Not Covered)
|
|
23
|
-
**Scenario**: Multiple workers have tasks in 'started' state when violation occurs
|
|
24
|
-
```
|
|
25
|
-
producer (single) → map_consumer (map, expects array)
|
|
26
|
-
producer (single) → parallel_task1 (single)
|
|
27
|
-
producer (single) → parallel_task2 (single)
|
|
28
|
-
```
|
|
29
|
-
|
|
30
|
-
**Test Flow**:
|
|
31
|
-
1. Complete producer with `[1, 2, 3]` (spawns 3 map tasks + 2 parallel tasks)
|
|
32
|
-
2. Worker A starts `map_consumer[0]`
|
|
33
|
-
3. Worker B starts `map_consumer[1]`
|
|
34
|
-
4. Worker C starts `parallel_task1`
|
|
35
|
-
5. Worker D starts `parallel_task2`
|
|
36
|
-
6. Worker C completes `parallel_task1` with non-array (violates some other map dependency)
|
|
37
|
-
7. **Verify**: ALL started tasks (map_consumer[0], map_consumer[1], parallel_task2) get archived
|
|
38
|
-
|
|
39
|
-
### 3. Mixed Queue States (❌ Not Covered)
|
|
40
|
-
**Scenario**: Mix of queued and started tasks across different steps
|
|
41
|
-
```
|
|
42
|
-
step1 → step2 → step3 → map_step
|
|
43
|
-
↘ step4 → step5
|
|
44
|
-
```
|
|
45
|
-
|
|
46
|
-
**Test Flow**:
|
|
47
|
-
1. Complete step1
|
|
48
|
-
2. Worker A starts step2
|
|
49
|
-
3. Worker B starts step4
|
|
50
|
-
4. Step3 and step5 remain queued
|
|
51
|
-
5. Worker A completes step2 with type violation
|
|
52
|
-
6. **Verify**: Both started (step4) AND queued (step3, step5) messages archived
|
|
53
|
-
|
|
54
|
-
### 4. Map Task Partial Processing (❌ Not Covered)
|
|
55
|
-
**Scenario**: Some map tasks started, others queued when violation occurs
|
|
56
|
-
```
|
|
57
|
-
producer → large_map (100 elements)
|
|
58
|
-
```
|
|
59
|
-
|
|
60
|
-
**Test Flow**:
|
|
61
|
-
1. Producer outputs array of 100 elements
|
|
62
|
-
2. Workers start processing first 10 tasks
|
|
63
|
-
3. 90 tasks remain queued
|
|
64
|
-
4. One of the started tasks detects downstream type violation
|
|
65
|
-
5. **Verify**: All 100 messages (10 started + 90 queued) get archived
|
|
66
|
-
|
|
67
|
-
### 5. Visibility Timeout Verification (❌ Not Covered)
|
|
68
|
-
**Scenario**: Ensure orphaned messages don't reappear after timeout
|
|
69
|
-
```
|
|
70
|
-
step1 → step2 → map_step
|
|
71
|
-
```
|
|
72
|
-
|
|
73
|
-
**Test Flow**:
|
|
74
|
-
1. Worker starts step2 (30s visibility timeout)
|
|
75
|
-
2. Type violation occurs but message NOT archived (simulate bug)
|
|
76
|
-
3. Wait 31 seconds
|
|
77
|
-
4. **Verify**: Message reappears in queue (demonstrates the bug)
|
|
78
|
-
5. Apply fix and verify message doesn't reappear
|
|
79
|
-
|
|
80
|
-
### 6. Nested Map Chains (❌ Not Covered)
|
|
81
|
-
**Scenario**: Type violation in middle of map chain
|
|
82
|
-
```
|
|
83
|
-
map1 (produces arrays) → map2 (expects arrays) → map3
|
|
84
|
-
```
|
|
85
|
-
|
|
86
|
-
**Test Flow**:
|
|
87
|
-
1. map1 task completes with non-array (violates map2 expectation)
|
|
88
|
-
2. Other map1 tasks are in various states (started/queued)
|
|
89
|
-
3. **Verify**: All map1 tasks archived, map2 never starts
|
|
90
|
-
|
|
91
|
-
### 7. Race During Archival (❌ Not Covered)
|
|
92
|
-
**Scenario**: Worker tries to complete task while archival is happening
|
|
93
|
-
```
|
|
94
|
-
step1 → step2 → map_step
|
|
95
|
-
```
|
|
96
|
-
|
|
97
|
-
**Test Flow**:
|
|
98
|
-
1. Worker A detects type violation, begins archiving
|
|
99
|
-
2. Worker B tries to complete its task during archival
|
|
100
|
-
3. **Verify**: Worker B's completion is rejected (guard clause)
|
|
101
|
-
4. **Verify**: No duplicate archival attempts
|
|
102
|
-
|
|
103
|
-
## Implementation Strategy
|
|
104
|
-
|
|
105
|
-
### Test Utilities Needed
|
|
106
|
-
|
|
107
|
-
1. **Multi-worker simulator**:
|
|
108
|
-
```sql
|
|
109
|
-
CREATE FUNCTION pgflow_tests.simulate_worker(
|
|
110
|
-
worker_id uuid,
|
|
111
|
-
flow_slug text
|
|
112
|
-
) RETURNS TABLE(...);
|
|
113
|
-
```
|
|
114
|
-
|
|
115
|
-
2. **Queue state inspector**:
|
|
116
|
-
```sql
|
|
117
|
-
CREATE FUNCTION pgflow_tests.inspect_queue_state(
|
|
118
|
-
flow_slug text
|
|
119
|
-
) RETURNS TABLE(
|
|
120
|
-
message_id bigint,
|
|
121
|
-
task_status text,
|
|
122
|
-
visibility_timeout timestamptz
|
|
123
|
-
);
|
|
124
|
-
```
|
|
125
|
-
|
|
126
|
-
3. **Time manipulation** (for visibility timeout tests):
|
|
127
|
-
```sql
|
|
128
|
-
-- May need to mock pgmq visibility behavior
|
|
129
|
-
```
|
|
130
|
-
|
|
131
|
-
### Test File Organization
|
|
132
|
-
|
|
133
|
-
```
|
|
134
|
-
supabase/tests/type_violations/
|
|
135
|
-
├── basic_violation.test.sql # Existing coverage
|
|
136
|
-
├── concurrent_started_tasks.test.sql # NEW: Scenario 2
|
|
137
|
-
├── mixed_queue_states.test.sql # NEW: Scenario 3
|
|
138
|
-
├── map_partial_processing.test.sql # NEW: Scenario 4
|
|
139
|
-
├── visibility_timeout_recovery.test.sql # NEW: Scenario 5
|
|
140
|
-
├── nested_map_chains.test.sql # NEW: Scenario 6
|
|
141
|
-
└── race_during_archival.test.sql # NEW: Scenario 7
|
|
142
|
-
```
|
|
143
|
-
|
|
144
|
-
## Success Criteria
|
|
145
|
-
|
|
146
|
-
1. **No orphaned messages**: Queue must be empty after type violation
|
|
147
|
-
2. **No message resurrection**: Archived messages don't reappear after timeout
|
|
148
|
-
3. **Complete cleanup**: ALL tasks (queued + started) for the run are handled
|
|
149
|
-
4. **Atomic operation**: Archival happens in single transaction
|
|
150
|
-
5. **Guard effectiveness**: No operations on failed runs
|
|
151
|
-
|
|
152
|
-
## Performance Considerations
|
|
153
|
-
|
|
154
|
-
- Test with large numbers of tasks (1000+) to verify batch archival performance
|
|
155
|
-
- Ensure archival doesn't lock tables for extended periods
|
|
156
|
-
- Verify index usage on `(run_id, status, message_id)`
|
|
157
|
-
|
|
158
|
-
## Current Gap Analysis
|
|
159
|
-
|
|
160
|
-
**What we have**:
|
|
161
|
-
- Basic type violation detection ✅
|
|
162
|
-
- Single task archival ✅
|
|
163
|
-
- Run failure on violation ✅
|
|
164
|
-
|
|
165
|
-
**What we need**:
|
|
166
|
-
- True concurrent worker simulation ❌
|
|
167
|
-
- Multi-task race condition validation ❌
|
|
168
|
-
- Visibility timeout verification ❌
|
|
169
|
-
- Performance under load testing ❌
|
|
170
|
-
|
|
171
|
-
## Priority
|
|
172
|
-
|
|
173
|
-
1. **HIGH**: Concurrent started tasks (Scenario 2) - Most common real-world case
|
|
174
|
-
2. **HIGH**: Map partial processing (Scenario 4) - Critical for large arrays
|
|
175
|
-
3. **MEDIUM**: Mixed queue states (Scenario 3) - Complex flows
|
|
176
|
-
4. **LOW**: Other scenarios - Edge cases but important for robustness
|
|
@@ -1,17 +0,0 @@
|
|
|
1
|
-
import type postgres from 'postgres';
|
|
2
|
-
import type { StepTaskRecord, IPgflowClient, StepTaskKey, RunRow, MessageRecord } from './types.js';
|
|
3
|
-
import type { Json } from './types.js';
|
|
4
|
-
import type { AnyFlow, ExtractFlowInput } from '@pgflow/dsl';
|
|
5
|
-
/**
|
|
6
|
-
* Implementation of IPgflowClient that uses direct SQL calls to pgflow functions
|
|
7
|
-
*/
|
|
8
|
-
export declare class PgflowSqlClient<TFlow extends AnyFlow> implements IPgflowClient<TFlow> {
|
|
9
|
-
private readonly sql;
|
|
10
|
-
constructor(sql: postgres.Sql);
|
|
11
|
-
readMessages(queueName: string, visibilityTimeout: number, batchSize: number, maxPollSeconds?: number, pollIntervalMs?: number): Promise<MessageRecord[]>;
|
|
12
|
-
startTasks(flowSlug: string, msgIds: number[], workerId: string): Promise<StepTaskRecord<TFlow>[]>;
|
|
13
|
-
completeTask(stepTask: StepTaskKey, output?: Json): Promise<void>;
|
|
14
|
-
failTask(stepTask: StepTaskKey, error: unknown): Promise<void>;
|
|
15
|
-
startFlow<TFlow extends AnyFlow>(flow_slug: string, input: ExtractFlowInput<TFlow>, run_id?: string): Promise<RunRow>;
|
|
16
|
-
}
|
|
17
|
-
//# sourceMappingURL=PgflowSqlClient.d.ts.map
|
|
@@ -1 +0,0 @@
|
|
|
1
|
-
{"version":3,"file":"PgflowSqlClient.d.ts","sourceRoot":"","sources":["../src/PgflowSqlClient.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,QAAQ,MAAM,UAAU,CAAC;AACrC,OAAO,KAAK,EACV,cAAc,EACd,aAAa,EACb,WAAW,EACX,MAAM,EACN,aAAa,EACd,MAAM,YAAY,CAAC;AACpB,OAAO,KAAK,EAAE,IAAI,EAAE,MAAM,YAAY,CAAC;AACvC,OAAO,KAAK,EAAE,OAAO,EAAE,gBAAgB,EAAE,MAAM,aAAa,CAAC;AAE7D;;GAEG;AACH,qBAAa,eAAe,CAAC,KAAK,SAAS,OAAO,CAChD,YAAW,aAAa,CAAC,KAAK,CAAC;IAEnB,OAAO,CAAC,QAAQ,CAAC,GAAG;gBAAH,GAAG,EAAE,QAAQ,CAAC,GAAG;IAExC,YAAY,CAChB,SAAS,EAAE,MAAM,EACjB,iBAAiB,EAAE,MAAM,EACzB,SAAS,EAAE,MAAM,EACjB,cAAc,SAAI,EAClB,cAAc,SAAM,GACnB,OAAO,CAAC,aAAa,EAAE,CAAC;IAarB,UAAU,CACd,QAAQ,EAAE,MAAM,EAChB,MAAM,EAAE,MAAM,EAAE,EAChB,QAAQ,EAAE,MAAM,GACf,OAAO,CAAC,cAAc,CAAC,KAAK,CAAC,EAAE,CAAC;IAW7B,YAAY,CAAC,QAAQ,EAAE,WAAW,EAAE,MAAM,CAAC,EAAE,IAAI,GAAG,OAAO,CAAC,IAAI,CAAC;IAWjE,QAAQ,CAAC,QAAQ,EAAE,WAAW,EAAE,KAAK,EAAE,OAAO,GAAG,OAAO,CAAC,IAAI,CAAC;IAkB9D,SAAS,CAAC,KAAK,SAAS,OAAO,EACnC,SAAS,EAAE,MAAM,EACjB,KAAK,EAAE,gBAAgB,CAAC,KAAK,CAAC,EAC9B,MAAM,CAAC,EAAE,MAAM,GACd,OAAO,CAAC,MAAM,CAAC;CAiBnB"}
|
package/dist/PgflowSqlClient.js
DELETED
|
@@ -1,70 +0,0 @@
|
|
|
1
|
-
/**
|
|
2
|
-
* Implementation of IPgflowClient that uses direct SQL calls to pgflow functions
|
|
3
|
-
*/
|
|
4
|
-
export class PgflowSqlClient {
|
|
5
|
-
sql;
|
|
6
|
-
constructor(sql) {
|
|
7
|
-
this.sql = sql;
|
|
8
|
-
}
|
|
9
|
-
async readMessages(queueName, visibilityTimeout, batchSize, maxPollSeconds = 5, pollIntervalMs = 200) {
|
|
10
|
-
return await this.sql `
|
|
11
|
-
SELECT *
|
|
12
|
-
FROM pgflow.read_with_poll(
|
|
13
|
-
queue_name => ${queueName},
|
|
14
|
-
vt => ${visibilityTimeout},
|
|
15
|
-
qty => ${batchSize},
|
|
16
|
-
max_poll_seconds => ${maxPollSeconds},
|
|
17
|
-
poll_interval_ms => ${pollIntervalMs}
|
|
18
|
-
);
|
|
19
|
-
`;
|
|
20
|
-
}
|
|
21
|
-
async startTasks(flowSlug, msgIds, workerId) {
|
|
22
|
-
return await this.sql `
|
|
23
|
-
SELECT *
|
|
24
|
-
FROM pgflow.start_tasks(
|
|
25
|
-
flow_slug => ${flowSlug},
|
|
26
|
-
msg_ids => ${msgIds}::bigint[],
|
|
27
|
-
worker_id => ${workerId}::uuid
|
|
28
|
-
);
|
|
29
|
-
`;
|
|
30
|
-
}
|
|
31
|
-
async completeTask(stepTask, output) {
|
|
32
|
-
await this.sql `
|
|
33
|
-
SELECT pgflow.complete_task(
|
|
34
|
-
run_id => ${stepTask.run_id}::uuid,
|
|
35
|
-
step_slug => ${stepTask.step_slug}::text,
|
|
36
|
-
task_index => ${stepTask.task_index}::int,
|
|
37
|
-
output => ${this.sql.json(output || null)}::jsonb
|
|
38
|
-
);
|
|
39
|
-
`;
|
|
40
|
-
}
|
|
41
|
-
async failTask(stepTask, error) {
|
|
42
|
-
const errorString = typeof error === 'string'
|
|
43
|
-
? error
|
|
44
|
-
: error instanceof Error
|
|
45
|
-
? error.message
|
|
46
|
-
: JSON.stringify(error);
|
|
47
|
-
await this.sql `
|
|
48
|
-
SELECT pgflow.fail_task(
|
|
49
|
-
run_id => ${stepTask.run_id}::uuid,
|
|
50
|
-
step_slug => ${stepTask.step_slug}::text,
|
|
51
|
-
task_index => ${stepTask.task_index}::int,
|
|
52
|
-
error_message => ${errorString}::text
|
|
53
|
-
);
|
|
54
|
-
`;
|
|
55
|
-
}
|
|
56
|
-
async startFlow(flow_slug, input, run_id) {
|
|
57
|
-
const results = await this.sql `
|
|
58
|
-
SELECT * FROM pgflow.start_flow(
|
|
59
|
-
flow_slug => ${flow_slug}::text,
|
|
60
|
-
input => ${this.sql.json(input)}::jsonb
|
|
61
|
-
${run_id ? this.sql `, run_id => ${run_id}::uuid` : this.sql ``}
|
|
62
|
-
);
|
|
63
|
-
`;
|
|
64
|
-
if (results.length === 0) {
|
|
65
|
-
throw new Error(`Failed to start flow ${flow_slug}`);
|
|
66
|
-
}
|
|
67
|
-
const [flowRun] = results;
|
|
68
|
-
return flowRun;
|
|
69
|
-
}
|
|
70
|
-
}
|
package/dist/README.md
DELETED
|
@@ -1,399 +0,0 @@
|
|
|
1
|
-
# pgflow SQL Core
|
|
2
|
-
|
|
3
|
-
PostgreSQL-native workflow engine for defining, managing, and tracking DAG-based workflows directly in your database.
|
|
4
|
-
|
|
5
|
-
> [!NOTE]
|
|
6
|
-
> This project and all its components are licensed under the [Apache 2.0](./LICENSE) license.
|
|
7
|
-
|
|
8
|
-
> [!WARNING]
|
|
9
|
-
> This project uses [Atlas](https://atlasgo.io/docs) to manage the schemas and migrations.
|
|
10
|
-
> See [ATLAS.md](ATLAS.md) for more details.
|
|
11
|
-
|
|
12
|
-
## Table of Contents
|
|
13
|
-
|
|
14
|
-
- [Overview](#overview)
|
|
15
|
-
- [Key Features](#key-features)
|
|
16
|
-
- [Architecture](#architecture)
|
|
17
|
-
- [Schema Design](#schema-design)
|
|
18
|
-
- [Execution Model](#execution-model)
|
|
19
|
-
- [Example Flow and its life](#example-flow-and-its-life)
|
|
20
|
-
- [Defining a Workflow](#defining-a-workflow)
|
|
21
|
-
- [Starting a Workflow Run](#starting-a-workflow-run)
|
|
22
|
-
- [Workflow Execution](#workflow-execution)
|
|
23
|
-
- [Task Polling](#task-polling)
|
|
24
|
-
- [Task Completion](#task-completion)
|
|
25
|
-
- [Error Handling](#error-handling)
|
|
26
|
-
- [Retries and Timeouts](#retries-and-timeouts)
|
|
27
|
-
- [TypeScript Flow DSL](#typescript-flow-dsl)
|
|
28
|
-
- [Overview](#overview-1)
|
|
29
|
-
- [Type Inference System](#type-inference-system)
|
|
30
|
-
- [Basic Example](#basic-example)
|
|
31
|
-
- [How Payload Types Are Built](#how-payload-types-are-built)
|
|
32
|
-
- [Benefits of Automatic Type Inference](#benefits-of-automatic-type-inference)
|
|
33
|
-
- [Data Flow](#data-flow)
|
|
34
|
-
- [Input and Output Handling](#input-and-output-handling)
|
|
35
|
-
- [Run Completion](#run-completion)
|
|
36
|
-
|
|
37
|
-
## Overview
|
|
38
|
-
|
|
39
|
-
The pgflow SQL Core provides the data model, state machine, and transactional functions for workflow management. It treats workflows as Directed Acyclic Graphs (DAGs) of steps, each step being a simple state machine.
|
|
40
|
-
|
|
41
|
-
This package focuses on:
|
|
42
|
-
|
|
43
|
-
- Defining and storing workflow shapes
|
|
44
|
-
- Managing workflow state transitions
|
|
45
|
-
- Exposing transactional functions for workflow operations
|
|
46
|
-
- Providing two-phase APIs for reliable task polling and status updates
|
|
47
|
-
|
|
48
|
-
The actual execution of workflow tasks is handled by the [Edge Worker](../edge-worker/README.md), which calls back to the SQL Core to acknowledge task completion or failure.
|
|
49
|
-
|
|
50
|
-
## Key Features
|
|
51
|
-
|
|
52
|
-
- **Declarative Workflows**: Define flows and steps via SQL tables
|
|
53
|
-
- **Dependency Management**: Explicit step dependencies with atomic transitions
|
|
54
|
-
- **Configurable Behavior**: Per-flow and per-step options for timeouts, retries, and delays
|
|
55
|
-
- **Queue Integration**: Built on pgmq for reliable task processing
|
|
56
|
-
- **Transactional Guarantees**: All state transitions are ACID-compliant
|
|
57
|
-
|
|
58
|
-
## Architecture
|
|
59
|
-
|
|
60
|
-
### Schema Design
|
|
61
|
-
|
|
62
|
-
[Schema ERD Diagram (click to enlarge)](./assets/schema.svg)
|
|
63
|
-
|
|
64
|
-
<a href="./assets/schema.svg">
|
|
65
|
-
<img src="./assets/schema.svg" alt="Schema ERD Diagram" width="25%" height="25%">
|
|
66
|
-
</a>
|
|
67
|
-
|
|
68
|
-
---
|
|
69
|
-
|
|
70
|
-
The schema consists of two main categories of tables:
|
|
71
|
-
|
|
72
|
-
#### Static definition tables
|
|
73
|
-
|
|
74
|
-
- `flows` (just an identity for the workflow with some global options)
|
|
75
|
-
- `steps` (DAG nodes belonging to particular `flows`, with option overrides)
|
|
76
|
-
- `deps` (DAG edges between `steps`)
|
|
77
|
-
|
|
78
|
-
#### Runtime state tables
|
|
79
|
-
|
|
80
|
-
- `runs` (execution instances of `flows`)
|
|
81
|
-
- `step_states` (states of individual `steps` within a `run`)
|
|
82
|
-
- `step_tasks` (units of work for individual `steps` within a `run`, so we can have fanouts)
|
|
83
|
-
|
|
84
|
-
### Execution Model
|
|
85
|
-
|
|
86
|
-
The SQL Core handles the workflow lifecycle through these key operations:
|
|
87
|
-
|
|
88
|
-
1. **Definition**: Workflows are defined using `create_flow` and `add_step`
|
|
89
|
-
2. **Instantiation**: Workflow instances are started with `start_flow`, creating a new run
|
|
90
|
-
3. **Task Retrieval**: The [Edge Worker](../edge-worker/README.md) uses two-phase polling - first `read_with_poll` to reserve queue messages, then `start_tasks` to convert them to executable tasks
|
|
91
|
-
4. **State Transitions**: When the Edge Worker reports back using `complete_task` or `fail_task`, the SQL Core handles state transitions and schedules dependent steps
|
|
92
|
-
|
|
93
|
-
[Flow lifecycle diagram (click to enlarge)](./assets/flow-lifecycle.svg)
|
|
94
|
-
|
|
95
|
-
<a href="./assets/flow-lifecycle.svg"><img src="./assets/flow-lifecycle.svg" alt="Flow Lifecycle" width="25%" height="25%"></a>
|
|
96
|
-
|
|
97
|
-
## Example flow and its life
|
|
98
|
-
|
|
99
|
-
Let's walk through creating and running a workflow that fetches a website,
|
|
100
|
-
does summarization and sentiment analysis in parallel steps
|
|
101
|
-
and saves the results to a database.
|
|
102
|
-
|
|
103
|
-

|
|
104
|
-
|
|
105
|
-
### Defining a Workflow
|
|
106
|
-
|
|
107
|
-
Workflows are defined using two SQL functions: `create_flow` and `add_step`.
|
|
108
|
-
|
|
109
|
-
In this example, we'll create a workflow with:
|
|
110
|
-
|
|
111
|
-
- `website` as the entry point ("root step")
|
|
112
|
-
- `sentiment` and `summary` as parallel steps that depend on `website`
|
|
113
|
-
- `saveToDb` as the final step, depending on both parallel steps
|
|
114
|
-
|
|
115
|
-
```sql
|
|
116
|
-
-- Define workflow with parallel steps
|
|
117
|
-
SELECT pgflow.create_flow('analyze_website');
|
|
118
|
-
SELECT pgflow.add_step('analyze_website', 'website');
|
|
119
|
-
SELECT pgflow.add_step('analyze_website', 'sentiment', deps_slugs => ARRAY['website']);
|
|
120
|
-
SELECT pgflow.add_step('analyze_website', 'summary', deps_slugs => ARRAY['website']);
|
|
121
|
-
SELECT pgflow.add_step('analyze_website', 'saveToDb', deps_slugs => ARRAY['sentiment', 'summary']);
|
|
122
|
-
```
|
|
123
|
-
|
|
124
|
-
> [!WARNING]
|
|
125
|
-
> You need to call `add_step` in topological order, which is enforced by foreign key constraints.
|
|
126
|
-
|
|
127
|
-
> [!NOTE]
|
|
128
|
-
> You can have multiple "root steps" in a workflow. You can even create a root-steps-only workflow
|
|
129
|
-
> to process a single input in parallel, because at the end, all of the outputs from steps
|
|
130
|
-
> that do not have dependents ("final steps") are aggregated and saved as the run's `output`.
|
|
131
|
-
|
|
132
|
-
### Starting a Workflow Run
|
|
133
|
-
|
|
134
|
-
To start a workflow, call `start_flow` with a flow slug and input arguments:
|
|
135
|
-
|
|
136
|
-
```sql
|
|
137
|
-
SELECT * FROM pgflow.start_flow(
|
|
138
|
-
flow_slug => 'analyze_website',
|
|
139
|
-
input => '{"url": "https://example.com"}'::jsonb
|
|
140
|
-
);
|
|
141
|
-
|
|
142
|
-
-- run_id | flow_slug | status | input | output | remaining_steps
|
|
143
|
-
-- ------------+-----------------+---------+--------------------------------+--------+-----------------
|
|
144
|
-
-- <run uuid> | analyze_website | started | {"url": "https://example.com"} | [NULL] | 4
|
|
145
|
-
```
|
|
146
|
-
|
|
147
|
-
When a workflow starts:
|
|
148
|
-
|
|
149
|
-
- A new `run` record is created
|
|
150
|
-
- Initial states for all steps are created
|
|
151
|
-
- Root steps are marked as `started`
|
|
152
|
-
- Tasks are created for root steps
|
|
153
|
-
- Messages are enqueued on PGMQ for worker processing
|
|
154
|
-
|
|
155
|
-
> [!NOTE]
|
|
156
|
-
> The `input` argument must be a valid JSONB value: string, number, boolean, array, object or null.
|
|
157
|
-
|
|
158
|
-
### Workflow Execution
|
|
159
|
-
|
|
160
|
-
#### Task Polling
|
|
161
|
-
|
|
162
|
-
The Edge Worker uses a two-phase approach to retrieve and start tasks:
|
|
163
|
-
|
|
164
|
-
**Phase 1 - Reserve Messages:**
|
|
165
|
-
```sql
|
|
166
|
-
SELECT * FROM pgflow.read_with_poll(
|
|
167
|
-
queue_name => 'analyze_website',
|
|
168
|
-
vt => 60, -- visibility timeout in seconds
|
|
169
|
-
qty => 5 -- maximum number of messages to fetch
|
|
170
|
-
);
|
|
171
|
-
```
|
|
172
|
-
|
|
173
|
-
**Phase 2 - Start Tasks:**
|
|
174
|
-
```sql
|
|
175
|
-
SELECT * FROM pgflow.start_tasks(
|
|
176
|
-
flow_slug => 'analyze_website',
|
|
177
|
-
msg_ids => ARRAY[101, 102, 103], -- message IDs from phase 1
|
|
178
|
-
worker_id => '550e8400-e29b-41d4-a716-446655440000'::uuid
|
|
179
|
-
);
|
|
180
|
-
```
|
|
181
|
-
|
|
182
|
-
**How it works:**
|
|
183
|
-
|
|
184
|
-
1. **read_with_poll** reserves raw queue messages and hides them from other workers
|
|
185
|
-
2. **start_tasks** finds the matching step_tasks, increments their attempts counter, and builds task inputs
|
|
186
|
-
3. Task metadata and input are returned to the worker for execution
|
|
187
|
-
|
|
188
|
-
This two-phase approach ensures tasks always exist before processing begins, eliminating race conditions that could occur with single-phase polling.
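From a worker's point of view, the two phases could be chained roughly as follows. This is a sketch under stated assumptions: the `TwoPhaseClient` interface copies only the two relevant method signatures of `PgflowSqlClient`, while the `msg_id` field, the `pollOnce` helper, and the choice to name the queue after the flow slug (as in the example above) are illustrative assumptions.

```ts
// Minimal shapes assumed for illustration; the real types live in @pgflow/core.
interface QueueMessage {
  msg_id: number; // assumption: pgmq-style message id
}

interface TwoPhaseClient<TTask> {
  readMessages(
    queueName: string,
    visibilityTimeout: number,
    batchSize: number
  ): Promise<QueueMessage[]>;
  startTasks(flowSlug: string, msgIds: number[], workerId: string): Promise<TTask[]>;
}

// Hypothetical helper: one polling cycle for a single flow.
async function pollOnce<TTask>(
  client: TwoPhaseClient<TTask>,
  flowSlug: string,
  workerId: string,
  visibilityTimeout = 60,
  batchSize = 5
): Promise<TTask[]> {
  // Phase 1: reserve raw queue messages, hiding them from other workers.
  const messages = await client.readMessages(flowSlug, visibilityTimeout, batchSize);
  if (messages.length === 0) return []; // nothing reserved, nothing to start

  // Phase 2: turn the reserved messages into executable tasks.
  const msgIds = messages.map((m) => m.msg_id);
  return client.startTasks(flowSlug, msgIds, workerId);
}
```

Keeping phase 2 behind the empty-batch check avoids a second round trip when the queue has nothing to offer.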
|
|
189
|
-
|
|
190
|
-
#### Task Completion
|
|
191
|
-
|
|
192
|
-
After successful processing, the worker acknowledges completion:
|
|
193
|
-
|
|
194
|
-
```sql
|
|
195
|
-
SELECT pgflow.complete_task(
|
|
196
|
-
run_id => '<run_uuid>',
|
|
197
|
-
step_slug => 'website',
|
|
198
|
-
task_index => 0, -- we will have multiple tasks for a step in the future
|
|
199
|
-
output => '{"content": "HTML content", "status": 200}'::jsonb
|
|
200
|
-
);
|
|
201
|
-
```
|
|
202
|
-
|
|
203
|
-
When a task completes:
|
|
204
|
-
|
|
205
|
-
1. The task status is updated to 'completed' and the output is saved
|
|
206
|
-
2. The message is archived in PGMQ
|
|
207
|
-
3. The step state is updated to 'completed'
|
|
208
|
-
4. Dependent steps with all dependencies completed are automatically started
|
|
209
|
-
5. The run's remaining_steps counter is decremented
|
|
210
|
-
6. If all steps are completed, the run is marked as completed with aggregated outputs
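Step 4 above is a readiness check over the DAG. A minimal in-memory sketch of that check follows, with the hypothetical helper `readyDependents`; the real logic runs inside the SQL functions and may differ in detail.

```ts
// Maps each step slug to the slugs it depends on (the DAG edges).
type Deps = Record<string, string[]>;

// Hypothetical helper: which steps become startable once `justCompleted` finishes?
function readyDependents(
  deps: Deps,
  completed: Set<string>,
  justCompleted: string
): string[] {
  const done = new Set(completed).add(justCompleted);
  return Object.entries(deps)
    .filter(
      ([slug, stepDeps]) =>
        !done.has(slug) && // not finished itself
        stepDeps.includes(justCompleted) && // depends on the finished step
        stepDeps.every((d) => done.has(d)) // and has no unfinished dependency
    )
    .map(([slug]) => slug);
}

const deps: Deps = {
  website: [],
  sentiment: ['website'],
  summary: ['website'],
  saveToDb: ['sentiment', 'summary'],
};

// Completing `website` unblocks both parallel steps.
console.log(readyDependents(deps, new Set<string>(), 'website'));
// `saveToDb` starts only after the second of its two dependencies completes.
console.log(readyDependents(deps, new Set(['website', 'sentiment']), 'summary'));
```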
|
|
211
|
-
|
|
212
|
-
#### Error Handling
|
|
213
|
-
|
|
214
|
-
If a task fails, the worker acknowledges this using `fail_task`:
|
|
215
|
-
|
|
216
|
-
```sql
|
|
217
|
-
SELECT pgflow.fail_task(
|
|
218
|
-
run_id => '<run_uuid>',
|
|
219
|
-
step_slug => 'website',
|
|
220
|
-
task_index => 0,
|
|
221
|
-
error_message => 'Connection timeout when fetching URL'::text
|
|
222
|
-
);
|
|
223
|
-
```
|
|
224
|
-
|
|
225
|
-
The system handles failures by:
|
|
226
|
-
|
|
227
|
-
1. Checking if retry attempts are available
|
|
228
|
-
2. For available retries:
|
|
229
|
-
- Keeping the task in 'queued' status
|
|
230
|
-
- Applying exponential backoff for visibility
|
|
231
|
-
- Preventing processing until the visibility timeout expires
|
|
232
|
-
3. When retries are exhausted:
|
|
233
|
-
- Marking the task as 'failed'
|
|
234
|
-
- Storing the task output (even for failed tasks)
|
|
235
|
-
- Marking the step as 'failed'
|
|
236
|
-
- Marking the run as 'failed'
|
|
237
|
-
- Archiving the message in PGMQ
|
|
238
|
-
- **Archiving all queued messages for the failed run** (preventing orphaned messages)
|
|
239
|
-
4. Additional failure handling:
|
|
240
|
-
- **No retries on already-failed runs** - tasks are immediately marked as failed
|
|
241
|
-
- **Graceful type constraint violations** - handled without exceptions when single steps feed map steps
|
|
242
|
-
- **Stores invalid output on type violations** - captures the output that caused the violation for debugging
|
|
243
|
-
- **Performance-optimized message archiving** using indexed queries
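The retry-or-fail decision in the list above can be summarized as a small function. This is a sketch, not the SQL implementation: it assumes `attemptsCount` already includes the attempt that just failed and that `maxAttempts` counts the first attempt, and the name `decideOnFailure` is hypothetical.

```ts
type FailureOutcome = 'retry' | 'fail';

interface FailureContext {
  attemptsCount: number; // assumption: includes the attempt that just failed
  maxAttempts: number; // includes the first attempt
  runAlreadyFailed: boolean;
}

// Hypothetical decision helper mirroring the failure-handling list.
function decideOnFailure(ctx: FailureContext): FailureOutcome {
  // No retries on already-failed runs: the task is failed immediately.
  if (ctx.runAlreadyFailed) return 'fail';
  // Retries exhausted: task, step, and run are marked as failed.
  if (ctx.attemptsCount >= ctx.maxAttempts) return 'fail';
  // Otherwise the task stays 'queued' and becomes visible again after backoff.
  return 'retry';
}
```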
|
|
244
|
-
|
|
245
|
-
#### Retries and Timeouts
|
|
246
|
-
|
|
247
|
-
Retry behavior can be configured at both the flow and step level:
|
|
248
|
-
|
|
249
|
-
```sql
|
|
250
|
-
-- Flow-level defaults
|
|
251
|
-
SELECT pgflow.create_flow(
|
|
252
|
-
flow_slug => 'analyze_website',
|
|
253
|
-
max_attempts => 3, -- Maximum retry attempts (including first attempt)
|
|
254
|
-
base_delay => 5, -- Base delay in seconds for exponential backoff
|
|
255
|
-
timeout => 60 -- Task timeout in seconds
|
|
256
|
-
);
|
|
257
|
-
|
|
258
|
-
-- Step-level overrides
|
|
259
|
-
SELECT pgflow.add_step(
|
|
260
|
-
flow_slug => 'analyze_website',
|
|
261
|
-
step_slug => 'sentiment',
|
|
262
|
-
deps_slugs => ARRAY['website']::text[],
|
|
263
|
-
max_attempts => 5, -- Override max attempts for this step
|
|
264
|
-
base_delay => 2, -- Override base delay for exponential backoff
|
|
265
|
-
timeout => 30 -- Override timeout for this step
|
|
266
|
-
);
|
|
267
|
-
```
|
|
268
|
-
|
|
269
|
-
The system applies exponential backoff for retries using the formula:
|
|
270
|
-
|
|
271
|
-
```
|
|
272
|
-
delay = base_delay * (2 ^ attempts_count)
|
|
273
|
-
```
|
|
274
|
-
|
|
275
|
-
Timeouts are enforced by setting the message visibility timeout to the step's timeout value plus a small buffer. If a worker doesn't acknowledge completion or failure within this period, the task becomes visible again and can be retried.
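As a worked example of the backoff formula, the sketch below evaluates it for the flow-level default `base_delay => 5`. Note that `^` in the formula denotes exponentiation; the helper name `retryDelaySeconds` is hypothetical.

```ts
// delay = base_delay * (2 ^ attempts_count), in seconds.
function retryDelaySeconds(baseDelay: number, attemptsCount: number): number {
  return baseDelay * 2 ** attemptsCount;
}

for (const attempts of [1, 2, 3]) {
  console.log(`attempts_count=${attempts} -> ${retryDelaySeconds(5, attempts)}s`);
}
// attempts_count=1 -> 10s
// attempts_count=2 -> 20s
// attempts_count=3 -> 40s
```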
|
|
276
|
-
|
|
277
|
-
## TypeScript Flow DSL
|
|
278
|
-
|
|
279
|
-
> [!NOTE]
|
|
280
|
-
> TypeScript Flow DSL is a Work In Progress and is not ready yet!
|
|
281
|
-
|
|
282
|
-
### Overview
|
|
283
|
-
|
|
284
|
-
While the SQL Core engine handles workflow definitions and state management, the primary way to define and work with your workflow logic is via the Flow DSL in TypeScript. This DSL offers a fluent API that makes it straightforward to outline the steps in your flow with full type safety.
|
|
285
|
-
|
|
286
|
-
### Type Inference System
|
|
287
|
-
|
|
288
|
-
The most powerful feature of the Flow DSL is its **automatic type inference system**:
|
|
289
|
-
|
|
290
|
-
1. You only need to annotate the initial Flow input type
|
|
291
|
-
2. The return type of each step is automatically inferred from your handler function
|
|
292
|
-
3. These return types become available in the payload of dependent steps
|
|
293
|
-
4. The TypeScript compiler builds a complete type graph matching your workflow DAG
|
|
294
|
-
|
|
295
|
-
This means you get full IDE autocompletion and type checking throughout your workflow without manual type annotations.
|
|
296
|
-
|
|
297
|
-
### Basic Example
|
|
298
|
-
|
|
299
|
-
Here's an example that matches our website analysis workflow:
|
|
300
|
-
|
|
301
|
-
```ts
|
|
302
|
-
// Provide a type for the input of the Flow
|
|
303
|
-
type Input = {
|
|
304
|
-
url: string;
|
|
305
|
-
};
|
|
306
|
-
|
|
307
|
-
const AnalyzeWebsite = new Flow<Input>({
|
|
308
|
-
slug: 'analyze_website',
|
|
309
|
-
maxAttempts: 3,
|
|
310
|
-
baseDelay: 5,
|
|
311
|
-
timeout: 10,
|
|
312
|
-
})
|
|
313
|
-
.step(
|
|
314
|
-
{ slug: 'website' },
|
|
315
|
-
async (input) => await scrapeWebsite(input.run.url)
|
|
316
|
-
)
|
|
317
|
-
.step(
|
|
318
|
-
{ slug: 'sentiment', dependsOn: ['website'], timeout: 30, maxAttempts: 5 },
|
|
319
|
-
async (input) => await analyzeSentiment(input.website.content)
|
|
320
|
-
)
|
|
321
|
-
.step(
|
|
322
|
-
{ slug: 'summary', dependsOn: ['website'] },
|
|
323
|
-
async (input) => await summarizeWithAI(input.website.content)
|
|
324
|
-
)
|
|
325
|
-
.step(
|
|
326
|
-
{ slug: 'saveToDb', dependsOn: ['sentiment', 'summary'] },
|
|
327
|
-
async (input) =>
|
|
328
|
-
await saveToDb({
|
|
329
|
-
websiteUrl: input.run.url,
|
|
330
|
-
sentiment: input.sentiment.score,
|
|
331
|
-
summary: input.summary,
|
|
332
|
-
}).status
|
|
333
|
-
);
|
|
334
|
-
```
|
|
335
|
-
|
|
336
|
-
### How Payload Types Are Built
|
|
337
|
-
|
|
338
|
-
The payload object for each step is constructed dynamically based on:
|
|
339
|
-
|
|
340
|
-
1. **The `run` property**: Always contains the original workflow input
|
|
341
|
-
2. **Dependency outputs**: Each dependency's output is available under a key matching the dependency's slug
|
|
342
|
-
3. **DAG structure**: Only outputs from direct dependencies are included in the payload
|
|
343
|
-
|
|
344
|
-
This means your step handlers receive exactly the data they need, properly typed, without any manual type declarations beyond the initial Flow input type.
|
|
345
|
-
|
|
346
|
-
### Benefits of Automatic Type Inference
|
|
347
|
-
|
|
348
|
-
- **Refactoring safety**: Change a step's output, and TypeScript will flag all dependent steps that need updates
|
|
349
|
-
- **Discoverability**: IDE autocompletion shows exactly what data is available in each step
|
|
350
|
-
- **Error prevention**: Catch typos and type mismatches at compile time, not runtime
|
|
351
|
-
- **Documentation**: The types themselves serve as living documentation of your workflow's data flow
|
|
352
|
-
|
|
353
|
-
## Data Flow
|
|
354
|
-
|
|
355
|
-
### Input and Output Handling
|
|
356
|
-
|
|
357
|
-
Handlers in pgflow **must return** JSON-serializable values that are captured and saved when `complete_task` is called. These outputs become available as inputs to dependent steps, allowing data to flow through your workflow pipeline.
|
|
358
|
-
|
|
359
|
-
When a step is executed, it receives an input object where:
|
|
360
|
-
|
|
361
|
-
- Each key is a step_slug of a completed dependency
|
|
362
|
-
- Each value is that step's output
|
|
363
|
-
- A special "run" key contains the original workflow input
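The three rules above amount to a simple object merge. A minimal sketch, assuming dependency outputs have already been collected by slug; `buildStepInput` is a hypothetical helper, since the real input is assembled in SQL by `start_tasks`.

```ts
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical helper: assemble the input object a step handler receives.
function buildStepInput(
  runInput: Json,
  depOutputs: Record<string, Json>
): Record<string, Json> {
  // Dependency outputs are keyed by step_slug; "run" holds the original input.
  return { ...depOutputs, run: runInput };
}

const sentimentInput = buildStepInput(
  { url: 'https://example.com' },
  { website: { content: 'HTML content', status: 200 } }
);
console.log(Object.keys(sentimentInput)); // the dependency slug plus "run"
```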
|
|
364
|
-
|
|
365
|
-
#### Example: `sentiment`
|
|
366
|
-
|
|
367
|
-
When the `sentiment` step runs, it receives:
|
|
368
|
-
|
|
369
|
-
```json
|
|
370
|
-
{
|
|
371
|
-
"run": { "url": "https://example.com" },
|
|
372
|
-
"website": { "content": "HTML content", "status": 200 }
|
|
373
|
-
}
|
|
374
|
-
```
|
|
375
|
-
|
|
376
|
-
#### Example: `saveToDb`
|
|
377
|
-
|
|
378
|
-
The `saveToDb` step depends on both `sentiment` and `summary`:
|
|
379
|
-
|
|
380
|
-
```json
|
|
381
|
-
{
|
|
382
|
-
"run": { "url": "https://example.com" },
|
|
383
|
-
"sentiment": { "score": 0.85, "label": "positive" },
|
|
384
|
-
"summary": "This website discusses various topics related to technology and innovation."
|
|
385
|
-
}
|
|
386
|
-
```
|
|
387
|
-
|
|
388
|
-
### Run Completion
|
|
389
|
-
|
|
390
|
-
When all steps in a run are completed, the run status is automatically updated to 'completed' and its output is set. The output is an aggregation of all the outputs from final steps (steps that have no dependents):
|
|
391
|
-
|
|
392
|
-
```sql
|
|
393
|
-
-- Example of a completed run with output
|
|
394
|
-
SELECT run_id, status, output FROM pgflow.runs WHERE run_id = '<run_uuid>';
|
|
395
|
-
|
|
396
|
-
-- run_id | status | output
|
|
397
|
-
-- ------------+-----------+-----------------------------------------------------
|
|
398
|
-
-- <run uuid> | completed | {"saveToDb": {"status": "success"}}
|
|
399
|
-
```
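The aggregation of final-step outputs can be sketched in memory as follows. The helper `aggregateRunOutput` and its inputs are hypothetical; the engine performs this step in SQL when the run completes.

```ts
// Hypothetical helper: keep only the outputs of steps that no other step depends on.
function aggregateRunOutput(
  deps: Record<string, string[]>,
  outputs: Record<string, unknown>
): Record<string, unknown> {
  // Collect every slug that appears as someone else's dependency.
  const hasDependents = new Set<string>();
  for (const stepDeps of Object.values(deps)) {
    for (const dep of stepDeps) hasDependents.add(dep);
  }

  // A step is "final" when it has no dependents.
  const result: Record<string, unknown> = {};
  for (const slug of Object.keys(deps)) {
    if (!hasDependents.has(slug)) result[slug] = outputs[slug];
  }
  return result;
}

const runOutput = aggregateRunOutput(
  {
    website: [],
    sentiment: ['website'],
    summary: ['website'],
    saveToDb: ['sentiment', 'summary'],
  },
  {
    website: { content: 'HTML content', status: 200 },
    sentiment: { score: 0.85, label: 'positive' },
    summary: 'A short summary.',
    saveToDb: { status: 'success' },
  }
);
console.log(JSON.stringify(runOutput)); // {"saveToDb":{"status":"success"}}
```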
|