@pgflow/core 0.1.18 → 0.1.20
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +31 -19
- package/dist/ATLAS.md +32 -0
- package/dist/CHANGELOG.md +16 -0
- package/dist/README.md +31 -19
- package/dist/database-types.d.ts +116 -45
- package/dist/database-types.d.ts.map +1 -1
- package/dist/database-types.js +8 -1
- package/dist/package.json +2 -2
- package/dist/supabase/migrations/20250429164909_pgflow_initial.sql +579 -0
- package/package.json +3 -3
- package/dist/supabase/migrations/000000_schema.sql +0 -149
- package/dist/supabase/migrations/000005_create_flow.sql +0 -29
- package/dist/supabase/migrations/000010_add_step.sql +0 -48
- package/dist/supabase/migrations/000015_start_ready_steps.sql +0 -45
- package/dist/supabase/migrations/000020_start_flow.sql +0 -46
- package/dist/supabase/migrations/000030_read_with_poll_backport.sql +0 -70
- package/dist/supabase/migrations/000040_poll_for_tasks.sql +0 -100
- package/dist/supabase/migrations/000045_maybe_complete_run.sql +0 -30
- package/dist/supabase/migrations/000050_complete_task.sql +0 -98
- package/dist/supabase/migrations/000055_calculate_retry_delay.sql +0 -11
- package/dist/supabase/migrations/000060_fail_task.sql +0 -124
- package/dist/supabase/migrations/000_edge_worker_initial.sql +0 -86
package/README.md
CHANGED

@@ -6,6 +6,10 @@ PostgreSQL-native workflow engine for defining, managing, and tracking DAG-based
 > This project is licensed under [AGPL v3](./LICENSE.md) license and is part of **pgflow** stack.
 > See [LICENSING_OVERVIEW.md](../../LICENSING_OVERVIEW.md) in root of this monorepo for more details.
 
+> [!WARNING]
+> This project uses [Atlas](https://atlasgo.io/docs) to manage the schemas and migrations.
+> See [ATLAS.md](ATLAS.md) for more details.
+
 ## Table of Contents
 
 - [Overview](#overview)
@@ -56,10 +60,10 @@ The actual execution of workflow tasks is handled by the [Edge Worker](../edge-w
 
 ### Schema Design
 
-[Schema ERD Diagram (click to enlarge)](./schema.svg)
+[Schema ERD Diagram (click to enlarge)](./assets/schema.svg)
 
-<a href="./schema.svg">
-<img src="./schema.svg" alt="Schema ERD Diagram" width="25%" height="25%">
+<a href="./assets/schema.svg">
+<img src="./assets/schema.svg" alt="Schema ERD Diagram" width="25%" height="25%">
 </a>
 
 ---
@@ -87,23 +91,24 @@ The SQL Core handles the workflow lifecycle through these key operations:
 3. **Task Management**: The [Edge Worker](../edge-worker/README.md) polls for available tasks using `poll_for_tasks`
 4. **State Transitions**: When the Edge Worker reports back using `complete_task` or `fail_task`, the SQL Core handles state transitions and schedules dependent steps
 
-[Flow lifecycle diagram (click to enlarge)](./flow-lifecycle.svg)
+[Flow lifecycle diagram (click to enlarge)](./assets/flow-lifecycle.svg)
 
-<a href="./flow-lifecycle.svg"><img src="./flow-lifecycle.svg" alt="Flow Lifecycle" width="25%" height="25%"></a>
+<a href="./assets/flow-lifecycle.svg"><img src="./assets/flow-lifecycle.svg" alt="Flow Lifecycle" width="25%" height="25%"></a>
 
 ## Example flow and its life
 
-Let's walk through creating and running a workflow that fetches a website,
+Let's walk through creating and running a workflow that fetches a website,
 does summarization and sentiment analysis in parallel steps
 and saves the results to a database.
 
-
+
 
 ### Defining a Workflow
 
 Workflows are defined using two SQL functions: `create_flow` and `add_step`.
 
 In this example, we'll create a workflow with:
+
 - `website` as the entry point ("root step")
 - `sentiment` and `summary` as parallel steps that depend on `website`
 - `saveToDb` as the final step, depending on both parallel steps
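
For orientation (not part of the diff): a minimal SQL sketch of how this example flow could be registered with the two functions named above. Only a truncated `add_step` call for `saveToDb` is visible in the next hunk's header, so the `create_flow` call and the other `add_step` arguments below are assumptions, not the package's verbatim README code.

```sql
-- Hypothetical sketch only; the diff shows just one (truncated) add_step call.
SELECT pgflow.create_flow('analyze_website');

-- Root step with no dependencies
SELECT pgflow.add_step('analyze_website', 'website');

-- Parallel steps that depend on the root step
SELECT pgflow.add_step('analyze_website', 'sentiment', deps_slugs => ARRAY['website']);
SELECT pgflow.add_step('analyze_website', 'summary', deps_slugs => ARRAY['website']);

-- Final step depending on both parallel steps
SELECT pgflow.add_step('analyze_website', 'saveToDb', deps_slugs => ARRAY['sentiment', 'summary']);
```
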
@@ -122,7 +127,7 @@ SELECT pgflow.add_step('analyze_website', 'saveToDb', deps_slugs => ARRAY['senti
 
 > [!NOTE]
 > You can have multiple "root steps" in a workflow. You can even create a root-steps-only workflow
-> to process a single input in parallel, because at the end, all of the outputs from steps
+> to process a single input in parallel, because at the end, all of the outputs from steps
 > that does not have dependents ("final steps") are aggregated and saved as run's `output`.
 
 ### Starting a Workflow Run
@@ -131,16 +136,17 @@ To start a workflow, call `start_flow` with a flow slug and input arguments:
 
 ```sql
 SELECT * FROM pgflow.start_flow(
-flow_slug => 'analyze_website',
+flow_slug => 'analyze_website',
 input => '{"url": "https://example.com"}'::jsonb
 );
 
--- run_id | flow_slug | status | input | output | remaining_steps
+-- run_id | flow_slug | status | input | output | remaining_steps
 -- ------------+-----------------+---------+--------------------------------+--------+-----------------
 -- <run uuid> | analyze_website | started | {"url": "https://example.com"} | [NULL] | 4
 ```
 
 When a workflow starts:
+
 - A new `run` record is created
 - Initial states for all steps are created
 - Root steps are marked as `started`
@@ -187,6 +193,7 @@ SELECT pgflow.complete_task(
 ```
 
 When a task completes:
+
 1. The task status is updated to 'completed' and the output is saved
 2. The message is archived in PGMQ
 3. The step state is updated to 'completed'
@@ -246,6 +253,7 @@ SELECT pgflow.add_step(
 ```
 
 The system applies exponential backoff for retries using the formula:
+
 ```
 delay = base_delay * (2 ^ attempts_count)
 ```
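
As a quick illustration of that formula (an editor's aside, not diff content), the query below computes the delays for `base_delay = 5`, the value used as `baseDelay` in the TypeScript example further down. Whether `attempts_count` starts at 0 or 1 is not stated in the hunks shown here, so the series assumes it starts at 1.

```sql
-- Illustration only: delay = base_delay * (2 ^ attempts_count) with base_delay = 5
SELECT attempts_count,
       5 * 2 ^ attempts_count AS delay_seconds
FROM generate_series(1, 3) AS attempts_count;
-- attempts_count | delay_seconds
-- 1              | 10
-- 2              | 20
-- 3              | 40
```
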
@@ -283,22 +291,25 @@ type Input = {
 };
 
 const AnalyzeWebsite = new Flow<Input>({
-slug:
+slug: 'analyze_website',
 maxAttempts: 3,
 baseDelay: 5,
 timeout: 10,
 })
-.step({ slug: "website" }, async (input) => await scrapeWebsite(input.run.url))
 .step(
-{ slug:
+{ slug: 'website' },
+async (input) => await scrapeWebsite(input.run.url)
+)
+.step(
+{ slug: 'sentiment', dependsOn: ['website'], timeout: 30, maxAttempts: 5 },
 async (input) => await analyzeSentiment(input.website.content)
 )
 .step(
-{ slug:
+{ slug: 'summary', dependsOn: ['website'] },
 async (input) => await summarizeWithAI(input.website.content)
 )
 .step(
-{ slug:
+{ slug: 'saveToDb', dependsOn: ['sentiment', 'summary'] },
 async (input) =>
 await saveToDb({
 websiteUrl: input.run.url,
@@ -332,6 +343,7 @@ This means your step handlers receive exactly the data they need, properly typed
 Handlers in pgflow **must return** JSON-serializable values that are captured and saved when `complete_task` is called. These outputs become available as inputs to dependent steps, allowing data to flow through your workflow pipeline.
 
 When a step is executed, it receives an input object where:
+
 - Each key is a step_slug of a completed dependency
 - Each value is that step's output
 - A special "run" key contains the original workflow input
@@ -342,8 +354,8 @@ When the `sentiment` step runs, it receives:
 
 ```json
 {
-"run": {"url": "https://example.com"},
-"website": {"content": "HTML content", "status": 200}
+"run": { "url": "https://example.com" },
+"website": { "content": "HTML content", "status": 200 }
 }
 ```
 
@@ -353,8 +365,8 @@ The `saveToDb` step depends on both `sentiment` and `summary`:
 
 ```json
 {
-"run": {"url": "https://example.com"},
-"sentiment": {"score": 0.85, "label": "positive"},
+"run": { "url": "https://example.com" },
+"sentiment": { "score": 0.85, "label": "positive" },
 "summary": "This website discusses various topics related to technology and innovation."
 }
 ```
package/dist/ATLAS.md
ADDED

@@ -0,0 +1,32 @@
+# Atlas setup
+
+We use [Atlas](https://atlasgo.io/docs) to generate migrations from the declarative schemas stored in `./schemas/` folder.
+
+## Configuration
+
+The setup is configured in `atlas.hcl`.
+
+It is set to compare `schemas/` to what is in `supabase/migrations/`.
+
+### Docker dev image
+
+Atlas requires a dev database to be available for computing diffs.
+The database must be empty, but contain everything needed for the schemas to apply.
+
+We need a configured [PGMQ](https://github.com/tembo-io/pgmq) extension, which Atlas does not support
+in their dev images.
+
+That's why this setup relies on a custom built image `jumski/postgres-15-pgmq:latest`.
+
+Inspect `Dockerfile.atlas` to see how it is built.
+
+See also `./scripts/build-atlas-postgres-image` and `./scripts/push-atlas-postgres-image` scripts for building and pushing the image.
+
+## Workflow
+
+1. Make sure you start with a clean database (`pnpm supabase db reset`).
+1. Modify the schemas in `schemas/` to a desired state.
+1. Run `./scripts/atlas-migrate-diff <migration-name>` to create a new migration based on the diff.
+1. Run `pnpm supabase migration up` to apply the migration.
+1. In case of any errors, remove the generated migration file, make changes in `schemas/` and repeat the process.
+1. After the migration is applied, verify it does not break tests with `nx test:pgtap`
package/dist/CHANGELOG.md
CHANGED

@@ -1,5 +1,21 @@
 # @pgflow/core
 
+## 0.1.20
+
+### Patch Changes
+
+- 09e3210: Change name of initial migration :-(
+- 985176e: Add step_index to steps and various status timestamps to runtime tables
+- @pgflow/dsl@0.1.20
+
+## 0.1.19
+
+### Patch Changes
+
+- a10b442: Add minimum set of indexes
+- efbd108: Convert migrations to declarative schemas and generate initial migration
+- @pgflow/dsl@0.1.19
+
 ## 0.1.18
 
 ### Patch Changes
package/dist/README.md
CHANGED

Identical to the package/README.md changes above; the dist copy of the README received the same diff.