@pgflow/core 0.1.18 → 0.1.20

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -6,6 +6,10 @@ PostgreSQL-native workflow engine for defining, managing, and tracking DAG-based
  > This project is licensed under [AGPL v3](./LICENSE.md) license and is part of **pgflow** stack.
  > See [LICENSING_OVERVIEW.md](../../LICENSING_OVERVIEW.md) in root of this monorepo for more details.

+ > [!WARNING]
+ > This project uses [Atlas](https://atlasgo.io/docs) to manage the schemas and migrations.
+ > See [ATLAS.md](ATLAS.md) for more details.
+
  ## Table of Contents

  - [Overview](#overview)
@@ -56,10 +60,10 @@ The actual execution of workflow tasks is handled by the [Edge Worker](../edge-w

  ### Schema Design

- [Schema ERD Diagram (click to enlarge)](./schema.svg)
+ [Schema ERD Diagram (click to enlarge)](./assets/schema.svg)

- <a href="./schema.svg">
-   <img src="./schema.svg" alt="Schema ERD Diagram" width="25%" height="25%">
+ <a href="./assets/schema.svg">
+   <img src="./assets/schema.svg" alt="Schema ERD Diagram" width="25%" height="25%">
  </a>

  ---
@@ -87,23 +91,24 @@ The SQL Core handles the workflow lifecycle through these key operations:
  3. **Task Management**: The [Edge Worker](../edge-worker/README.md) polls for available tasks using `poll_for_tasks`
  4. **State Transitions**: When the Edge Worker reports back using `complete_task` or `fail_task`, the SQL Core handles state transitions and schedules dependent steps

- [Flow lifecycle diagram (click to enlarge)](./flow-lifecycle.svg)
+ [Flow lifecycle diagram (click to enlarge)](./assets/flow-lifecycle.svg)

- <a href="./flow-lifecycle.svg"><img src="./flow-lifecycle.svg" alt="Flow Lifecycle" width="25%" height="25%"></a>
+ <a href="./assets/flow-lifecycle.svg"><img src="./assets/flow-lifecycle.svg" alt="Flow Lifecycle" width="25%" height="25%"></a>

  ## Example flow and its life

- Let's walk through creating and running a workflow that fetches a website,
+ Let's walk through creating and running a workflow that fetches a website,
  does summarization and sentiment analysis in parallel steps
  and saves the results to a database.

- ![example flow graph](./example-flow.svg)
+ ![example flow graph](./assets/example-flow.svg)

  ### Defining a Workflow

  Workflows are defined using two SQL functions: `create_flow` and `add_step`.

  In this example, we'll create a workflow with:
+
  - `website` as the entry point ("root step")
  - `sentiment` and `summary` as parallel steps that depend on `website`
  - `saveToDb` as the final step, depending on both parallel steps
@@ -122,7 +127,7 @@ SELECT pgflow.add_step('analyze_website', 'saveToDb', deps_slugs => ARRAY['senti

  > [!NOTE]
  > You can have multiple "root steps" in a workflow. You can even create a root-steps-only workflow
- > to process a single input in parallel, because at the end, all of the outputs from steps
+ > to process a single input in parallel, because at the end, all of the outputs from steps
  > that does not have dependents ("final steps") are aggregated and saved as run's `output`.

  ### Starting a Workflow Run
@@ -131,16 +136,17 @@ To start a workflow, call `start_flow` with a flow slug and input arguments:

  ```sql
  SELECT * FROM pgflow.start_flow(
-   flow_slug => 'analyze_website',
+   flow_slug => 'analyze_website',
    input => '{"url": "https://example.com"}'::jsonb
  );

- -- run_id | flow_slug | status | input | output | remaining_steps
+ -- run_id | flow_slug | status | input | output | remaining_steps
  -- ------------+-----------------+---------+--------------------------------+--------+-----------------
  -- <run uuid> | analyze_website | started | {"url": "https://example.com"} | [NULL] | 4
  ```

  When a workflow starts:
+
  - A new `run` record is created
  - Initial states for all steps are created
  - Root steps are marked as `started`
@@ -187,6 +193,7 @@ SELECT pgflow.complete_task(
  ```

  When a task completes:
+
  1. The task status is updated to 'completed' and the output is saved
  2. The message is archived in PGMQ
  3. The step state is updated to 'completed'
@@ -246,6 +253,7 @@ SELECT pgflow.add_step(
  ```

  The system applies exponential backoff for retries using the formula:
+
  ```
  delay = base_delay * (2 ^ attempts_count)
  ```
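The backoff formula above is easiest to read as a concrete schedule. The following TypeScript sketch is illustrative only and is not part of the diffed package files; it assumes `attempts_count` starts at 0 for the first retry and that `baseDelay` is in seconds, matching the `baseDelay: 5` used in the flow definition shown in the next hunk.

```typescript
// Illustrative sketch only -- not part of @pgflow/core.
// Assumes attempts_count starts at 0 and baseDelay is expressed in seconds.
function retryDelaySeconds(baseDelay: number, attemptsCount: number): number {
  return baseDelay * 2 ** attemptsCount;
}

// With baseDelay = 5, successive retries would wait 5, 10, 20, 40 seconds.
console.log([0, 1, 2, 3].map((n) => retryDelaySeconds(5, n)));
```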
@@ -283,22 +291,25 @@ type Input = {
  };

  const AnalyzeWebsite = new Flow<Input>({
-   slug: "analyze_website",
+   slug: 'analyze_website',
    maxAttempts: 3,
    baseDelay: 5,
    timeout: 10,
  })
-   .step({ slug: "website" }, async (input) => await scrapeWebsite(input.run.url))
    .step(
-     { slug: "sentiment", dependsOn: ["website"], timeout: 30, maxAttempts: 5 },
+     { slug: 'website' },
+     async (input) => await scrapeWebsite(input.run.url)
+   )
+   .step(
+     { slug: 'sentiment', dependsOn: ['website'], timeout: 30, maxAttempts: 5 },
      async (input) => await analyzeSentiment(input.website.content)
    )
    .step(
-     { slug: "summary", dependsOn: ["website"] },
+     { slug: 'summary', dependsOn: ['website'] },
      async (input) => await summarizeWithAI(input.website.content)
    )
    .step(
-     { slug: "saveToDb", dependsOn: ["sentiment", "summary"] },
+     { slug: 'saveToDb', dependsOn: ['sentiment', 'summary'] },
      async (input) =>
        await saveToDb({
          websiteUrl: input.run.url,
@@ -332,6 +343,7 @@ This means your step handlers receive exactly the data they need, properly typed
  Handlers in pgflow **must return** JSON-serializable values that are captured and saved when `complete_task` is called. These outputs become available as inputs to dependent steps, allowing data to flow through your workflow pipeline.

  When a step is executed, it receives an input object where:
+
  - Each key is a step_slug of a completed dependency
  - Each value is that step's output
  - A special "run" key contains the original workflow input
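The bullet list above describes the shape of a handler's input. As a rough, illustrative sketch (the type names here are invented for this aside; the field names come from the JSON samples in the following hunks), the `sentiment` step's input could be typed like this:

```typescript
// Rough sketch of the handler input described above; type names are invented here.
type RunInput = { url: string };

type SentimentStepInput = {
  run: RunInput;                                // original workflow input under the "run" key
  website: { content: string; status: number }; // output of the completed `website` dependency
};
```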
@@ -342,8 +354,8 @@ When the `sentiment` step runs, it receives:

  ```json
  {
-   "run": {"url": "https://example.com"},
-   "website": {"content": "HTML content", "status": 200}
+   "run": { "url": "https://example.com" },
+   "website": { "content": "HTML content", "status": 200 }
  }
  ```

@@ -353,8 +365,8 @@ The `saveToDb` step depends on both `sentiment` and `summary`:

  ```json
  {
-   "run": {"url": "https://example.com"},
-   "sentiment": {"score": 0.85, "label": "positive"},
+   "run": { "url": "https://example.com" },
+   "sentiment": { "score": 0.85, "label": "positive" },
    "summary": "This website discusses various topics related to technology and innovation."
  }
  ```
package/dist/ATLAS.md ADDED
@@ -0,0 +1,32 @@
+ # Atlas setup
+
+ We use [Atlas](https://atlasgo.io/docs) to generate migrations from the declarative schemas stored in `./schemas/` folder.
+
+ ## Configuration
+
+ The setup is configured in `atlas.hcl`.
+
+ It is set to compare `schemas/` to what is in `supabase/migrations/`.
+
+ ### Docker dev image
+
+ Atlas requires a dev database to be available for computing diffs.
+ The database must be empty, but contain everything needed for the schemas to apply.
+
+ We need a configured [PGMQ](https://github.com/tembo-io/pgmq) extension, which Atlas does not support
+ in their dev images.
+
+ That's why this setup relies on a custom built image `jumski/postgres-15-pgmq:latest`.
+
+ Inspect `Dockerfile.atlas` to see how it is built.
+
+ See also `./scripts/build-atlas-postgres-image` and `./scripts/push-atlas-postgres-image` scripts for building and pushing the image.
+
+ ## Workflow
+
+ 1. Make sure you start with a clean database (`pnpm supabase db reset`).
+ 1. Modify the schemas in `schemas/` to a desired state.
+ 1. Run `./scripts/atlas-migrate-diff <migration-name>` to create a new migration based on the diff.
+ 1. Run `pnpm supabase migration up` to apply the migration.
+ 1. In case of any errors, remove the generated migration file, make changes in `schemas/` and repeat the process.
+ 1. After the migration is applied, verify it does not break tests with `nx test:pgtap`
package/dist/CHANGELOG.md CHANGED
@@ -1,5 +1,21 @@
  # @pgflow/core

+ ## 0.1.20
+
+ ### Patch Changes
+
+ - 09e3210: Change name of initial migration :-(
+ - 985176e: Add step_index to steps and various status timestamps to runtime tables
+ - @pgflow/dsl@0.1.20
+
+ ## 0.1.19
+
+ ### Patch Changes
+
+ - a10b442: Add minimum set of indexes
+ - efbd108: Convert migrations to declarative schemas and generate initial migration
+ - @pgflow/dsl@0.1.19
+
  ## 0.1.18

  ### Patch Changes
package/dist/README.md CHANGED
@@ -6,6 +6,10 @@ PostgreSQL-native workflow engine for defining, managing, and tracking DAG-based
  > This project is licensed under [AGPL v3](./LICENSE.md) license and is part of **pgflow** stack.
  > See [LICENSING_OVERVIEW.md](../../LICENSING_OVERVIEW.md) in root of this monorepo for more details.

+ > [!WARNING]
+ > This project uses [Atlas](https://atlasgo.io/docs) to manage the schemas and migrations.
+ > See [ATLAS.md](ATLAS.md) for more details.
+
  ## Table of Contents

  - [Overview](#overview)
@@ -56,10 +60,10 @@ The actual execution of workflow tasks is handled by the [Edge Worker](../edge-w

  ### Schema Design

- [Schema ERD Diagram (click to enlarge)](./schema.svg)
+ [Schema ERD Diagram (click to enlarge)](./assets/schema.svg)

- <a href="./schema.svg">
-   <img src="./schema.svg" alt="Schema ERD Diagram" width="25%" height="25%">
+ <a href="./assets/schema.svg">
+   <img src="./assets/schema.svg" alt="Schema ERD Diagram" width="25%" height="25%">
  </a>

  ---
@@ -87,23 +91,24 @@ The SQL Core handles the workflow lifecycle through these key operations:
  3. **Task Management**: The [Edge Worker](../edge-worker/README.md) polls for available tasks using `poll_for_tasks`
  4. **State Transitions**: When the Edge Worker reports back using `complete_task` or `fail_task`, the SQL Core handles state transitions and schedules dependent steps

- [Flow lifecycle diagram (click to enlarge)](./flow-lifecycle.svg)
+ [Flow lifecycle diagram (click to enlarge)](./assets/flow-lifecycle.svg)

- <a href="./flow-lifecycle.svg"><img src="./flow-lifecycle.svg" alt="Flow Lifecycle" width="25%" height="25%"></a>
+ <a href="./assets/flow-lifecycle.svg"><img src="./assets/flow-lifecycle.svg" alt="Flow Lifecycle" width="25%" height="25%"></a>

  ## Example flow and its life

- Let's walk through creating and running a workflow that fetches a website,
+ Let's walk through creating and running a workflow that fetches a website,
  does summarization and sentiment analysis in parallel steps
  and saves the results to a database.

- ![example flow graph](./example-flow.svg)
+ ![example flow graph](./assets/example-flow.svg)

  ### Defining a Workflow

  Workflows are defined using two SQL functions: `create_flow` and `add_step`.

  In this example, we'll create a workflow with:
+
  - `website` as the entry point ("root step")
  - `sentiment` and `summary` as parallel steps that depend on `website`
  - `saveToDb` as the final step, depending on both parallel steps
@@ -122,7 +127,7 @@ SELECT pgflow.add_step('analyze_website', 'saveToDb', deps_slugs => ARRAY['senti

  > [!NOTE]
  > You can have multiple "root steps" in a workflow. You can even create a root-steps-only workflow
- > to process a single input in parallel, because at the end, all of the outputs from steps
+ > to process a single input in parallel, because at the end, all of the outputs from steps
  > that does not have dependents ("final steps") are aggregated and saved as run's `output`.

  ### Starting a Workflow Run
@@ -131,16 +136,17 @@ To start a workflow, call `start_flow` with a flow slug and input arguments:

  ```sql
  SELECT * FROM pgflow.start_flow(
-   flow_slug => 'analyze_website',
+   flow_slug => 'analyze_website',
    input => '{"url": "https://example.com"}'::jsonb
  );

- -- run_id | flow_slug | status | input | output | remaining_steps
+ -- run_id | flow_slug | status | input | output | remaining_steps
  -- ------------+-----------------+---------+--------------------------------+--------+-----------------
  -- <run uuid> | analyze_website | started | {"url": "https://example.com"} | [NULL] | 4
  ```

  When a workflow starts:
+
  - A new `run` record is created
  - Initial states for all steps are created
  - Root steps are marked as `started`
@@ -187,6 +193,7 @@ SELECT pgflow.complete_task(
  ```

  When a task completes:
+
  1. The task status is updated to 'completed' and the output is saved
  2. The message is archived in PGMQ
  3. The step state is updated to 'completed'
@@ -246,6 +253,7 @@ SELECT pgflow.add_step(
  ```

  The system applies exponential backoff for retries using the formula:
+
  ```
  delay = base_delay * (2 ^ attempts_count)
  ```
@@ -283,22 +291,25 @@ type Input = {
  };

  const AnalyzeWebsite = new Flow<Input>({
-   slug: "analyze_website",
+   slug: 'analyze_website',
    maxAttempts: 3,
    baseDelay: 5,
    timeout: 10,
  })
-   .step({ slug: "website" }, async (input) => await scrapeWebsite(input.run.url))
    .step(
-     { slug: "sentiment", dependsOn: ["website"], timeout: 30, maxAttempts: 5 },
+     { slug: 'website' },
+     async (input) => await scrapeWebsite(input.run.url)
+   )
+   .step(
+     { slug: 'sentiment', dependsOn: ['website'], timeout: 30, maxAttempts: 5 },
      async (input) => await analyzeSentiment(input.website.content)
    )
    .step(
-     { slug: "summary", dependsOn: ["website"] },
+     { slug: 'summary', dependsOn: ['website'] },
      async (input) => await summarizeWithAI(input.website.content)
    )
    .step(
-     { slug: "saveToDb", dependsOn: ["sentiment", "summary"] },
+     { slug: 'saveToDb', dependsOn: ['sentiment', 'summary'] },
      async (input) =>
        await saveToDb({
          websiteUrl: input.run.url,
@@ -332,6 +343,7 @@ This means your step handlers receive exactly the data they need, properly typed
  Handlers in pgflow **must return** JSON-serializable values that are captured and saved when `complete_task` is called. These outputs become available as inputs to dependent steps, allowing data to flow through your workflow pipeline.

  When a step is executed, it receives an input object where:
+
  - Each key is a step_slug of a completed dependency
  - Each value is that step's output
  - A special "run" key contains the original workflow input
@@ -342,8 +354,8 @@ When the `sentiment` step runs, it receives:

  ```json
  {
-   "run": {"url": "https://example.com"},
-   "website": {"content": "HTML content", "status": 200}
+   "run": { "url": "https://example.com" },
+   "website": { "content": "HTML content", "status": 200 }
  }
  ```

@@ -353,8 +365,8 @@ The `saveToDb` step depends on both `sentiment` and `summary`:

  ```json
  {
-   "run": {"url": "https://example.com"},
-   "sentiment": {"score": 0.85, "label": "positive"},
+   "run": { "url": "https://example.com" },
+   "sentiment": { "score": 0.85, "label": "positive" },
    "summary": "This website discusses various topics related to technology and innovation."
  }
  ```