@pgflow/core 0.0.5 → 0.0.7
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/{CHANGELOG.md → dist/CHANGELOG.md} +6 -0
- package/package.json +8 -5
- package/__tests__/mocks/index.ts +0 -1
- package/__tests__/mocks/postgres.ts +0 -37
- package/__tests__/types/PgflowSqlClient.test-d.ts +0 -59
- package/docs/options_for_flow_and_steps.md +0 -75
- package/docs/pgflow-blob-reference-system.md +0 -179
- package/eslint.config.cjs +0 -22
- package/example-flow.mermaid +0 -5
- package/example-flow.svg +0 -1
- package/flow-lifecycle.mermaid +0 -83
- package/flow-lifecycle.svg +0 -1
- package/out-tsc/vitest/__tests__/mocks/index.d.ts +0 -2
- package/out-tsc/vitest/__tests__/mocks/index.d.ts.map +0 -1
- package/out-tsc/vitest/__tests__/mocks/postgres.d.ts +0 -15
- package/out-tsc/vitest/__tests__/mocks/postgres.d.ts.map +0 -1
- package/out-tsc/vitest/__tests__/types/PgflowSqlClient.test-d.d.ts +0 -2
- package/out-tsc/vitest/__tests__/types/PgflowSqlClient.test-d.d.ts.map +0 -1
- package/out-tsc/vitest/tsconfig.spec.tsbuildinfo +0 -1
- package/out-tsc/vitest/vite.config.d.ts +0 -3
- package/out-tsc/vitest/vite.config.d.ts.map +0 -1
- package/pkgs/core/dist/index.js +0 -54
- package/pkgs/core/dist/pkgs/core/LICENSE.md +0 -660
- package/pkgs/core/dist/pkgs/core/README.md +0 -373
- package/pkgs/dsl/dist/index.js +0 -123
- package/pkgs/dsl/dist/pkgs/dsl/README.md +0 -11
- package/pkgs/edge-worker/dist/index.js +0 -953
- package/pkgs/edge-worker/dist/index.js.map +0 -7
- package/pkgs/edge-worker/dist/pkgs/edge-worker/LICENSE.md +0 -660
- package/pkgs/edge-worker/dist/pkgs/edge-worker/README.md +0 -46
- package/pkgs/example-flows/dist/index.js +0 -152
- package/pkgs/example-flows/dist/pkgs/example-flows/README.md +0 -11
- package/project.json +0 -125
- package/prompts/architect.md +0 -87
- package/prompts/condition.md +0 -33
- package/prompts/declarative_sql.md +0 -15
- package/prompts/deps_in_payloads.md +0 -20
- package/prompts/dsl-multi-arg.ts +0 -48
- package/prompts/dsl-options.md +0 -39
- package/prompts/dsl-single-arg.ts +0 -51
- package/prompts/dsl-two-arg.ts +0 -61
- package/prompts/dsl.md +0 -119
- package/prompts/fanout_steps.md +0 -1
- package/prompts/json_schemas.md +0 -36
- package/prompts/one_shot.md +0 -286
- package/prompts/pgtap.md +0 -229
- package/prompts/sdk.md +0 -59
- package/prompts/step_types.md +0 -62
- package/prompts/versioning.md +0 -16
- package/queries/fail_permanently.sql +0 -17
- package/queries/fail_task.sql +0 -21
- package/queries/sequential.sql +0 -47
- package/queries/two_roots_left_right.sql +0 -59
- package/schema.svg +0 -1
- package/scripts/colorize-pgtap-output.awk +0 -72
- package/scripts/run-test-with-colors +0 -5
- package/scripts/watch-test +0 -7
- package/src/PgflowSqlClient.ts +0 -85
- package/src/database-types.ts +0 -759
- package/src/index.ts +0 -3
- package/src/types.ts +0 -103
- package/supabase/config.toml +0 -32
- package/supabase/seed.sql +0 -202
- package/supabase/tests/add_step/basic_step_addition.test.sql +0 -29
- package/supabase/tests/add_step/circular_dependency.test.sql +0 -21
- package/supabase/tests/add_step/flow_isolation.test.sql +0 -26
- package/supabase/tests/add_step/idempotent_step_addition.test.sql +0 -20
- package/supabase/tests/add_step/invalid_step_slug.test.sql +0 -16
- package/supabase/tests/add_step/nonexistent_dependency.test.sql +0 -16
- package/supabase/tests/add_step/nonexistent_flow.test.sql +0 -13
- package/supabase/tests/add_step/options.test.sql +0 -66
- package/supabase/tests/add_step/step_with_dependency.test.sql +0 -36
- package/supabase/tests/add_step/step_with_multiple_dependencies.test.sql +0 -46
- package/supabase/tests/complete_task/archives_message.test.sql +0 -67
- package/supabase/tests/complete_task/completes_run_if_no_more_remaining_steps.test.sql +0 -62
- package/supabase/tests/complete_task/completes_task_and_updates_dependents.test.sql +0 -64
- package/supabase/tests/complete_task/decrements_remaining_steps_if_completing_step.test.sql +0 -62
- package/supabase/tests/complete_task/saves_output_when_completing_run.test.sql +0 -57
- package/supabase/tests/create_flow/flow_creation.test.sql +0 -27
- package/supabase/tests/create_flow/idempotency_and_duplicates.test.sql +0 -26
- package/supabase/tests/create_flow/invalid_slug.test.sql +0 -13
- package/supabase/tests/create_flow/options.test.sql +0 -57
- package/supabase/tests/fail_task/exponential_backoff.test.sql +0 -70
- package/supabase/tests/fail_task/mark_as_failed_if_no_retries_available.test.sql +0 -49
- package/supabase/tests/fail_task/respects_flow_retry_settings.test.sql +0 -48
- package/supabase/tests/fail_task/respects_step_retry_settings.test.sql +0 -48
- package/supabase/tests/fail_task/retry_task_if_retries_available.test.sql +0 -39
- package/supabase/tests/is_valid_slug.test.sql +0 -72
- package/supabase/tests/poll_for_tasks/builds_proper_input_from_deps_outputs.test.sql +0 -35
- package/supabase/tests/poll_for_tasks/hides_messages.test.sql +0 -35
- package/supabase/tests/poll_for_tasks/increments_attempts_count.test.sql +0 -35
- package/supabase/tests/poll_for_tasks/multiple_task_processing.test.sql +0 -24
- package/supabase/tests/poll_for_tasks/polls_only_queued_tasks.test.sql +0 -35
- package/supabase/tests/poll_for_tasks/reads_messages.test.sql +0 -38
- package/supabase/tests/poll_for_tasks/returns_no_tasks_if_no_step_task_for_message.test.sql +0 -34
- package/supabase/tests/poll_for_tasks/returns_no_tasks_if_queue_is_empty.test.sql +0 -19
- package/supabase/tests/poll_for_tasks/returns_no_tasks_when_qty_set_to_0.test.sql +0 -22
- package/supabase/tests/poll_for_tasks/sets_vt_delay_based_on_opt_timeout.test.sql +0 -41
- package/supabase/tests/poll_for_tasks/tasks_reapppear_if_not_processed_in_time.test.sql +0 -59
- package/supabase/tests/start_flow/creates_run.test.sql +0 -24
- package/supabase/tests/start_flow/creates_step_states_for_all_steps.test.sql +0 -25
- package/supabase/tests/start_flow/creates_step_tasks_only_for_root_steps.test.sql +0 -54
- package/supabase/tests/start_flow/returns_run.test.sql +0 -24
- package/supabase/tests/start_flow/sends_messages_on_the_queue.test.sql +0 -50
- package/supabase/tests/start_flow/starts_only_root_steps.test.sql +0 -21
- package/supabase/tests/step_dsl_is_idempotent.test.sql +0 -34
- package/tsconfig.json +0 -16
- package/tsconfig.lib.json +0 -26
- package/tsconfig.spec.json +0 -35
- package/vite.config.ts +0 -57
package/prompts/dsl-two-arg.ts
DELETED
@@ -1,61 +0,0 @@
-const ScrapeWebsiteFlow = new Flow<Input>()
-  .step(
-    {
-      slug: 'verify_status',
-    },
-    async (payload) => {
-      // Placeholder function
-      return { status: 'success' };
-    }
-  )
-  .step(
-    {
-      slug: 'when_success',
-      dependsOn: ['verify_status'],
-      runIf: { verify_status: { status: 'success' } },
-    },
-    async (payload) => {
-      // Placeholder function
-      return await scrapeSubpages(
-        payload.run.url,
-        payload.table_of_contents.urls_of_subpages
-      );
-    }
-  )
-  .step(
-    {
-      slug: 'when_server_error',
-      dependsOn: ['verify_status'],
-      runUnless: { verify_status: { status: 'success' } },
-    },
-    async (payload) => {
-      // Placeholder function
-      return await generateSummaries(payload.subpages.contentsOfSubpages);
-    }
-  )
-  .step(
-    {
-      slug: 'sentiments',
-      dependsOn: ['subpages'],
-      maxAttempts: 5,
-      baseDelay: 10,
-    },
-    async (payload) => {
-      // Placeholder function
-      return await analyzeSentiments(payload.subpages.contentsOfSubpages);
-    }
-  )
-  .step(
-    {
-      slug: 'save_to_db',
-      dependsOn: ['subpages', 'summaries', 'sentiments'],
-    },
-    async (payload) => {
-      // Placeholder function
-      return await saveToDb(
-        payload.subpages,
-        payload.summaries,
-        payload.sentiments
-      );
-    }
-  );
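The `maxAttempts` and `baseDelay` options in the deleted snippet correspond to the retry behaviour exercised by the `fail_task` tests listed in this diff (e.g. `exponential_backoff.test.sql`). A minimal sketch of a typical backoff schedule, assuming the common `base_delay * 2^(attempt - 1)` formula - the exact formula is not shown anywhere in this diff:

```sql
-- Hypothetical retry schedule for baseDelay = 10 and maxAttempts = 5;
-- the doubling formula is an assumption, not taken from the package.
SELECT attempt,
       (10 * power(2, attempt - 1))::int AS delay_seconds
FROM generate_series(1, 5) AS attempt;
-- => 10, 20, 40, 80, 160
```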
package/prompts/dsl.md
DELETED
@@ -1,119 +0,0 @@
-# Flow DSL
-
-Flow DSL is used to define the shape of the flow and tie functions to particular steps.
-
-## Full flow example
-
-```ts
-const ScrapeWebsiteFlow = new Flow<Input>()
-  .step('table_of_contents', async (payload) => {
-    // Placeholder function
-    return await fetchTableOfContents(payload.run.url);
-  })
-  .step('subpages', ['table_of_contents'], async (payload) => {
-    // Placeholder function
-    return await scrapeSubpages(payload.run.url, payload.table_of_contents.urls_of_subpages);
-  })
-  .step('summaries', ['subpages'], async (payload) => {
-    // Placeholder function
-    return await generateSummaries(payload.subpages.contentsOfSubpages);
-  })
-  .step('sentiments', ['subpages'], async (payload) => {
-    // Placeholder function
-    return await analyzeSentiments(payload.subpages.contentsOfSubpages);
-  })
-  .step('save_to_db', ['subpages', 'summaries', 'sentiments'], async (payload) => {
-    // Placeholder function
-    return await saveToDb(payload.subpages, payload.summaries, payload.sentiments);
-  });
-```
-
-## Explanation
-
-This is a Fluent API style DSL, but it is very simple:
-
-1. Users create a flow by initializing a `Flow` object with a mandatory
-type annotation for the Flow `input` - this is the type of the payload
-users start the flow with, and it must be serializable to JSON:
-
-```ts
-type Input = {
-  url: string; // url of the website to scrape
-};
-
-const ScrapeWebsiteFlow = new Flow<Input>()
-```
-
-2. Then they define steps by calling the `.step(stepSlug: string, depsSlugs: string[], handler: Function)` method.
-The `depsSlugs` array can be omitted if the step has no dependencies.
-Such steps are called "root steps"; they run first and are passed only the flow input payload:
-
-```ts
-const ScrapeWebsiteFlow = new Flow<Input>()
-  .step('table_of_contents', async (payload) => {
-    const { run } = payload;
-    // do something
-    // make sure to return some value so next steps can use it
-    return {
-      urls_of_subpages,
-      title
-    }
-  })
-```
-
-The `payload` object always has a special key `run`, which is the value passed as the flow input -
-every step can access and use it.
-
-What the step handler returns is very important!
-We call it `output`; it will be persisted in the database
-and used as `input` for the dependent steps.
-
-It must be serializable to JSON.
-
-3. Then they define dependent steps by calling the `.step(stepSlug: string, depsSlugs: string[], handler: Function)` method,
-now providing an array of dependency slugs: `['table_of_contents']`.
-
-```ts
-.step('subpages', ['table_of_contents'], async (payload) => {
-  const { run, table_of_contents } = payload;
-  // do something
-  // make sure to return some value so next steps can use it
-  return {
-    contentsOfSubpages
-  }
-})
-```
-
-Notice how the `payload` object got a new key `table_of_contents` - each dependency's
-result (the persisted return value from its handler) gets passed to `payload` under the dependency's slug key.
-
-```ts
-{
-  run: { url: 'https://example.com' },
-  table_of_contents: {
-    urls_of_subpages: ['https://example.com/subpage1', 'https://example.com/subpage2']
-  }
-}
-```
-
-4. There can be multiple steps running in parallel:
-
-```ts
-.step('summaries', ['subpages'], async (payload) => await doSomeStuff())
-.step('sentiments', ['subpages'], async (payload) => await doSomeStuff())
-```
-
-5. Steps can also depend on more than one other step:
-
-```ts
-.step('save_to_db', ['subpages', 'summaries', 'sentiments'], async (payload) => await saveToDb())
-```
-
-6. When a run finishes, the `output`s of steps that have no dependents are combined
-and saved as the run's `output`. This object is built in a similar
-way to a step `input` object, but lacks the `run` key.
-
-7. Type safety - all step payload types are inferred from the combination
-of the Flow input, each handler's inferred return type, and the shape of the graph.
-
-So users always know the type of their step input.
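As context for this deleted doc: the one_shot.md prompt later in this diff shows the DSL compiling down to `pgflow.add_step(flow_slug, step_slug, dep_step_slugs[])` calls, and the test listing includes `pgflow.create_flow`. A sketch of what `ScrapeWebsiteFlow` above might compile to - the `scrape_website` slug and the single-argument `create_flow` call are assumptions, not shown for this flow:

```sql
-- Hypothetical compilation of ScrapeWebsiteFlow, using the add_step API
-- sketched in prompts/one_shot.md below; flow slug is illustrative.
SELECT pgflow.create_flow('scrape_website');
SELECT pgflow.add_step('scrape_website', 'table_of_contents', ARRAY[]::text[]);
SELECT pgflow.add_step('scrape_website', 'subpages', ARRAY['table_of_contents']);
SELECT pgflow.add_step('scrape_website', 'summaries', ARRAY['subpages']);
SELECT pgflow.add_step('scrape_website', 'sentiments', ARRAY['subpages']);
SELECT pgflow.add_step('scrape_website', 'save_to_db', ARRAY['subpages', 'summaries', 'sentiments']);
```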
package/prompts/fanout_steps.md
DELETED
@@ -1 +0,0 @@
-.file prompts/architect.md -- we are in a monorepo for pgflow - a postgres-native workflow engine. we are not in the root, we are in pkgs/core - the sql part. sql code lives in supabase/migrations/ and tests live in supabase/tests/. you can check some info about step types and declarative sql in the prompts/ folder. your job is to implement the next step type - the 'fanout_tasks' type, which just enqueues multiple tasks for a single step - one task per input array item. this step type must have a json path parameter that will tell which part of the input is an array of items. you must change the code so steps with this type are handled differently in all the functions. use task_index to indicate which array item the given task is for. when completing a task, do not proceed with completing steps etc. unless all the tasks for the given step state are completed; then use task_index to order their outputs and use all of those as a step output array. try to understand what needs to be done first. modify existing migrations, do not create new ones - this is unreleased source code and we can change migrations.
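The aggregation rule in this deleted prompt (complete the step only when every task is done, then order outputs by `task_index`) maps naturally onto a single declarative query. A minimal sketch, assuming a `task_index` column on `pgflow.step_tasks` - the prompt asks for one, but the schema shown in this diff does not include it:

```sql
-- Sketch: assemble a fanout step's output array once all of its tasks completed.
-- task_index is assumed per the prompt; run_id and step slug are illustrative.
SELECT jsonb_agg(st.output ORDER BY st.task_index) AS step_output
FROM pgflow.step_tasks AS st
WHERE st.run_id = '01234567-89ab-cdef-0123-456789abcdef'::uuid
  AND st.step_slug = 'my_fanout_step'
GROUP BY st.run_id, st.step_slug
HAVING bool_and(st.status = 'completed'); -- yields no row until all tasks are done
```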
package/prompts/json_schemas.md
DELETED
@@ -1,36 +0,0 @@
-# JSON Schemas
-
-JSON schemas can be inferred from the steps' `input` types,
-so it is relatively easy to build a JSON schema for each step input.
-
-The same goes for the JSON Schema for the flow input.
-
-## Schema storage
-
-Schemas should be stored in the `pgflow.flows` and `pgflow.steps` tables.
-
-## Schemas in versioning
-
-To make sure that slight changes in the input/output types of steps
-trigger a new version of the flow, we need to use the inferred schemas
-when generating a version hash of the flow.
-
-## Schemas as validation
-
-We can use schemas to validate data for step handlers:
-
-1. Task executors can validate the runtime input payloads for handlers
-and their output results against the schema.
-2. The core SQL engine can use `pg_jsonschema` to validate the input values to flows,
-and maybe the input values to steps, failing steps whose inputs don't match.
-
-## Problems
-
-Doing any JSON Schema validation in the database is probably not a good idea because
-of the performance impact it would have.
-
-Using runtime validation in Task Executors is probably good enough,
-with the exception of validating the Flow input - you start flows less often than
-steps, and it seems like a good idea to validate that input database-side.
-
-
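For reference, the `pg_jsonschema` extension mentioned in this deleted doc exposes boolean validators such as `jsonb_matches_schema(schema, instance)`. A minimal sketch of the database-side flow-input validation it proposes - the schema and input here are illustrative, not taken from the package:

```sql
-- Illustrative: validate a flow input against an inferred schema via pg_jsonschema.
SELECT jsonb_matches_schema(
  '{"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]}'::json,
  '{"url": "https://example.com"}'::jsonb
); -- => true
```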
package/prompts/one_shot.md
DELETED
@@ -1,286 +0,0 @@
-Your job is to implement the required SQL schemas and functions for an MVP of my open source Postgres-native workflow orchestration engine called pgflow.
-
-The main idea of the project is to keep the shape of the DAG (nodes and edges) and its runtime state in the database
-and expose SQL functions that allow propagating that state.
-
-Real work is done by the task queue workers; the functions from pgflow only orchestrate
-the queue messages.
-
-Workers are supposed to call user functions with the input from the queue message,
-and should acknowledge the completion of the task or its failure (error thrown) by
-calling the appropriate pgflow SQL functions.
-
-This way the orchestration is decoupled from the execution.
-
-I have a concrete implementation plan for you to follow and will unfold it
-step by step below.
-
-## Assumptions/best practices
-
-### We are building a Minimum Viable Product
-
-Remember that we are building an MVP and the main focus should be on shipping something as soon as possible,
-by cutting scope and simplifying the architecture and code.
-
-But the outlined features are definitely something that we will be doing in the future.
-I am most certain about the foreach-array steps - this is a MUST-have.
-So your focus should be on implementing the MVP without closing the door to future improvements.
-
-### Slugs
-
-We do not use serial IDs or UUIDs for static things; we use "slugs" instead.
-A slug is just a string that conforms to the following rules:
-
-```sql
-slug is not null
-and slug <> ''
-and length(slug) <= 128
-and slug ~ '^[a-zA-Z_][a-zA-Z0-9_]*$';
-```
-
-We use a UUID to identify a particular run of the flow.
-But the states of steps for that particular run are not identified by separate UUIDs;
-they are identified by the pair of run_id and step_slug. This pattern makes it easy to refer
-to steps and flows by their slugs. **Leverage this pattern everywhere you can!**
-
-### References/fkeys
-
-Use foreign keys everywhere to ensure consistency.
-Use composite foreign keys and composite primary keys composed of flow/step slugs and run_ids if needed.
-
-### Declarative vs procedural
-
-**YOU MUST ALWAYS PRIORITIZE DECLARATIVE STYLE** and prioritize batched operations.
-
-Avoid plpgsql as much as you can.
-It is important to have your DB procedures run in batched ways and use declarative rather than procedural constructs where possible:
-
-- do not ever use `language plpgsql` in functions; always use `language sql`
-- don't do loops; write SQL statements that address multiple rows at once.
-- don't write trigger functions that fire for a single row; use `FOR EACH STATEMENT` instead.
-- don't call functions for each row in a result set, a condition, a join, or whatever; instead use functions that return `SETOF` and join against these.
-
-If you're constructing dynamic SQL, you should only ever use `%I` and `%L` when using `FORMAT` or similar; you should never see `%s` (with the very rare exception of where you're merging in another SQL fragment that you've previously formatted using %I and %L).
-
-Remember that functions have significant overhead in Postgres - instead of factoring into lots of tiny functions, think about how to make your code more expressive so there's no need.
-
-## Schemas
-
-### pgflow.flows
-
-A static definition of a flow (DAG):
-
-```sql
-CREATE TABLE pgflow.flows (
-  flow_slug text PRIMARY KEY NOT NULL -- Unique identifier for the flow
-    CHECK (is_valid_slug(flow_slug))
-);
-```
-
-### pgflow.steps
-
-A static definition of a step within a flow (the DAG "nodes"):
-
-```sql
-CREATE TABLE pgflow.steps (
-  flow_slug text NOT NULL REFERENCES pgflow.flows (flow_slug),
-  step_slug text NOT NULL,
-  step_type text NOT NULL DEFAULT 'single',
-  PRIMARY KEY (flow_slug, step_slug),
-  CHECK (is_valid_slug(flow_slug)),
-  CHECK (is_valid_slug(step_slug))
-);
-```
-
-### pgflow.deps
-
-A static definition of dependencies between steps (the DAG "edges"):
-
-```sql
-CREATE TABLE pgflow.deps (
-  flow_slug text NOT NULL REFERENCES pgflow.flows (flow_slug),
-  dep_slug text NOT NULL, -- The step that must complete first
-  step_slug text NOT NULL, -- The step that depends on dep_slug
-  PRIMARY KEY (flow_slug, dep_slug, step_slug),
-  FOREIGN KEY (flow_slug, dep_slug)
-    REFERENCES pgflow.steps (flow_slug, step_slug),
-  FOREIGN KEY (flow_slug, step_slug)
-    REFERENCES pgflow.steps (flow_slug, step_slug),
-  CHECK (dep_slug != step_slug), -- Prevent self-dependencies
-  CHECK (is_valid_slug(step_slug))
-);
-```
-
-### pgflow.runs
-
-A table storing the runtime state of a given flow.
-A run is identified by a `flow_slug` and `run_id`.
-
-```sql
-CREATE TABLE pgflow.runs (
-  run_id uuid PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
-  flow_slug text NOT NULL REFERENCES pgflow.flows (flow_slug), -- denormalized
-  status text NOT NULL DEFAULT 'started',
-  input jsonb NOT NULL,
-  CHECK (status IN ('started', 'failed', 'completed'))
-);
-```
-
-There is also a `status` column that can currently be started, failed or completed.
-
-### pgflow.step_states
-
-Represents the state of a particular step in a particular run.
-
-```sql
--- Step states table - tracks the state of individual steps within a run
-CREATE TABLE pgflow.step_states (
-  flow_slug text NOT NULL REFERENCES pgflow.flows (flow_slug),
-  run_id uuid NOT NULL REFERENCES pgflow.runs (run_id),
-  step_slug text NOT NULL,
-  status text NOT NULL DEFAULT 'created',
-  PRIMARY KEY (run_id, step_slug),
-  FOREIGN KEY (flow_slug, step_slug)
-    REFERENCES pgflow.steps (flow_slug, step_slug),
-  CHECK (status IN ('created', 'started', 'completed', 'failed'))
-);
-```
-
-### pgflow.step_tasks
-
-This table is really unique and interesting. We are starting the development
-of the flow orchestration engine with a simple step that runs one unit of work.
-
-But I imagine we would support additional types of steps, like:
-
-- a step that requires an input array and enqueues a task per array item, so the items are processed in parallel
-- a step that runs some preprocessing/postprocessing in an additional task
-
-So in order to accommodate this, we need an additional layer between step_state and
-the actual task queue, in order to track which messages belong to which steps
-in case there is more than one unit of work for a given step.
-
-```sql
--- Execution logs table - tracks the tasks of individual steps
-CREATE TABLE pgflow.step_tasks (
-  flow_slug text NOT NULL REFERENCES pgflow.flows (flow_slug),
-  step_slug text NOT NULL,
-  run_id uuid NOT NULL REFERENCES pgflow.runs (run_id),
-  status text NOT NULL DEFAULT 'queued',
-  input jsonb NOT NULL, -- payload that will be passed to the queue message
-  output jsonb, -- like step_result but for the task; can store a result or an error/stacktrace
-  message_id bigint, -- the id of the queue message
-  CONSTRAINT step_tasks_pkey PRIMARY KEY (run_id, step_slug),
-  FOREIGN KEY (run_id, step_slug)
-    REFERENCES pgflow.step_states (run_id, step_slug),
-  CHECK (status IN ('queued', 'started', 'failed', 'completed')),
-  CHECK (is_valid_slug(flow_slug)),
-  CHECK (is_valid_slug(step_slug))
-);
-```
-
-## TypeScript DSL, topological ordering and acyclicity validation
-
-A simple TypeScript DSL will be created that has strong typing
-and enforces adding steps in topological order, preventing
-cycles by the strict ordering of step addition.
-
-The TypeScript DSL looks like this:
-
-```ts
-const BasicFlow = new Flow<string>()
-  .step('root', ({ run }) => {
-    return `[${run}]r00t`;
-  })
-  .step('left', ['root'], ({ root: r }) => {
-    return `${r}/left`;
-  })
-  .step('right', ['root'], ({ root: r }) => {
-    return `${r}/right`;
-  })
-  .step('end', ['left', 'right'], ({ left, right, run }) => {
-    return `<${left}> and <${right}> of (${run})`;
-  });
-```
-
-This will be compiled to simple SQL calling the SQL function `pgflow.add_step(flow_slug, step_slug, dep_step_slugs[])`:
-
-```sql
-SELECT pgflow.add_step('basic', 'root', ARRAY[]::text[]);
-SELECT pgflow.add_step('basic', 'left', ARRAY['root']);
-SELECT pgflow.add_step('basic', 'right', ARRAY['root']);
-SELECT pgflow.add_step('basic', 'end', ARRAY['left', 'right']);
-```
-
-## SQL functions API
-
-This describes the public SQL functions that are available to the developer using pgflow
-and to the workers.
-
-The developer calls `start_flow`; the rest are called by the workers.
-
-### pgflow.start_flow(flow_slug::text, input::jsonb)
-
-This function is used to start a flow.
-It should work like this:
-
-- create a new `pgflow.runs` row for the given flow_slug
-- create all the `pgflow.step_states` rows corresponding to the steps in the flow
-- find root steps (ones without dependencies) and call "start_step" on each of them
-
-### pgflow.start_step(run_id::uuid, step_slug::text)
-
-This function is called by start_flow but also by complete_step_task (or somewhere near its call)
-when a worker acknowledges the step_task completion and it is detected that there are ready dependent
-steps to be started.
-
-It should probably call start_step_task under the hood, which handles:
-
-- updating the step_state status/timestamps
-- creating a step_task row
-- enqueueing a queue message for this step_task
-
-For other step types, like array/foreach, it would probably create a step_task
-for each array item, so more than one step task is created and more than one message is enqueued.
-
-### pgflow.start_step_task(run_id::uuid, step_slug::text, task_id::bigint)
-
-I am not yet sure how this will work for other step types that will need more step tasks.
-But probably each step type would have its own implementation of this function,
-and a simple step type will just create a new step_task row and enqueue it.
-
-But an array/foreach step type would need a different implementation.
-It would need to check the step's input, which is an array, and would
-create a new step_task for each array item and enqueue as many messages as there are items in the array.
-
-### pgflow.complete_step_task(run_id::uuid, step_slug::text, output::jsonb)
-
-This will be called by the worker when a step_task is completed.
-In the simplified version, where one step_state corresponds to one step_task, it works like this:
-
-- it marks the step_task as completed, saving the output
-- it in turn marks the step_state as completed, saving the output
-- then it should check for any dependent steps (steps that depend on the just-completed step) in the same run
-- it should then check if any of those dependent steps are "ready" - meaning all their dependencies are completed
-- for each of those, it should call start_step
-
-I am not yet sure how this will work for other step types that will need more step tasks.
-Probably each step type would have its own implementation of this function,
-so a simple step will just call complete_step_state when complete_step_task is called.
-
-An array/foreach step type would need a different implementation.
-It would probably need to check if other step_tasks are still pending.
-If all are already completed, it would just call complete_step_state;
-otherwise it will just continue, so the other (last) step task can complete the step state.
-
-### pgflow.fail_step_task(run_id::uuid, step_slug::text, error::jsonb)
-
-This is very similar to complete_step_task, but it will mark the step_task as failed,
-save the error message and call fail_step_state instead of complete_step_state.
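Taken together, this deleted prompt describes a worker protocol around these functions. A sketch of the happy path, using only the signatures given above - the flow slug, the run_id value, and the payloads are illustrative:

```sql
-- Start a run (the developer-facing call); flow slug and input are examples.
SELECT pgflow.start_flow('scrape_website', '{"url": "https://example.com"}'::jsonb);

-- A worker that finished the root step's handler acknowledges success...
SELECT pgflow.complete_step_task(
  '01234567-89ab-cdef-0123-456789abcdef'::uuid, -- run_id of the run created above
  'table_of_contents',
  '{"urls_of_subpages": ["https://example.com/subpage1"]}'::jsonb
);

-- ...or reports failure if the handler threw, letting pgflow orchestrate retries.
SELECT pgflow.fail_step_task(
  '01234567-89ab-cdef-0123-456789abcdef'::uuid,
  'table_of_contents',
  '{"message": "fetch failed"}'::jsonb
);
```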