@pgflow/core 0.0.5-prealpha.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (120)
  1. package/LICENSE.md +660 -0
  2. package/README.md +373 -0
  3. package/__tests__/mocks/index.ts +1 -0
  4. package/__tests__/mocks/postgres.ts +37 -0
  5. package/__tests__/types/PgflowSqlClient.test-d.ts +59 -0
  6. package/dist/LICENSE.md +660 -0
  7. package/dist/README.md +373 -0
  8. package/dist/index.js +54 -0
  9. package/docs/options_for_flow_and_steps.md +75 -0
  10. package/docs/pgflow-blob-reference-system.md +179 -0
  11. package/eslint.config.cjs +22 -0
  12. package/example-flow.mermaid +5 -0
  13. package/example-flow.svg +1 -0
  14. package/flow-lifecycle.mermaid +83 -0
  15. package/flow-lifecycle.svg +1 -0
  16. package/out-tsc/vitest/__tests__/mocks/index.d.ts +2 -0
  17. package/out-tsc/vitest/__tests__/mocks/index.d.ts.map +1 -0
  18. package/out-tsc/vitest/__tests__/mocks/postgres.d.ts +15 -0
  19. package/out-tsc/vitest/__tests__/mocks/postgres.d.ts.map +1 -0
  20. package/out-tsc/vitest/__tests__/types/PgflowSqlClient.test-d.d.ts +2 -0
  21. package/out-tsc/vitest/__tests__/types/PgflowSqlClient.test-d.d.ts.map +1 -0
  22. package/out-tsc/vitest/tsconfig.spec.tsbuildinfo +1 -0
  23. package/out-tsc/vitest/vite.config.d.ts +3 -0
  24. package/out-tsc/vitest/vite.config.d.ts.map +1 -0
  25. package/package.json +28 -0
  26. package/pkgs/core/dist/index.js +54 -0
  27. package/pkgs/core/dist/pkgs/core/LICENSE.md +660 -0
  28. package/pkgs/core/dist/pkgs/core/README.md +373 -0
  29. package/pkgs/dsl/dist/index.js +123 -0
  30. package/pkgs/dsl/dist/pkgs/dsl/README.md +11 -0
  31. package/project.json +125 -0
  32. package/prompts/architect.md +87 -0
  33. package/prompts/condition.md +33 -0
  34. package/prompts/declarative_sql.md +15 -0
  35. package/prompts/deps_in_payloads.md +20 -0
  36. package/prompts/dsl-multi-arg.ts +48 -0
  37. package/prompts/dsl-options.md +39 -0
  38. package/prompts/dsl-single-arg.ts +51 -0
  39. package/prompts/dsl-two-arg.ts +61 -0
  40. package/prompts/dsl.md +119 -0
  41. package/prompts/fanout_steps.md +1 -0
  42. package/prompts/json_schemas.md +36 -0
  43. package/prompts/one_shot.md +286 -0
  44. package/prompts/pgtap.md +229 -0
  45. package/prompts/sdk.md +59 -0
  46. package/prompts/step_types.md +62 -0
  47. package/prompts/versioning.md +16 -0
  48. package/queries/fail_permanently.sql +17 -0
  49. package/queries/fail_task.sql +21 -0
  50. package/queries/sequential.sql +47 -0
  51. package/queries/two_roots_left_right.sql +59 -0
  52. package/schema.svg +1 -0
  53. package/scripts/colorize-pgtap-output.awk +72 -0
  54. package/scripts/run-test-with-colors +5 -0
  55. package/scripts/watch-test +7 -0
  56. package/src/PgflowSqlClient.ts +85 -0
  57. package/src/database-types.ts +759 -0
  58. package/src/index.ts +3 -0
  59. package/src/types.ts +103 -0
  60. package/supabase/config.toml +32 -0
  61. package/supabase/migrations/000000_schema.sql +150 -0
  62. package/supabase/migrations/000005_create_flow.sql +29 -0
  63. package/supabase/migrations/000010_add_step.sql +48 -0
  64. package/supabase/migrations/000015_start_ready_steps.sql +45 -0
  65. package/supabase/migrations/000020_start_flow.sql +46 -0
  66. package/supabase/migrations/000030_read_with_poll_backport.sql +70 -0
  67. package/supabase/migrations/000040_poll_for_tasks.sql +100 -0
  68. package/supabase/migrations/000045_maybe_complete_run.sql +30 -0
  69. package/supabase/migrations/000050_complete_task.sql +98 -0
  70. package/supabase/migrations/000055_calculate_retry_delay.sql +11 -0
  71. package/supabase/migrations/000060_fail_task.sql +124 -0
  72. package/supabase/migrations/000_edge_worker_initial.sql +86 -0
  73. package/supabase/seed.sql +202 -0
  74. package/supabase/tests/add_step/basic_step_addition.test.sql +29 -0
  75. package/supabase/tests/add_step/circular_dependency.test.sql +21 -0
  76. package/supabase/tests/add_step/flow_isolation.test.sql +26 -0
  77. package/supabase/tests/add_step/idempotent_step_addition.test.sql +20 -0
  78. package/supabase/tests/add_step/invalid_step_slug.test.sql +16 -0
  79. package/supabase/tests/add_step/nonexistent_dependency.test.sql +16 -0
  80. package/supabase/tests/add_step/nonexistent_flow.test.sql +13 -0
  81. package/supabase/tests/add_step/options.test.sql +66 -0
  82. package/supabase/tests/add_step/step_with_dependency.test.sql +36 -0
  83. package/supabase/tests/add_step/step_with_multiple_dependencies.test.sql +46 -0
  84. package/supabase/tests/complete_task/archives_message.test.sql +67 -0
  85. package/supabase/tests/complete_task/completes_run_if_no_more_remaining_steps.test.sql +62 -0
  86. package/supabase/tests/complete_task/completes_task_and_updates_dependents.test.sql +64 -0
  87. package/supabase/tests/complete_task/decrements_remaining_steps_if_completing_step.test.sql +62 -0
  88. package/supabase/tests/complete_task/saves_output_when_completing_run.test.sql +57 -0
  89. package/supabase/tests/create_flow/flow_creation.test.sql +27 -0
  90. package/supabase/tests/create_flow/idempotency_and_duplicates.test.sql +26 -0
  91. package/supabase/tests/create_flow/invalid_slug.test.sql +13 -0
  92. package/supabase/tests/create_flow/options.test.sql +57 -0
  93. package/supabase/tests/fail_task/exponential_backoff.test.sql +70 -0
  94. package/supabase/tests/fail_task/mark_as_failed_if_no_retries_available.test.sql +49 -0
  95. package/supabase/tests/fail_task/respects_flow_retry_settings.test.sql +48 -0
  96. package/supabase/tests/fail_task/respects_step_retry_settings.test.sql +48 -0
  97. package/supabase/tests/fail_task/retry_task_if_retries_available.test.sql +39 -0
  98. package/supabase/tests/is_valid_slug.test.sql +72 -0
  99. package/supabase/tests/poll_for_tasks/builds_proper_input_from_deps_outputs.test.sql +35 -0
  100. package/supabase/tests/poll_for_tasks/hides_messages.test.sql +35 -0
  101. package/supabase/tests/poll_for_tasks/increments_attempts_count.test.sql +35 -0
  102. package/supabase/tests/poll_for_tasks/multiple_task_processing.test.sql +24 -0
  103. package/supabase/tests/poll_for_tasks/polls_only_queued_tasks.test.sql +35 -0
  104. package/supabase/tests/poll_for_tasks/reads_messages.test.sql +38 -0
  105. package/supabase/tests/poll_for_tasks/returns_no_tasks_if_no_step_task_for_message.test.sql +34 -0
  106. package/supabase/tests/poll_for_tasks/returns_no_tasks_if_queue_is_empty.test.sql +19 -0
  107. package/supabase/tests/poll_for_tasks/returns_no_tasks_when_qty_set_to_0.test.sql +22 -0
  108. package/supabase/tests/poll_for_tasks/sets_vt_delay_based_on_opt_timeout.test.sql +41 -0
  109. package/supabase/tests/poll_for_tasks/tasks_reapppear_if_not_processed_in_time.test.sql +59 -0
  110. package/supabase/tests/start_flow/creates_run.test.sql +24 -0
  111. package/supabase/tests/start_flow/creates_step_states_for_all_steps.test.sql +25 -0
  112. package/supabase/tests/start_flow/creates_step_tasks_only_for_root_steps.test.sql +54 -0
  113. package/supabase/tests/start_flow/returns_run.test.sql +24 -0
  114. package/supabase/tests/start_flow/sends_messages_on_the_queue.test.sql +50 -0
  115. package/supabase/tests/start_flow/starts_only_root_steps.test.sql +21 -0
  116. package/supabase/tests/step_dsl_is_idempotent.test.sql +34 -0
  117. package/tsconfig.json +16 -0
  118. package/tsconfig.lib.json +26 -0
  119. package/tsconfig.spec.json +35 -0
  120. package/vite.config.ts +57 -0
package/prompts/architect.md ADDED
@@ -0,0 +1,87 @@
+ # Architect Mode
+
+ ## Your Role
+ You are a senior software architect with extensive experience designing scalable, maintainable systems. Your purpose is to thoroughly analyze requirements and design optimal solutions before any implementation begins. You must resist the urge to immediately write code and instead focus on comprehensive planning and architecture design.
+
+ ## Your Behavior Rules
+ - You must thoroughly understand requirements before proposing solutions
+ - You must reach 90% confidence in your understanding before suggesting implementation
+ - You must identify and resolve ambiguities through targeted questions
+ - You must document all assumptions clearly
+
+ ## Process You Must Follow
+
+ ### Phase 1: Requirements Analysis
+ 1. Carefully read all provided information about the project or feature
+ 2. Extract and list all functional requirements explicitly stated
+ 3. Identify implied requirements not directly stated
+ 4. Determine non-functional requirements including:
+    - Performance expectations
+    - Security requirements
+    - Scalability needs
+    - Maintenance considerations
+ 5. Ask clarifying questions about any ambiguous requirements
+ 6. Report your current understanding confidence (0-100%)
+
+ ### Phase 2: System Context Examination
+ 1. If an existing codebase is available:
+    - Request to examine the directory structure
+    - Ask to review key files and components
+    - Identify integration points with the new feature
+ 2. Identify all external systems that will interact with this feature
+ 3. Define clear system boundaries and responsibilities
+ 4. If beneficial, create a high-level system context diagram
+ 5. Update your understanding confidence percentage
+
+ ### Phase 3: Architecture Design
+ 1. Propose 2-3 potential architecture patterns that could satisfy the requirements
+ 2. For each pattern, explain:
+    - Why it's appropriate for these requirements
+    - Key advantages in this specific context
+    - Potential drawbacks or challenges
+ 3. Recommend the optimal architecture pattern with justification
+ 4. Define the core components needed in the solution, with clear responsibilities for each
+ 5. Design all necessary interfaces between components
+ 6. If applicable, design the database schema showing:
+    - Entities and their relationships
+    - Key fields and data types
+    - Indexing strategy
+ 7. Address cross-cutting concerns including:
+    - Authentication/authorization approach
+    - Error handling strategy
+    - Logging and monitoring
+    - Security considerations
+ 8. Update your understanding confidence percentage
+
+ ### Phase 4: Technical Specification
+ 1. Recommend specific technologies for implementation, with justification
+ 2. Break down implementation into distinct phases with dependencies
+ 3. Identify technical risks and propose mitigation strategies
+ 4. Create detailed component specifications including:
+    - API contracts
+    - Data formats
+    - State management
+    - Validation rules
+ 5. Define technical success criteria for the implementation
+ 6. Update your understanding confidence percentage
+
+ ### Phase 5: Transition Decision
+ 1. Summarize your architectural recommendation concisely
+ 2. Present the implementation roadmap with phases
+ 3. State your final confidence level in the solution
+ 4. If confidence ≥ 90%:
+    - State: "I'm ready to build! Switch to Agent mode and tell me to continue."
+ 5. If confidence < 90%:
+    - List specific areas requiring clarification
+    - Ask targeted questions to resolve remaining uncertainties
+    - State: "I need additional information before we start coding."
+
+ ## Response Format
+ Always structure your responses in this order:
+ 1. Current phase you're working on
+ 2. Findings or deliverables for that phase
+ 3. Current confidence percentage
+ 4. Questions to resolve ambiguities (if any)
+ 5. Next steps
+
+ Remember: your primary value is in thorough design that prevents costly implementation mistakes. Take the time to design correctly before suggesting a switch to Agent mode.
package/prompts/condition.md ADDED
@@ -0,0 +1,33 @@
+ # Conditional Steps in the Flow DSL
+
+ Conditional steps run only when certain criteria, based on the incoming payload, are met. Instead of always executing as soon as their dependencies complete, these steps check the provided condition against the input data.
+
+ ## How It Works
+
+ - **Definition**: A condition is supplied as a JSON fragment via the step options (for example, using `runIf` or `runUnless`).
+ - **Evaluation**: At runtime, the system evaluates the condition by comparing the step's combined inputs against the JSON fragment.
+ - **Mechanism**: Under the hood, the payload is matched against the condition using PostgreSQL's JSON containment operator (`@>`), which checks whether the input JSON "contains" the condition JSON structure (see the sketch after this list).
+ - **Outcome**:
+   - If the condition is met (for `runIf`) or not met (for `runUnless`), the step is executed.
+   - If the condition fails, the step is marked as skipped, and its downstream dependent steps are not executed (they are similarly marked as skipped).
+
+ This design avoids unnecessary processing when prerequisites are not satisfied.
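+
+ A minimal illustration of the containment check, as plain PostgreSQL (no pgflow schema involved):
+
+ ```sql
+ -- runIf: the step runs when the combined input contains the condition fragment.
+ select '{"run": {"input": true}, "other": 1}'::jsonb
+        @> '{"run": {"input": true}}'::jsonb;  -- true: the step runs
+
+ select '{"run": {"input": false}}'::jsonb
+        @> '{"run": {"input": true}}'::jsonb;  -- false: the step is skipped
+ ```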
+
+ ## Type safety
+
+ The options object can be made strictly type-safe, allowing only values that are available in the payload, so it is impossible to define an invalid condition object.
+
+ ## Marking as skipped
+
+ Skipped steps are not considered failures, but they propagate the skipped status to all their dependent steps, which will not run either.
+
+ This gives us reasonably robust low-level branching logic - users can define branches by creating steps with mutually exclusive conditions, so only one branch will be executed:
+
+ ```ts
+ const ScrapeWebsiteFlow = new Flow<{ input: true }>()
+   .step({ slug: 'run_if_true', runIf: { run: { input: true } } }, handler)
+   .step({ slug: 'run_if_false', runUnless: { run: { input: true } } }, handler);
+ ```
package/prompts/declarative_sql.md ADDED
@@ -0,0 +1,15 @@
+ ### Declarative vs procedural
+
+ **YOU MUST ALWAYS PRIORITIZE DECLARATIVE STYLE** and batched operations.
+
+ Avoid plpgsql as much as you can.
+ It is important to have your DB procedures run in batched ways and to use declarative rather than procedural constructs where possible:
+
+ - do not ever use `language plpgsql` in functions, always use `language sql`
+ - don't write loops, write SQL statements that address multiple rows at once
+ - don't write trigger functions that fire for a single row, use `FOR EACH STATEMENT` instead
+ - don't call functions for each row in a result set, a condition, a join, or whatever; instead use functions that return `SETOF` and join against them
+
+ If you're constructing dynamic SQL, you should only ever use `%I` and `%L` with `FORMAT` or similar; you should never see `%s` (with the very rare exception of merging in another SQL fragment that you've previously formatted using `%I` and `%L`).
+
+ Remember that functions have significant overhead in Postgres - instead of factoring your code into lots of tiny functions, think about how to make it more expressive so there's no need.
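+
+ A hedged sketch of these rules in practice, loosely modeled on this package's `start_ready_steps` migration - the simplified column names here are assumptions for illustration, not the actual schema:
+
+ ```sql
+ -- Declarative: one set-based statement in a plain `language sql` function,
+ -- no plpgsql, no loop - all ready steps are started in a single UPDATE.
+ create or replace function start_ready_steps(run_id uuid)
+ returns setof step_states
+ language sql
+ as $$
+   update step_states
+   set status = 'started'
+   where step_states.run_id = start_ready_steps.run_id
+     and step_states.remaining_deps = 0
+     and step_states.status = 'created'
+   returning *;
+ $$;
+ ```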
package/prompts/deps_in_payloads.md ADDED
@@ -0,0 +1,20 @@
+ Currently the 'input' jsonb that we are building contains only the 'run' input.
+
+ We really want this jsonb to contain all the deps' outputs.
+ By dep output I mean a step_tasks.output value that corresponds to a step_states row that is a dependency of the given updated_step_tasks (via step_states->steps).
+ The step_slug of a given dependency should be used as the key and its output as the value in the input jsonb.
+
+ So, if a given updated_step_task belongs to a step_state that has 2 dependencies:
+
+ step_slug=dep_a output=123
+ step_slug=dep_b output=456
+
+ we would like the final 'input' jsonb to look like this:
+
+ {
+   "run": r.input,
+   "dep_a": dep_a_step_task.output,
+   "dep_b": dep_b_step_task.output
+ }
+
+ Write the appropriate joins and augment this code with these requirements.
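+
+ For illustration, a hedged sketch of the kind of joins being asked for - the table names and join conditions are assumptions taken loosely from the prompt, not the package's actual schema:
+
+ ```sql
+ -- Build { "run": ..., "<dep_slug>": <dep_output>, ... } for one step task.
+ select jsonb_build_object('run', r.input)
+        || coalesce(jsonb_object_agg(dep_task.step_slug, dep_task.output), '{}') as input
+ from runs r
+ join step_states dep_state
+   on dep_state.run_id = r.run_id
+  and dep_state.step_slug in ('dep_a', 'dep_b')  -- the current step's dependencies
+ join step_tasks dep_task
+   on dep_task.run_id = dep_state.run_id
+  and dep_task.step_slug = dep_state.step_slug
+ where r.run_id = '...'::uuid
+ group by r.input;
+ ```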
package/prompts/dsl-multi-arg.ts ADDED
@@ -0,0 +1,48 @@
+ const ScrapeWebsiteFlow = new Flow<Input>()
+   .step('verify_status', async (payload) => {
+     // Placeholder function
+     return { status: 'success' };
+   })
+   .step(
+     'when_success',
+     ['verify_status'],
+     async (payload) => {
+       // Placeholder function
+       return await scrapeSubpages(
+         payload.run.url,
+         payload.table_of_contents.urls_of_subpages
+       );
+     },
+     { runIf: { verify_status: { status: 'success' } } }
+   )
+   .step(
+     'when_server_error',
+     ['verify_status'],
+     async (payload) => {
+       // Placeholder function
+       return await generateSummaries(payload.subpages.contentsOfSubpages);
+     },
+     { runUnless: { verify_status: { status: 'success' } } }
+   )
+
+   .step(
+     'sentiments',
+     ['subpages'],
+     async (payload) => {
+       // Placeholder function
+       return await analyzeSentiments(payload.subpages.contentsOfSubpages);
+     },
+     { maxAttempts: 5, baseDelay: 10 }
+   )
+   .step(
+     'save_to_db',
+     ['subpages', 'summaries', 'sentiments'],
+     async (payload) => {
+       // Placeholder function
+       return await saveToDb(
+         payload.subpages,
+         payload.summaries,
+         payload.sentiments
+       );
+     }
+   );
package/prompts/dsl-options.md ADDED
@@ -0,0 +1,39 @@
+ # Flow DSL with options
+
+ The idea is to add a 4th argument to the `.step` method: an object
+ holding the step options:
+
+ ```ts
+ {
+   runIf: Json;
+   runUnless: Json;
+   maxAttempts: number;
+   baseDelay: number;
+ }
+ ```
+
+ ## Full flow example
+
+ ```ts
+ const ScrapeWebsiteFlow = new Flow<Input>()
+   .step('verify_status', async (payload) => {
+     // Placeholder function
+     return { status: 'success' }
+   })
+   .step('when_success', ['verify_status'], async (payload) => {
+     // Placeholder function
+     return await scrapeSubpages(payload.run.url, payload.table_of_contents.urls_of_subpages);
+   }, { runIf: { status: 'success' } })
+   .step('when_server_error', ['verify_status'], async (payload) => {
+     // Placeholder function
+     return await generateSummaries(payload.subpages.contentsOfSubpages);
+   }, { runUnless: { status: 'success' } })
+   .step('sentiments', ['subpages'], async (payload) => {
+     // Placeholder function
+     return await analyzeSentiments(payload.subpages.contentsOfSubpages);
+   }, { maxAttempts: 5, baseDelay: 10 })
+   .step('save_to_db', ['subpages', 'summaries', 'sentiments'], async (payload) => {
+     // Placeholder function
+     return await saveToDb(payload.subpages, payload.summaries, payload.sentiments);
+   });
+ ```
package/prompts/dsl-single-arg.ts ADDED
@@ -0,0 +1,51 @@
+ const ScrapeWebsiteFlow = new Flow<Input>()
+   .step({
+     id: 'verify_status',
+     handler: async (payload) => {
+       // Placeholder function
+       return { status: 'success' };
+     },
+   })
+   .step({
+     id: 'when_success',
+     deps: ['verify_status'],
+     runIf: { verify_status: { status: 'success' } },
+     async handler(payload) {
+       // Placeholder function
+       return await scrapeSubpages(
+         payload.run.url,
+         payload.table_of_contents.urls_of_subpages
+       );
+     }
+   })
+   .step({
+     id: 'when_server_error',
+     deps: ['verify_status'],
+     runUnless: { verify_status: { status: 'success' } },
+     async handler(payload) {
+       // Placeholder function
+       return await generateSummaries(payload.subpages.contentsOfSubpages);
+     }
+   })
+   .step({
+     id: 'sentiments',
+     deps: ['subpages'],
+     async handler(payload) {
+       // Placeholder function
+       return await analyzeSentiments(payload.subpages.contentsOfSubpages);
+     },
+     maxAttempts: 5,
+     baseDelay: 10
+   })
+   .step({
+     id: 'save_to_db',
+     deps: ['subpages', 'summaries', 'sentiments'],
+     async handler(payload) {
+       // Placeholder function
+       return await saveToDb(
+         payload.subpages,
+         payload.summaries,
+         payload.sentiments
+       );
+     },
+   });
package/prompts/dsl-two-arg.ts ADDED
@@ -0,0 +1,61 @@
+ const ScrapeWebsiteFlow = new Flow<Input>()
+   .step(
+     {
+       slug: 'verify_status',
+     },
+     async (payload) => {
+       // Placeholder function
+       return { status: 'success' };
+     }
+   )
+   .step(
+     {
+       slug: 'when_success',
+       dependsOn: ['verify_status'],
+       runIf: { verify_status: { status: 'success' } },
+     },
+     async (payload) => {
+       // Placeholder function
+       return await scrapeSubpages(
+         payload.run.url,
+         payload.table_of_contents.urls_of_subpages
+       );
+     }
+   )
+   .step(
+     {
+       slug: 'when_server_error',
+       dependsOn: ['verify_status'],
+       runUnless: { verify_status: { status: 'success' } },
+     },
+     async (payload) => {
+       // Placeholder function
+       return await generateSummaries(payload.subpages.contentsOfSubpages);
+     }
+   )
+   .step(
+     {
+       slug: 'sentiments',
+       dependsOn: ['subpages'],
+       maxAttempts: 5,
+       baseDelay: 10,
+     },
+     async (payload) => {
+       // Placeholder function
+       return await analyzeSentiments(payload.subpages.contentsOfSubpages);
+     }
+   )
+   .step(
+     {
+       slug: 'save_to_db',
+       dependsOn: ['subpages', 'summaries', 'sentiments'],
+     },
+     async (payload) => {
+       // Placeholder function
+       return await saveToDb(
+         payload.subpages,
+         payload.summaries,
+         payload.sentiments
+       );
+     }
+   );
package/prompts/dsl.md ADDED
@@ -0,0 +1,119 @@
+ # Flow DSL
+
+ Flow DSL is used to define the shape of the flow and to tie functions to particular steps.
+
+ ## Full flow example
+
+ ```ts
+ const ScrapeWebsiteFlow = new Flow<Input>()
+   .step('table_of_contents', async (payload) => {
+     // Placeholder function
+     return await fetchTableOfContents(payload.run.url);
+   })
+   .step('subpages', ['table_of_contents'], async (payload) => {
+     // Placeholder function
+     return await scrapeSubpages(payload.run.url, payload.table_of_contents.urls_of_subpages);
+   })
+   .step('summaries', ['subpages'], async (payload) => {
+     // Placeholder function
+     return await generateSummaries(payload.subpages.contentsOfSubpages);
+   })
+   .step('sentiments', ['subpages'], async (payload) => {
+     // Placeholder function
+     return await analyzeSentiments(payload.subpages.contentsOfSubpages);
+   })
+   .step('save_to_db', ['subpages', 'summaries', 'sentiments'], async (payload) => {
+     // Placeholder function
+     return await saveToDb(payload.subpages, payload.summaries, payload.sentiments);
+   });
+ ```
+
+ ## Explanation
+
+ This is a fluent-API style DSL, but it is very simple:
+
+ 1. Users create a flow by initializing a `Flow` object with a mandatory
+    type annotation for the Flow `input` - this is the type of the payload
+    users start the flow with, and it must be serializable to JSON:
+
+    ```ts
+    type Input = {
+      url: string; // url of the website to scrape
+    };
+
+    const ScrapeWebsiteFlow = new Flow<Input>()
+    ```
+
+ 2. Then they define steps by calling the `.step(stepSlug: string, depsSlugs: string[], handler: Function)` method.
+    The `depsSlugs` array can be omitted if the step has no dependencies.
+    Steps of this kind are called "root steps"; they run first and are passed only the flow input payload:
+
+    ```ts
+    const ScrapeWebsiteFlow = new Flow<Input>()
+      .step('table_of_contents', async (payload) => {
+        const { run } = payload;
+        // do something
+        // make sure to return some value so next steps can use it
+        return {
+          urls_of_subpages,
+          title
+        }
+      })
+    ```
+
+    The `payload` object always has a special key `run`, which is the value passed as the flow input -
+    every step can access and use it.
+
+    What the step handler returns is very important!
+    We call it the `output`; it will be persisted in the database
+    and used as `input` for the dependent steps.
+
+    It must be serializable to JSON.
+
+ 3. Then they define dependent steps by calling the same `.step(stepSlug: string, depsSlugs: string[], handler: Function)` method,
+    now providing an array of dependency slugs: `['table_of_contents']`.
+
+    ```ts
+    .step('subpages', ['table_of_contents'], async (payload) => {
+      const { run, table_of_contents } = payload;
+      // do something
+      // make sure to return some value so next steps can use it
+      return {
+        contentsOfSubpages
+      }
+    })
+    ```
+
+    Notice how the `payload` object got a new key `table_of_contents` - each dependency's
+    result (the persisted return value from its handler) is passed to `payload` under the dependency's slug key:
+
+    ```ts
+    {
+      run: { url: 'https://example.com' },
+      table_of_contents: {
+        urls_of_subpages: ['https://example.com/subpage1', 'https://example.com/subpage2']
+      }
+    }
+    ```
+
+ 4. There can be multiple steps running in parallel:
+
+    ```ts
+    .step('summaries', ['subpages'], async (payload) => await doSomeStuff())
+    .step('sentiments', ['subpages'], async (payload) => await doSomeStuff())
+    ```
+
+ 5. Steps can also depend on more than one other step:
+
+    ```ts
+    .step('save_to_db', ['subpages', 'summaries', 'sentiments'], async (payload) => await saveToDb())
+    ```
+
+ 6. When the run finishes, the `output`s of steps that have no dependents are combined
+    and saved as the run's `output`. This object is built in a similar way to a step's
+    `input` object, but lacks the `run` key (see the sketch after this list).
+
+ 7. Type Safety - all step payload types are inferred from the combination
+    of the Flow input, the handlers' inferred return types, and the shape of the graph.
+
+    So users always know what type their step input is.
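+
+ For item 6, a hedged SQL sketch of how the leaf outputs could be combined into the run output - the table and column names are assumptions, not the package's actual schema:
+
+ ```sql
+ -- Combine outputs of steps that no other step depends on,
+ -- keyed by step_slug, into a single jsonb run output.
+ select jsonb_object_agg(task.step_slug, task.output) as run_output
+ from step_tasks task
+ where task.run_id = '...'::uuid
+   and not exists (
+     select 1
+     from deps d
+     where d.dep_slug = task.step_slug  -- nobody depends on this step
+   );
+ ```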
package/prompts/fanout_steps.md ADDED
@@ -0,0 +1 @@
+ .file prompts/architect.md -- we are in a monorepo for pgflow - a postgres-native workflow engine. we are not in the root, we are in pkgs/core - the sql part. sql code lives in supabase/migrations/ and tests live in supabase/tests/. you can check some info about step types and declarative sql in the prompts/ folder. your job is to implement the next step type - the 'fanout_tasks' type, which just enqueues multiple tasks for a single step - one task per input array item. this step type must have a json path parameter that tells which part of the input is the array of items. you must change the code so steps with this type are handled differently in all the functions. use task_index to indicate which array item the given task is for. when completing a task, do not proceed with completing steps etc. unless all the tasks for the given step state are completed; then use task_index to order their outputs and use them all as the step output array. try to understand what needs to be done first. modify existing migrations, do not create new ones - this is unreleased source code and we can change migrations.
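+
+ A hedged sketch of the output aggregation this prompt describes - `step_tasks`, `task_index`, and `output` follow the prompt's wording, the rest (including the `status` column) is assumed:
+
+ ```sql
+ -- Once every task for the step state is completed, collect the task
+ -- outputs into one array, ordered by task_index, as the step output.
+ select jsonb_agg(st.output order by st.task_index) as step_output
+ from step_tasks st
+ where st.run_id = '...'::uuid
+   and st.step_slug = 'my_fanout_step'
+ having count(*) = count(*) filter (where st.status = 'completed');
+ ```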
package/prompts/json_schemas.md ADDED
@@ -0,0 +1,36 @@
+ # JSON Schemas
+
+ JSON schemas can be inferred from the steps' `input` types,
+ so it is relatively easy to build a JSON schema for each step input.
+
+ The same goes for the JSON Schema for the flow input.
+
+ ## Schema storage
+
+ Schemas should be stored in the `pgflow.flows` and `pgflow.steps` tables.
+
+ ## Schemas in versioning
+
+ To make sure that slight changes in the input/output types of steps
+ trigger a new version of the flow, we need to use the inferred schemas
+ when generating a version hash of the flow.
+
+ ## Schemas as validation
+
+ We can use schemas to do data validation for step handlers:
+
+ 1. Task executors can validate the runtime input payloads for handlers
+    and their output results against the schema.
+ 2. The core SQL engine can use `pg_jsonschema` to validate the input values to flows
+    and maybe the input values to steps, failing steps if they don't match.
+
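+ A minimal sketch of that database-side check, assuming `pg_jsonschema`'s `jsonb_matches_schema(schema, instance)` function is available:
+
+ ```sql
+ -- Validate a flow input against its stored JSON schema before starting a run.
+ select jsonb_matches_schema(
+   '{"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]}'::json,
+   '{"url": "https://example.com"}'::jsonb
+ );  -- => true
+ ```
+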
+ ## Problems
+
+ Doing any JSON Schema validation in the database is probably not a good idea because
+ of the performance impact it would have.
+
+ Using runtime validation in Task Executors is probably good enough,
+ with the exception of validating the Flow input - flows are started less often than
+ steps, so it seems like a good idea to validate that input on the database side.