@pgflow/dsl 0.0.5-prealpha.2 → 0.0.6
- package/package.json +4 -1
- package/__tests__/runtime/flow.test.ts +0 -121
- package/__tests__/runtime/steps.test.ts +0 -183
- package/__tests__/runtime/utils.test.ts +0 -149
- package/__tests__/types/dsl-types.test-d.ts +0 -103
- package/__tests__/types/example-flow.test-d.ts +0 -76
- package/__tests__/types/extract-flow-input.test-d.ts +0 -71
- package/__tests__/types/extract-flow-steps.test-d.ts +0 -74
- package/__tests__/types/getStepDefinition.test-d.ts +0 -65
- package/__tests__/types/step-input.test-d.ts +0 -212
- package/__tests__/types/step-output.test-d.ts +0 -55
- package/brainstorming/condition/condition-alternatives.md +0 -219
- package/brainstorming/condition/condition-with-flexibility.md +0 -303
- package/brainstorming/condition/condition.md +0 -139
- package/brainstorming/condition/implementation-plan.md +0 -372
- package/brainstorming/dsl/cli-json-schema.md +0 -225
- package/brainstorming/dsl/cli.md +0 -179
- package/brainstorming/dsl/create-compilator.md +0 -25
- package/brainstorming/dsl/dsl-analysis-2.md +0 -166
- package/brainstorming/dsl/dsl-analysis.md +0 -512
- package/brainstorming/dsl/dsl-critique.md +0 -41
- package/brainstorming/fanouts/fanout-subflows-flattened-vs-subruns.md +0 -213
- package/brainstorming/fanouts/fanouts-task-index.md +0 -150
- package/brainstorming/fanouts/fanouts-with-conditions-and-subflows.md +0 -239
- package/brainstorming/subflows/branching.ts.md +0 -38
- package/brainstorming/subflows/subflows-callbacks.ts.md +0 -124
- package/brainstorming/subflows/subflows-classes.ts.md +0 -83
- package/brainstorming/subflows/subflows-flattening-versioned.md +0 -119
- package/brainstorming/subflows/subflows-flattening.md +0 -138
- package/brainstorming/subflows/subflows.md +0 -118
- package/brainstorming/subflows/subruns-table.md +0 -282
- package/brainstorming/subflows/subruns.md +0 -315
- package/brainstorming/versioning/breaking-and-non-breaking-flow-changes.md +0 -259
- package/docs/refactor-edge-worker.md +0 -146
- package/docs/versioning.md +0 -19
- package/eslint.config.cjs +0 -22
- package/out-tsc/vitest/__tests__/runtime/flow.test.d.ts +0 -2
- package/out-tsc/vitest/__tests__/runtime/flow.test.d.ts.map +0 -1
- package/out-tsc/vitest/__tests__/runtime/steps.test.d.ts +0 -2
- package/out-tsc/vitest/__tests__/runtime/steps.test.d.ts.map +0 -1
- package/out-tsc/vitest/__tests__/runtime/utils.test.d.ts +0 -2
- package/out-tsc/vitest/__tests__/runtime/utils.test.d.ts.map +0 -1
- package/out-tsc/vitest/__tests__/types/dsl-types.test-d.d.ts +0 -2
- package/out-tsc/vitest/__tests__/types/dsl-types.test-d.d.ts.map +0 -1
- package/out-tsc/vitest/__tests__/types/example-flow.test-d.d.ts +0 -2
- package/out-tsc/vitest/__tests__/types/example-flow.test-d.d.ts.map +0 -1
- package/out-tsc/vitest/__tests__/types/extract-flow-input.test-d.d.ts +0 -2
- package/out-tsc/vitest/__tests__/types/extract-flow-input.test-d.d.ts.map +0 -1
- package/out-tsc/vitest/__tests__/types/extract-flow-steps.test-d.d.ts +0 -2
- package/out-tsc/vitest/__tests__/types/extract-flow-steps.test-d.d.ts.map +0 -1
- package/out-tsc/vitest/__tests__/types/getStepDefinition.test-d.d.ts +0 -2
- package/out-tsc/vitest/__tests__/types/getStepDefinition.test-d.d.ts.map +0 -1
- package/out-tsc/vitest/__tests__/types/step-input.test-d.d.ts +0 -2
- package/out-tsc/vitest/__tests__/types/step-input.test-d.d.ts.map +0 -1
- package/out-tsc/vitest/__tests__/types/step-output.test-d.d.ts +0 -2
- package/out-tsc/vitest/__tests__/types/step-output.test-d.d.ts.map +0 -1
- package/out-tsc/vitest/tsconfig.spec.tsbuildinfo +0 -1
- package/out-tsc/vitest/vite.config.d.ts +0 -3
- package/out-tsc/vitest/vite.config.d.ts.map +0 -1
- package/project.json +0 -28
- package/prompts/edge-worker-refactor.md +0 -105
- package/src/dsl.ts +0 -318
- package/src/example-flow.ts +0 -67
- package/src/index.ts +0 -1
- package/src/utils.ts +0 -84
- package/tsconfig.json +0 -13
- package/tsconfig.lib.json +0 -26
- package/tsconfig.spec.json +0 -35
- package/typecheck.log +0 -120
- package/vite.config.ts +0 -57
package/brainstorming/dsl/cli.md
DELETED
@@ -1,179 +0,0 @@
# Brainstorm: Converting a TypeScript Flow DSL into pgflow Definitions

This document explores various approaches for **translating a TypeScript Flow DSL** (effectively a typed object graph) directly into SQL statements that register flows in **pgflow** via `create_flow` and `add_step`. We also discuss how to manage these flows in development and production, respecting **immutable** flow definitions, versioning via `flow_slug`, and ensuring an **exceptional developer experience**. Finally, we’ll introduce some new ideas and best practices inspired by other tools.

## Why We Need a Flow DSL → SQL Compilation Step

1. **Single Source of Truth**: The TypeScript DSL is a more developer-friendly way to define flows (with auto-complete, type inference, etc.). However, pgflow requires the flow definition to be present in the database to manage steps, dependencies, and runs.
2. **Consistency**: We minimize manual steps (writing raw SQL) when we can automate it. This ensures that the flow structure in code stays in sync with what’s actually in the database.
3. **Safety & Auditing**: Flows are **immutable** in production to avoid “half-upgraded” scenarios. We need a reliable process for introducing new flows or updated flows (via new slugs) and ensuring old ones remain intact if they’re still used.

## Summary of Key Requirements

- Take the DSL object (with steps, dependencies, timeouts, etc.) and generate:
  - SQL queries calling `pgflow.create_flow(slug, ...)`
  - SQL queries for each step calling `pgflow.add_step(...)`, in topological order.
- Provide a **development** workflow that is fast to iterate on. Possibly auto-recreate the flow in the DB on every code change.
- Provide a **production** workflow that is safe and auditable. Possibly generate a migration script that can be run in CI/CD pipelines.
- If a flow with the same slug but a different shape is encountered, we must throw an error (since flows are immutable and can’t be replaced in production).
- If we re-register the same slug with the same shape, no updates are needed (safe no-op).
- Because flows are immutable, changes to shape require a new `flow_slug`.

## Potential Approaches

### 1. pgflow CLI Tool

A dedicated `pgflow` CLI could be responsible for:

- **“Deploying” a Flow**:
  - Reads the TypeScript flow definitions (compiled or at runtime).
  - Converts them into SQL statements.
  - Executes the statements against the specified database (development environment).
- **“Compiling” a Flow**:
  - Converts the TypeScript flow definition into raw SQL (or multiple .sql files).
  - Writes these files to a `migrations/` directory for deployment in production.
- **Version Checking**:
  - If it detects the same `flow_slug` in the DB that differs from the code, it fails with a clear error (“Flow shape mismatch!”).
  - If it’s truly identical (no changes), it does nothing.
  - If it’s new, it proceeds to create the references.

#### Pros

- Straightforward user experience (just run `pgflow deploy` or `pgflow compile`).
- Clear separation of concerns: code → DSL → SQL → database.
- Allows ephemeral recreation in development or safer migrations in production.

#### Cons

- Might require additional tooling or configuration to integrate into existing build/deployment pipelines.
- Must maintain the DSL → SQL translator logic in the CLI.

### 2. Edge Worker Auto-Check & Registration

In this approach, the Edge Worker, upon startup or flow usage, does the following (see the sketch after the pros and cons below):

- Checks if the given `flow_slug` is already registered and if the shape matches.
- If not, it attempts to create it (in development).
- If a shape mismatch is found in production, it throws a fatal error to prevent usage.

#### Pros

- Zero extra steps for developers (the system just “does the right thing”).
- Minimizes friction or forgetting to deploy flows.

#### Cons

- Potentially tricky to manage safe versioning in production (an accidental shape change could break the environment).
- Could lead to unexpected changes or overwritten flows if not carefully locked down.
- Harder to integrate with staging/production pipelines that require explicit migrations.
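
For illustration, a minimal sketch of that startup check, assuming a generic `pg` client. The `pgflow.flows` query and its `shape` column are hypothetical stand-ins for however the shape comparison ends up being stored:

```ts
import { Client } from 'pg';

// Hypothetical: the flow's slug plus a stable serialization of its steps/deps.
type FlowShape = { slug: string; serializedShape: string };

async function ensureFlowRegistered(
  db: Client,
  flow: FlowShape,
  isProduction: boolean
): Promise<void> {
  // Illustrative query; the real pgflow tables may store this differently.
  const { rows } = await db.query(
    'SELECT shape FROM pgflow.flows WHERE flow_slug = $1',
    [flow.slug]
  );

  if (rows.length === 0) {
    if (isProduction) {
      throw new Error(`Flow '${flow.slug}' not registered; run migrations first.`);
    }
    // Development: create the flow on the fly (DSL → SQL generation elided).
    return;
  }

  if (rows[0].shape !== flow.serializedShape) {
    // Immutable flows: same slug + different shape is a fatal mismatch.
    throw new Error(`Flow shape mismatch for '${flow.slug}'!`);
  }
  // Same slug, same shape: safe no-op.
}
```
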
### 3. Hybrid Approach

Use a combination of CLI tooling and an Edge Worker check:

- **CLI** for local dev:
  - `pgflow dev deploy --force` can recreate the flow on each code change, dropping existing definitions as needed.
  - Acceptable in dev because losing run state is less critical.
- **CLI** for production migrations:
  - Instead of auto-executing, it writes `.sql` files that must be manually or automatically applied by a migration system.
  - Reinforces the idea that “once in production, flows are immutable.”
- **Edge Worker**:
  - Optionally does a final shape check to confirm that dev or staging flows have been properly migrated. If a mismatch is found, it throws an error to avoid partial updates.

This approach covers all bases: it’s frictionless in dev and strict in production.

## Immutable Flow Definitions & Versioning

Here’s a recap and deeper explanation of why flows are **immutable** in pgflow:

1. **Simplicity**: Maintaining multiple versions simultaneously might create confusion about which version is “official” or “latest.”
2. **Safety**: Changing a flow mid-run can cause partial upgrades. By “freezing” them, you guarantee a stable environment for ongoing runs.
3. **Intentional Versioning**: If a flow’s shape changes, you create a new `flow_slug`. For example:
   - `analyze_website_v1` → initial version.
   - `analyze_website_v2` → new shape, separate definition.

While optional aliases to represent the “latest” version can be useful, we recommend making them an explicit user-land concept, not a built-in feature. This ensures that every environment references explicit version slugs.

## Development vs. Production Strategies

### Development (Auto-Update)

- **Auto-drop & recreate**: On every run, the system checks if the flow slug exists. If it does, drop the flow definition (and any partial run state) and recreate it fresh.
  - Advantage: Instant reflection of code changes.
  - Disadvantage: You lose state from prior runs. But for dev, that’s often acceptable.
- **Alternative**: Use a randomness-based slug or an incremental suffix in dev, so each new code iteration has a new slug (e.g., `flow_slug_dev_20231012_1`). This preserves old runs at the cost of clutter.

### Production (Migration & Strictness)

- **SQL Migration**:
  - On code commit, run a command like `pgflow compile my_flow.ts --out migrations/2023-10-01_create_analyze_website.sql`.
  - This file contains:
    ```sql
    SELECT pgflow.create_flow('analyze_website', ...);
    SELECT pgflow.add_step('analyze_website', 'website', ...);
    ...
    ```
  - Then your usual migration system applies this script once. If the slug already exists but the definitions differ, the migration fails. Ops can step in to handle that.
- **No Migration**: If your flows are brand new with brand new slugs, you just add them. If you need to retire old flows, do so manually, or let them remain unchanged.

## Potential New Ideas

1. **Flow “Insight” Command**:
   - A CLI sub-command that prints a summary or “manifest” of the entire DSL — steps, dependencies, types, etc.
   - Helps devs see the shape of the flow quickly or compare two versions at a glance.
2. **Checksum-based Upsert** (see the sketch after this list):
   - The CLI or Edge Worker could compute a content-based checksum of the DSL shape. If a flow with the same slug but a *different* checksum is in the DB, it refuses to proceed. If the checksums match, it’s a no-op.
   - This ensures no accidental mismatch or partial updates.
3. **Local “Flow Playground”**:
   - A local web interface that visually shows your flow’s DAG from the DSL, letting you step through nodes or edit them.
   - Under the hood, it calls the same DSL → SQL logic for clarity.
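
To make the checksum idea concrete, here is a minimal sketch under simplifying assumptions: `FlowShape` is an illustrative stand-in for the real DSL types, and only slugs and dependencies are hashed.

```ts
import { createHash } from 'node:crypto';

// Illustrative stand-in for the DSL's flow shape; the real types carry more.
interface FlowShape {
  slug: string;
  steps: Array<{ slug: string; dependsOn?: string[] }>;
}

// Content-based checksum: dependencies are sorted so that semantically
// identical shapes always produce the same digest.
function flowChecksum(shape: FlowShape): string {
  const canonical = {
    slug: shape.slug,
    steps: shape.steps.map((s) => ({
      slug: s.slug,
      dependsOn: [...(s.dependsOn ?? [])].sort(),
    })),
  };
  return createHash('sha256').update(JSON.stringify(canonical)).digest('hex');
}
```

The CLI (or Edge Worker) would store this digest next to the flow row on first registration; a differing digest for an existing slug aborts, while a matching one is a safe no-op.
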
## High-Level Example: CLI Flow

Below is a hypothetical example flow from code to deploy:

1. **Write a Flow** in TypeScript:

   ```ts
   const AnalyzeWebsite = new Flow<Input>({
     slug: "analyze_website_v2",
     ...
   })
     .step({ slug: "website" }, async (input) => { ... })
     .step({ slug: "sentiment", dependsOn: ["website"] }, async (input) => { ... })
     .step({ slug: "summary", dependsOn: ["website"] }, async (input) => { ... })
     .step({ slug: "saveToDb", dependsOn: ["sentiment", "summary"] }, async (input) => { ... });
   ```

2. **Compile**: Run:

   ```
   $ pgflow compile --file=flows/analyze_website_v2.ts --output=migrations/2023-10-01_analyze_website_v2.sql
   ```

   It generates SQL:

   ```sql
   SELECT pgflow.create_flow('analyze_website_v2', ...);
   SELECT pgflow.add_step('analyze_website_v2', 'website', ...);
   SELECT pgflow.add_step('analyze_website_v2', 'sentiment', ..., deps => ARRAY['website']);
   SELECT pgflow.add_step('analyze_website_v2', 'summary', ..., deps => ARRAY['website']);
   SELECT pgflow.add_step('analyze_website_v2', 'saveToDb', ..., deps => ARRAY['sentiment','summary']);
   ```

3. **Deploy** in Development:

   ```
   $ pgflow deploy --dev migrations/2023-10-01_analyze_website_v2.sql
   ```

   - Optionally it can recreate the flow if it’s changed or new.

4. **Run** the same file in Production:
   - Typically through your standard migration pipeline (Alembic, Flyway, etc.).
   - If the flow slug is found and the definitions mismatch, the migration fails, requiring manual action.

## Conclusion

The overarching goal is to create an **MVP** that is simple enough for everyday users but can scale to rigorous enterprise demands. By offering a **pgflow CLI** or an **auto-registration** approach, we can cater to different workflows:

- **Dev**: Fast, ephemeral registration, possibly auto-dropping old definitions.
- **Prod**: Strict, immutability-enforced approach via migrations, with versioning handled through unique slugs.

### Key Takeaways

- **Immutable flows**: If the shape changes, you create a new slug (e.g., “flow_v2”). Older runs remain intact.
- **Choice of deployment**:
  - The *CLI-based* approach is explicit and suits mature pipelines.
  - The *Edge Worker* approach is more dynamic but risks environment mismatches if not carefully controlled.
- **Focus on exceptional DX**: Provide a streamlined developer experience with a one-liner to define, refine, and re-deploy flows, plus a frictionless path for production migration.
- **Stay flexible**: Expose simple primitives (DSL → SQL), so advanced teams can incorporate them into their own workflows, while new teams can rely on curated commands like `pgflow compile` or `pgflow deploy`.

With these foundations in place, **pgflow** can truly stand out as a robust, developer-friendly, and production-safe workflow orchestration framework built entirely in PostgreSQL.

package/brainstorming/dsl/create-compilator.md
DELETED
@@ -1,25 +0,0 @@

ai -f src/ -f ../core/README.md -c '
create FlowCompilator class with `constructor(flow: Flow)`
this class should have one method `compile(): string` that should return
an sql definition corresponding to the dsl version of a flow, based on how
it is described in the README.md of core project:

it must output a proper sql code with call to create_flow and multiple calls to add_step in the same order as they are added in the flow

make sure function covers happy path and edge cases:

- no provided options at all
- only flow options
- flow options and step options
- no flow options, only step options
- multiple root steps
- invalid, non-existing options (for flow and for step)
- invalid value for option (string or something other than number, for flow and for step)

make it as simple and readable as possible, extract StepCompilator class in the same file to help with compiling respective steps

make sure to use FlowOptions and StepOptions instead of just Record<string, any> for options

output typescript

' | tee src/compile.ts
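
For reference, a minimal sketch of what the prompted class might look like. The `FlowDefinition`/`StepDefinition` shapes (simplified option types instead of the real `FlowOptions`/`StepOptions`) and the `deps_slugs` parameter name are illustrative assumptions, and the `StepCompilator` extraction is omitted:

```ts
// Simplified stand-ins for the real DSL types.
interface StepDefinition {
  slug: string;
  dependsOn?: string[];
  options?: Record<string, number>;
}

interface FlowDefinition {
  slug: string;
  options?: Record<string, number>;
  steps: StepDefinition[]; // kept in the order steps were added
}

class FlowCompilator {
  constructor(private flow: FlowDefinition) {}

  // Emits one create_flow call followed by add_step calls in insertion order.
  compile(): string {
    const lines = [
      `SELECT pgflow.create_flow('${this.flow.slug}'${renderOptions(this.flow.options)});`,
    ];
    for (const step of this.flow.steps) {
      const deps = step.dependsOn?.length
        ? `, deps_slugs => ARRAY[${step.dependsOn.map((d) => `'${d}'`).join(', ')}]`
        : '';
      lines.push(
        `SELECT pgflow.add_step('${this.flow.slug}', '${step.slug}'${deps}${renderOptions(step.options)});`
      );
    }
    return lines.join('\n');
  }
}

// Renders named SQL arguments like `, timeout => 60` for numeric options.
function renderOptions(options?: Record<string, number>): string {
  if (!options) return '';
  return Object.entries(options)
    .map(([key, value]) => `, ${key} => ${value}`)
    .join('');
}
```
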
package/brainstorming/dsl/dsl-analysis-2.md
DELETED
@@ -1,166 +0,0 @@

# Updated Analysis and Perspective

Below is a revised critique and commentary, incorporating your recent clarifications and feedback. I’ll address each point (1.1 through 1.9) with an honest, realistic view grounded in the broader context of a database-centric, TypeScript-DSL-driven workflow engine.

---

## 1.1 Conditional Flow Readability in the Flow DSL

### Your New Syntax Proposal

You introduced a syntax that uses `.branch(...)` with `runIf` or `runUnless`, for example:

```ts
new Flow<string>({ slug: 'analyzeWebsite' })
  .step({ slug: 'website' }, async ({ run }) => await fetchData(run.url))
  .branch(
    {
      slug: 'ifSuccess',
      dependsOn: ['website'],
      runIf: { website: { status: 200 } },
    },
    (flow) =>
      flow
        .step({ slug: 'sentiment' }, async ({ run, website }) => /* ... */)
        .step({ slug: 'summary' }, async ({ run, website }) => /* ... */)
        // ...
  )
  .branch(
    {
      slug: 'ifFailure',
      dependsOn: ['website'],
      runUnless: { website: { status: 200 } },
    },
    (flow) => flow.step({ slug: 'notifySentry' })
  );
```

**Perspective**

- This more explicit `.branch()` approach might help keep complex conditionals separate from linear or parallel steps. It visually distinguishes conditional blocks from “straight line” steps.
- The partition into `.branch()` calls can indeed improve readability. Each branch can have a purposeful label like `ifSuccess` or `ifFailure`.
- There is still the underlying risk that if your condition checks are complicated, you may wind up with multiple nested branches. But the syntax is a step in the right direction—it’s relatively clear which branch executes under which condition.

**Critical Note**

- Ensure your team documents (or code-lints) how you want to handle edge cases, like undefined or partial outputs from upstream steps. A typed DSL helps, but corner cases might still arise if, for instance, `website.status` isn’t exactly `200` but is `undefined`.
- Overall, `.branch()` does not magically solve all “runIf / runUnless” confusion, but it provides a framework that is more visibly structured, which is good for maintainability.

---

## 1.2 Transaction Usage Only for State Updates

### Your Clarification

You stated that transactions are only used for updating the workflow graph status (like “starting a flow,” “completing a task,” or “failing a task”). The actual, potentially long-running work is delegated to a separate task queue worker. That worker does `poll_for_tasks()` and calls `complete_task()` or `fail_task()` outside any lengthy transaction.

**Perspective**

- This is a sound approach. The original worry was that you could end up with long DB transactions blocking rows. But you’ve clarified that the heavy-lifting portion (e.g., fetch data, run ML) happens outside of the transactional boundary.
- This design is effectively “synchronous to the DB only for small window updates,” so you avoid big performance pitfalls in Postgres.
- As long as you carefully handle intermittent failures or worker restarts, it should scale nicely.
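
A minimal sketch of that worker loop, assuming a generic `pg` client; the argument lists for `poll_for_tasks` / `complete_task` / `fail_task` here are illustrative, not the actual SQL API:

```ts
import { Client } from 'pg';

type Task = { flow_slug: string; step_slug: string; input: unknown };

async function workerLoop(
  db: Client,
  handlers: Record<string, (input: unknown) => Promise<unknown>>
): Promise<void> {
  for (;;) {
    // Claiming tasks is a short, self-contained DB call; no transaction
    // stays open while handlers run.
    const { rows: tasks } = await db.query<Task>(
      'SELECT * FROM pgflow.poll_for_tasks($1, $2)', // illustrative signature
      ['default', 10]
    );

    for (const task of tasks) {
      try {
        // Long-running work happens here, outside any DB transaction.
        const output = await handlers[task.step_slug](task.input);
        await db.query('SELECT pgflow.complete_task($1, $2, $3)', [
          task.flow_slug, task.step_slug, JSON.stringify(output),
        ]);
      } catch (err) {
        await db.query('SELECT pgflow.fail_task($1, $2, $3)', [
          task.flow_slug, task.step_slug, String(err),
        ]);
      }
    }
  }
}
```
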
---

## 1.3 Strongly-Typed DSL vs. Raw SQL

### Your Clarification

You emphasize that the TypeScript DSL is the main interface, and it is very strongly typed to prevent cycles, bad dependencies, or invalid payload references. The SQL is only the underlying store.

**Perspective**

- Having a single TypeScript DSL layer that strongly enforces correctness is a major plus. It mitigates the risk of manual mistakes when defining steps, especially around accidental cycles or referencing non-existent steps.
- This addresses the earlier concern about “Incorrect or Missing Step Ordering” or “SQL drift.” If your DSL’s code generation (or direct usage) is the only path to define flows in the DB, it drastically reduces risk.
- The critical piece is to ensure the DSL actually gates all writes to the underlying DB. If a developer bypasses the DSL and manually edits SQL definitions, you still risk partial drift. In practice, many teams lock down direct DB access so the DSL is the “single source of truth.”

---

## 1.4 Handling Large Outputs with a Blob Reference System

### Your Clarification

You plan a “blob reference system” where large step outputs are stored separately, referenced by an ID, and not embedded in the main JSON fields.

**Perspective**

- This directly tackles data-bloating worries. Storing massive JSON outputs inline can be detrimental to performance and disk usage.
- By storing references in the normal flow record, you keep the critical orchestration metadata small. The actual large data can be offloaded (either to a separate table or even an object store).
- This approach keeps queries on typical step states lean and avoids over-fetching huge data you might only need occasionally.

**Future Considerations**

- Introduce automatic TTL or archiving for older blob references. Over time, you may want to clean up or move them to cheaper storage.
- Provide a concise `downloadBlob(id)` or `getBlob(id)` helper in your DSL so that from a developer’s standpoint, it’s all transparent.
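
A minimal sketch of such helpers, assuming a hypothetical `pgflow.blobs` table with a text `payload` column; the real storage layout (separate table vs. object store) is an open design choice:

```ts
import { Client } from 'pg';

// Only this small reference is embedded in step outputs, not the payload.
interface BlobRef {
  blobId: string;
}

async function putBlob(db: Client, data: unknown): Promise<BlobRef> {
  const { rows } = await db.query(
    'INSERT INTO pgflow.blobs (payload) VALUES ($1) RETURNING id',
    [JSON.stringify(data)]
  );
  return { blobId: rows[0].id };
}

async function getBlob<T>(db: Client, ref: BlobRef): Promise<T> {
  const { rows } = await db.query(
    'SELECT payload FROM pgflow.blobs WHERE id = $1',
    [ref.blobId]
  );
  // Payload is stored as text here for simplicity.
  return JSON.parse(rows[0].payload) as T;
}
```
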
---

## 1.5 Immutable Flow Definitions & Versioning

### Your Clarification

You decided on flow definitions being immutable once deployed, and you use flow slugs rather than version numbers. “If you want a new version, create a new slug.”

**Perspective**

- This approach prevents a lot of “in-flight mismatch” errors. Systems like Temporal also endorse the idea of pinned, immutable workflow definitions.
- If you have old runs referencing older flow code, that’s fine—just keep that old slug around. New runs move to the new slug.
- The caution is that you might accumulate many old slugs over time. Usually, you do a small workflow or database cleanup step for those that are no longer active.

---

## 1.6 Immutability Alleviates “DB Drift”

### Connection to 1.5

You noted that thanks to immutable definitions, confusion about “latest” or incremental partial upgrades is largely avoided.

**Perspective**

- Indeed, forcibly using unique slugs for each new definition clarifies which code belongs to which runs.
- This does require discipline: you can’t just casually rename or repurpose the same slug. But that discipline is usually beneficial in production scenarios.
- With a strongly typed DSL that demands explicit new slugs, you reduce the risk of “accidental partial migrations.”

---

## 1.7 Single-Step Shortcut for Minimal Logic

### Your Clarification

You can create a single step that does multiple tasks if you are worried about the overhead of many small steps.

**Perspective**

- Sometimes workflows become too granular, where each micro-step is in the DB. If that overhead feels too high, combining multiple consecutive actions in one step is valid.
- This can make some runs simpler, but you lose some fine-grained visibility or partial retry capability. Decide case by case whether each sub-operation should truly be a distinct step (with potential concurrency or separate error handling), or if you’re comfortable bundling them.

---

## 1.8 Worker Failure Handling and Debugging

### Your Clarification

When a handler throws and there are no retries left, the worker calls `fail_task()` with the error, storing that info in `step_tasks`.

**Perspective**

- Storing the error message and stack traces (if feasible) is really helpful for debugging. You can see exactly which step crashed and why.
- For advanced debugging, you might still rely on logs or external systems to see the “in-progress” states. But at least you have a final resting place in the DB that references the error.
- This addresses the concern of partial data vanishing. You have enough info to retrospectively figure out what went wrong.

---

## 1.9 Avoiding Secrets in Flow Inputs

### Your Clarification

You plan to pass sensitive secrets via an `env` or `context` object provided by the worker, rather than embedding them in the flow’s JSON.

**Perspective**

- This is a best practice. Storing secrets or tokens in the DB can be risky—even if it’s encrypted, you want to minimize how widely those secrets are exposed.
- The environment approach is common in serverless or queue-driven architectures. Each process gets the secrets from a secure source and uses them at runtime.
- The main watch-out: ensure no step inadvertently returns these secrets in the step output, or logs them. The strongly typed DSL helps you avoid returning the “env” object as a result.
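
A minimal sketch of the pattern, with an illustrative `Context` type, handler signature, and endpoint; the point is simply that the secret arrives via the worker-provided context, never via the flow's JSON input:

```ts
// Illustrative context shape injected by the worker at runtime.
interface Context {
  env: { API_KEY: string };
}

interface StepInput {
  run: { url: string };
}

async function sentimentHandler(
  input: StepInput,
  context: Context
): Promise<{ score: number }> {
  // The secret is read from the context, used, and never persisted.
  const response = await fetch('https://api.example.com/sentiment', {
    method: 'POST',
    headers: { Authorization: `Bearer ${context.env.API_KEY}` },
    body: JSON.stringify({ url: input.run.url }),
  });
  // Return only the derived result, so no secret lands in step outputs.
  const { score } = (await response.json()) as { score: number };
  return { score };
}
```
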
---

## Overall Conclusion

Your clarifications address many of the original concerns:

- **Branching**: The `.branch()` approach can indeed improve readability for conditional flows, though you still need discipline to manage complexities.
- **Transactions**: Limiting transactions to state updates only is wise, preventing locking issues.
- **DSL vs. SQL**: The strongly typed TypeScript layer, plus immutable definitions, not only avoids cycles or ordering issues but also simplifies versioning.
- **Large Data**: A blob reference system will keep your main tables cleaner and more performant.
- **Secrets & Security**: Passing secrets around using an environment context helps avoid embedding them in the DB.

From a critical but honest standpoint, the system is shaping up to be quite robust—provided that teams adhere to best practices (avoiding direct SQL patches, using unique slugs for versioning, and carefully scoping branching logic). You’ve set a good foundation, especially by focusing on a clear “flow definition vs. flow run” distinction, frictionless concurrency, and a typed DSL that keeps everything consistent.