pyworkflowy 0.1.0__tar.gz

@@ -0,0 +1,31 @@
1
+ # Byte-compiled / cache
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ .pytest_cache/
6
+ .mypy_cache/
7
+ .ruff_cache/
8
+ .coverage
9
+ .coverage.*
10
+ htmlcov/
11
+ coverage.xml
12
+
13
+ # Build artifacts
14
+ build/
15
+ dist/
16
+ *.egg-info/
17
+ *.egg
18
+
19
+ # Virtual envs
20
+ .venv/
21
+ venv/
22
+ env/
23
+
24
+ # IDE
25
+ .idea/
26
+ .vscode/
27
+ *.swp
28
+
29
+ # OS
30
+ .DS_Store
31
+ Thumbs.db
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Kilian Senger
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,348 @@
1
+ Metadata-Version: 2.4
2
+ Name: pyworkflowy
3
+ Version: 0.1.0
4
+ Summary: A full workflow engine for async/parallelized Python tasks.
5
+ Project-URL: Homepage, https://github.com/kiliansen/pyWorkflowy
6
+ Project-URL: Issues, https://github.com/kiliansen/pyWorkflowy/issues
7
+ Author: KilianSen
8
+ License: MIT License
9
+
10
+ Copyright (c) 2026 Kilian Senger
11
+
12
+ Permission is hereby granted, free of charge, to any person obtaining a copy
13
+ of this software and associated documentation files (the "Software"), to deal
14
+ in the Software without restriction, including without limitation the rights
15
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
16
+ copies of the Software, and to permit persons to whom the Software is
17
+ furnished to do so, subject to the following conditions:
18
+
19
+ The above copyright notice and this permission notice shall be included in all
20
+ copies or substantial portions of the Software.
21
+
22
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
23
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
24
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
25
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
26
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
27
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
28
+ SOFTWARE.
29
+ License-File: LICENSE
30
+ Keywords: asyncio,concurrency,dag,parallel,scheduler,tasks,workflow
31
+ Classifier: Development Status :: 3 - Alpha
32
+ Classifier: Intended Audience :: Developers
33
+ Classifier: License :: OSI Approved :: MIT License
34
+ Classifier: Programming Language :: Python :: 3
35
+ Classifier: Programming Language :: Python :: 3.11
36
+ Classifier: Programming Language :: Python :: 3.12
37
+ Classifier: Programming Language :: Python :: 3.13
38
+ Classifier: Programming Language :: Python :: 3.14
39
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
40
+ Classifier: Typing :: Typed
41
+ Requires-Python: >=3.11
42
+ Provides-Extra: cron
43
+ Description-Content-Type: text/markdown
44
+
45
+ # pyWorkflowy
46
+
47
+ A full workflow engine for async/parallelized Python tasks. Tasks, DAGs, retries, timeouts, three execution backends, persistence/resume, and a cron-like scheduler — all in one library with zero runtime dependencies.
48
+
49
+ ## Install
50
+
51
+ ```bash
52
+ pip install pyworkflowy
53
+ ```
54
+ or
55
+ ```bash
56
+ uv add pyworkflowy
57
+ ```
58
+
59
+ Core has no runtime dependencies. Process-pool support, threading, and asyncio are all stdlib.
60
+
61
+ ## Tasks
62
+
63
+ ### Decorator form
64
+
65
+ `@task` is the simplest entry point. Bare or parameterised, exactly like `@hook` in pyHooky:
66
+
67
+ ```python
68
+ from pyworkflowy import task, TaskRunner
69
+
70
+ @task
71
+ def square(x: int) -> int:
72
+     return x * x
73
+
74
+ @task(name="checkout", retries=3, timeout=10.0, backend="thread")
75
+ def checkout(cart_id: int) -> dict:
76
+     ...
77
+ ```
78
+
79
+ The wrapped object is a `Task` instance. Use `.submit(...)` to enqueue it on the active runner; calling it directly bypasses the runner (useful in unit tests):
80
+
81
+ ```python
82
+ with TaskRunner() as runner:
83
+     handle = square.submit(5)
84
+     runner.run()
85
+     assert handle.result() == 25
86
+
87
+ # direct call — no runner involved
88
+ assert square(5) == 25
89
+ ```
90
+
91
+ The task name is auto-derived from `module.qualname` if you don't pass one. Lambdas and other anonymous functions get an `id()`-suffixed name so that several defined in the same scope don't collide.
92
+
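The naming rule above can be sketched with the standard library alone. `derive_task_name` is a hypothetical helper, not pyWorkflowy's implementation, and the separator and suffix format are assumptions:

```python
def derive_task_name(fn) -> str:
    """Hypothetical helper: build a default task name from a callable."""
    base = f"{fn.__module__}.{fn.__qualname__}"
    # Every lambda has the name "<lambda>", so append id() to tell them apart.
    if fn.__name__ == "<lambda>":
        return f"{base}-{id(fn)}"
    return base


def square(x: int) -> int:
    return x * x


first = lambda x: x + 1
second = lambda x: x + 2

square_name = derive_task_name(square)
first_name = derive_task_name(first)
second_name = derive_task_name(second)
```

Because `id()` is only unique among live objects, a scheme like this keeps names distinct only while both lambdas exist.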
93
+ ### Class form
94
+
95
+ When configuration-as-class reads better than configuration-as-kwargs, subclass `TaskBase`:
96
+
97
+ ```python
98
+ from typing import Any
+
+ from pyworkflowy import TaskBase
99
+
100
+ class FetchUser(TaskBase):
101
+     name = "fetch-user"
102
+     backend = "thread"
103
+     retries = 2
104
+     timeout = 5.0
105
+
106
+     def run(self, user_id: int) -> dict[str, Any]:
107
+         return http.get(f"/users/{user_id}").json()
108
+
109
+ fetch_user = FetchUser()
110
+ handle = fetch_user.submit(42)
111
+ ```
112
+
113
+ Instantiating `FetchUser()` returns a `Task` — the same type the decorator produces. `run` may be `def` or `async def`; pyWorkflowy auto-detects.
114
+
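One plausible way to auto-detect `def` versus `async def` is `inspect.iscoroutinefunction`. The dispatcher below is a hypothetical sketch of that check, not the library's code:

```python
import asyncio
import inspect


def call_run(fn, *args, **kwargs):
    """Hypothetical dispatcher: call `fn` whether it is sync or async."""
    if inspect.iscoroutinefunction(fn):
        # Coroutine functions need an event loop to drive them to completion.
        return asyncio.run(fn(*args, **kwargs))
    return fn(*args, **kwargs)


def sync_double(x: int) -> int:
    return x * 2


async def async_double(x: int) -> int:
    await asyncio.sleep(0)  # yield to the loop once, like real async work
    return x * 2


sync_result = call_run(sync_double, 21)
async_result = call_run(async_double, 21)
```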
115
+ > `TaskBase` constructors take no arguments — class-level attributes configure the *task*; runtime values are passed to `submit(*args, **kwargs)` and forwarded to `run`.
116
+
117
+ ### `max_attempts` vs `retries`
118
+
119
+ `max_attempts=3` means "up to 3 attempts including the first" — sugar for `retries=2`. Pass whichever framing reads better; passing both raises `ValueError`.
120
+
121
+ ## Backends
122
+
123
+ | Backend | Constraint | Cancellation |
124
+ |-------------|---------------------------------------------|----------------------------|
125
+ | `asyncio` | sync or async tasks | cooperative (CancelledError) |
126
+ | `thread` | sync tasks only | cooperative (cancel flag) |
127
+ | `process` | sync tasks; top-level/picklable functions | best-effort (future.cancel) |
128
+
129
+ ```python
130
+ @task(backend="thread")
131
+ def cpu_bound(x): ...
132
+
133
+ @task(backend="process")
134
+ def heavy(x): ...
135
+ ```
136
+
137
+ The runner's `backend=` is the default for tasks that don't override it. The asyncio loop always orchestrates — `runner.run()` is just `asyncio.run(runner.arun())` — so picking `thread`/`process` as the runner default just changes the default for plain-callable submissions.
138
+
139
+ > **Async tasks on non-asyncio backends are rejected** at decoration time. Async needs the event loop; the thread/process pools can't run coroutines without one.
140
+
141
+ ## DAG / dependencies
142
+
143
+ Pass `depends_on=[other_handle, ...]` when submitting. The runner topologically orders execution and gates each task on its deps reaching `COMPLETED`:
144
+
145
+ ```python
146
+ with TaskRunner() as runner:
147
+     h_load = load_csv.submit("data.csv")
148
+     h_clean = clean_rows.submit(depends_on=[h_load])
149
+     h_write = write_db.submit(depends_on=[h_clean])
150
+     runner.run()
151
+ ```
152
+
153
+ Cycles are detected eagerly: `runner.submit(t, depends_on=[h])` raises `CycleError` immediately if adding the edge would close a cycle.
154
+
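Topological ordering and eager cycle checks can both be illustrated with the standard library's `graphlib`. This is a sketch of the idea only: the `CycleError` here is `graphlib.CycleError`, not pyWorkflowy's exception, and `would_close_cycle` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps to the set of tasks it depends on.
deps = {"load": set(), "clean": {"load"}, "write": {"clean"}}
order = list(TopologicalSorter(deps).static_order())


def would_close_cycle(graph: dict[str, set[str]], task: str, depends_on: set[str]) -> bool:
    """Return True if adding edges task -> depends_on would create a cycle."""
    trial = {name: set(parents) for name, parents in graph.items()}
    trial.setdefault(task, set()).update(depends_on)
    try:
        TopologicalSorter(trial).prepare()  # raises CycleError on a cycle
    except CycleError:
        return True
    return False


closes_cycle = would_close_cycle(deps, "load", {"write"})
stays_acyclic = would_close_cycle(deps, "write", {"load"})
```

Checking on a copy before committing the edge is what makes the detection eager: the real graph is never left in a cyclic state.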
155
+ ### On-dependency-failure policies
156
+
157
+ Per-task — set via `@task(on_dep_failure=...)`:
158
+
159
+ | Policy | Behaviour when a dep ends non-`COMPLETED` |
160
+ |---------------|--------------------------------------------------------------------|
161
+ | `"fail"` | This task is marked `FAILED` with a `DependencyFailedError`. *(default)* |
162
+ | `"skip"` | This task is marked `SKIPPED`; downstream sees the skip too. |
163
+ | `"run-anyway"`| Task runs as if the dep had succeeded. Args you passed are used as-is. |
164
+
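The policy table amounts to a small decision function. The sketch below uses plain strings for states and a hypothetical `gate` helper; the library's actual state type and internals are not shown in this README.

```python
def gate(policy: str, dep_states: list[str]) -> str:
    """Decide what happens to a task given the final states of its deps.

    Returns "RUN", "FAILED", or "SKIPPED" (illustrative labels).
    """
    if all(state == "COMPLETED" for state in dep_states):
        return "RUN"
    if policy == "fail":
        return "FAILED"  # would carry a DependencyFailedError
    if policy == "skip":
        return "SKIPPED"  # downstream tasks then see a non-COMPLETED dep too
    if policy == "run-anyway":
        return "RUN"
    raise ValueError(f"unknown on_dep_failure policy: {policy!r}")


outcomes = {p: gate(p, ["COMPLETED", "FAILED"]) for p in ("fail", "skip", "run-anyway")}
```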
165
+ > Dependencies don't auto-thread their return values into the dependent task's args. If task B needs A's output, call `h_a.result()` *inside* B's body, or pass the value through your own closure.
166
+
167
+ ## Retries / timeouts / cancellation
168
+
169
+ ```python
170
+ @task(retries=3, backoff="exponential", backoff_base=1.0, backoff_max=30.0,
171
+       retry_on=(TransientError,), timeout=15.0)
172
+ def fetch(url): ...
173
+ ```
174
+
175
+ | Knob | Effect |
176
+ |-----------------|-----------------------------------------------------------------------|
177
+ | `retries=N` | Up to N additional attempts after the initial one (`max_attempts=N+1`). |
178
+ | `retry_on` | Exception class or tuple. Only matches are retried; others fail immediately. Default: `Exception`. |
179
+ | `backoff` | `"none"`, `"linear"`, or `"exponential"`. Delay is `base * attempt` (linear) or `base * 2^(attempt-1)` (exponential), capped at `backoff_max`. |
180
+ | `timeout` | Seconds per attempt. Exceeding raises `TaskTimeoutError` — *not* retried (timeouts are terminal). |
181
+
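The backoff formulas in the table translate directly into code. `backoff_delay` is a hypothetical helper; treating `attempt` as 1-based and `"none"` as a zero delay are assumptions.

```python
def backoff_delay(attempt: int, mode: str = "none",
                  base: float = 1.0, cap: float = 30.0) -> float:
    """Delay in seconds before retrying after the given (1-based) attempt."""
    if mode == "none":
        return 0.0  # assumed: no wait between attempts
    if mode == "linear":
        delay = base * attempt
    elif mode == "exponential":
        delay = base * 2 ** (attempt - 1)
    else:
        raise ValueError(f"unknown backoff mode: {mode!r}")
    return min(delay, cap)  # cap plays the role of backoff_max


linear = [backoff_delay(n, "linear") for n in (1, 2, 3)]
exponential = [backoff_delay(n, "exponential") for n in (1, 2, 3, 4)]
capped = backoff_delay(10, "exponential")
```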
182
+ Cancellation is per-handle: `handle.cancel()` requests stop. Cooperative for asyncio (CancelledError at the next await), cooperative for threads (your body must check `current_task().cancel_event`), best-effort for processes (the future is cancelled if not yet scheduled).
183
+
184
+ ```python
185
+ from pyworkflowy import current_task
186
+
187
+ @task(backend="thread")
188
+ def long_loop(n):
189
+     ctx = current_task()
190
+     for i in range(n):
191
+         if ctx.cancel_event.is_set():
192
+             return "stopped early"
193
+         do_chunk(i)
194
+ ```
195
+
196
+ `runner.cancel_all()` sets the cancel flag on every non-terminal handle.
197
+
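The cooperative pattern for threads can be reproduced with `threading.Event` and a plain `ThreadPoolExecutor`. This is a stdlib-only sketch of the mechanism; passing the event as an argument stands in for `current_task().cancel_event`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def long_loop(n: int, cancel_event: threading.Event, started: threading.Event) -> str:
    done = 0
    for _ in range(n):
        # A thread cannot be interrupted from outside; the body must poll.
        if cancel_event.is_set():
            return f"stopped early after {done}"
        started.set()
        done += 1
        cancel_event.wait(0.01)  # stand-in for one chunk of work
    return "finished"


cancel_event = threading.Event()
started = threading.Event()
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(long_loop, 1000, cancel_event, started)
    started.wait()      # make sure the loop is actually running
    cancel_event.set()  # request cancellation
    outcome = future.result(timeout=5)
```

If the body never checks the event, the request is simply ignored, which is why the check belongs inside the loop.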
198
+ ## Runner
199
+
200
+ ```python
201
+ runner = TaskRunner(
202
+     max_workers=8,
203
+     backend="asyncio",  # default for tasks that don't specify
204
+     on_task_error="raise",  # "raise" | "log" | "continue"
205
+     checkpoint_path="state.json",
206
+     checkpoint_interval=5.0,
207
+ )
208
+ ```
209
+
210
+ `on_task_error` chooses how task failures propagate out of `run()`:
211
+
212
+ | Value | Behaviour |
213
+ |------------|-------------------------------------------------------------------|
214
+ | `"raise"` | The first failing task's exception aborts the runner. *(default)* |
215
+ | `"log"` | Failures logged via `logging.getLogger("pyworkflowy")`; run continues. |
216
+ | `"continue"` | Failures stored on handles; no log, no raise. |
217
+
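The three modes differ only in what happens inside the `except` branch. The loop below is a sequential, hypothetical sketch (`run_all` is not a pyWorkflowy function); it reuses the `"pyworkflowy"` logger name from the table with the standard `logging` module.

```python
import logging

logger = logging.getLogger("pyworkflowy")


def run_all(jobs: dict, on_task_error: str = "raise"):
    """Run callables in order; return (results, failures) keyed by job name."""
    if on_task_error not in ("raise", "log", "continue"):
        raise ValueError(f"unknown on_task_error: {on_task_error!r}")
    results, failures = {}, {}
    for name, fn in jobs.items():
        try:
            results[name] = fn()
        except Exception as exc:
            if on_task_error == "raise":
                raise  # first failure aborts the whole run
            failures[name] = exc  # kept for later inspection
            if on_task_error == "log":
                logger.error("task %s failed: %r", name, exc)
    return results, failures


results, failures = run_all({"ok": lambda: 1, "bad": lambda: 1 / 0},
                            on_task_error="continue")
```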
218
+ Use as a context manager so the executor pools are torn down cleanly:
219
+
220
+ ```python
221
+ with TaskRunner() as runner:
222
+     ...
223
+     runner.run()
224
+ ```
225
+
226
+ `with TaskRunner(...)` also binds the runner to a contextvar, so `task.submit(...)` finds it without explicit `runner=`. Outside the `with`, pass `runner=` or call `runner.submit(task, ...)` directly.
227
+
228
+ ## Persistence / resume
229
+
230
+ Tell the runner where to write its state:
231
+
232
+ ```python
233
+ with TaskRunner(checkpoint_path="state.json", checkpoint_interval=5.0) as runner:
234
+     ...
235
+     runner.run()
236
+ ```
237
+
238
+ The default `JSONCheckpointer` writes after each task completion (rate-limited by `checkpoint_interval`). To resume after a crash, call `TaskRunner.resume`:
239
+
240
+ ```python
241
+ runner = TaskRunner.resume("state.json")
242
+ # Re-submit the same tasks; previously-completed ones get their results
243
+ # primed and won't re-execute.
244
+ ```
245
+
246
+ Resume uses each handle's persisted ID — so the runner needs to be re-built with the same submission order (or you have to inject IDs manually for now). Already-completed handles are primed with the persisted result on submit.
247
+
248
+ | Backend | Trade-off |
249
+ |---------------------|----------------------------------------------------------------|
250
+ | `JSONCheckpointer` | Default. Args/return values must JSON-serialise. Validated at submit. |
251
+ | `PickleCheckpointer`| Anything pickle accepts; standard pickle caveats apply. |
252
+ | custom | Subclass `Checkpointer` with `save()`/`load()`. |
253
+
254
+ Unserialisable args raise `CheckpointError` at *submit* time so you see the error where it originated.
255
+
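Validating at submit time can be as simple as attempting the serialisation up front. In this sketch `CheckpointError` is a local stand-in class and `validate_json_args` is a hypothetical helper:

```python
import json


class CheckpointError(Exception):
    """Local stand-in for the library's exception of the same name."""


def validate_json_args(args: tuple, kwargs: dict) -> None:
    """Fail fast if the arguments cannot be written to a JSON checkpoint."""
    try:
        json.dumps({"args": list(args), "kwargs": kwargs})
    except (TypeError, ValueError) as exc:
        raise CheckpointError(f"arguments are not JSON-serialisable: {exc}") from exc


validate_json_args((1, "a"), {"tags": ["x", "y"]})  # passes silently

try:
    validate_json_args(({1, 2, 3},), {})  # a set has no JSON representation
    rejected = False
except CheckpointError:
    rejected = True
```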
256
+ ## Scheduling
257
+
258
+ ```python
259
+ from datetime import datetime
+
+ from pyworkflowy import TaskRunner, task
260
+ from pyworkflowy.schedule import Scheduler
261
+
262
+ @task
263
+ def cleanup():
264
+     ...
265
+
266
+ with TaskRunner() as runner:
267
+     sched = Scheduler(runner=runner)
268
+     sched.every(60).do(cleanup)
269
+     sched.cron("0 * * * *").do(cleanup)  # top of every hour
270
+     sched.at(datetime(2026, 12, 31, 23, 59)).do(cleanup)  # one-shot
271
+
272
+     sched.start()  # background thread
273
+     # ... do other work ...
274
+     sched.stop()
275
+ ```
276
+
277
+ Or async, on your own loop:
278
+
279
+ ```python
280
+ sched = Scheduler(runner=runner)
281
+ sched.every(0.5).do(cleanup)
282
+ await sched.arun()
283
+ ```
284
+
285
+ For tests, `sched.tick()` fires every due job once and returns the resulting handles — useful with a fake clock.
286
+
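To show why a `tick()` method pairs well with a fake clock, here is a self-contained miniature. `FakeClock` and `MiniScheduler` are hypothetical and far simpler than the real `Scheduler`; how the real one accepts a clock is not documented here.

```python
class FakeClock:
    """Manually advanced clock, so tests never sleep."""

    def __init__(self, now: float = 0.0):
        self.now = now

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class MiniScheduler:
    def __init__(self, clock):
        self.clock = clock
        self.jobs = []  # each entry: [interval, next_due, fn]

    def every(self, seconds: float, fn) -> None:
        self.jobs.append([seconds, self.clock() + seconds, fn])

    def tick(self) -> list:
        """Fire every due job once and return what the jobs produced."""
        now = self.clock()
        fired = []
        for job in self.jobs:
            interval, due, fn = job
            if due <= now:
                fired.append(fn())
                job[1] = now + interval  # missed fires are not backfilled
        return fired


clock = FakeClock()
sched = MiniScheduler(clock)
sched.every(60, lambda: "ran")
before = sched.tick()
clock.advance(60)
at_due = sched.tick()
clock.advance(180)  # three intervals pass, but only one fire results
after_gap = sched.tick()
```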
287
+ ### Cron subset
288
+
289
+ `m h dom mon dow` — five fields, space-separated:
290
+
291
+ | Syntax | Example | Meaning |
292
+ |--------------|-----------------|----------------------------------|
293
+ | `*` | `*` | every value in range |
294
+ | `N` | `5` | exactly `5` |
295
+ | `a-b` | `9-17` | inclusive range |
296
+ | `a,b,c` | `0,15,30` | list |
297
+ | `*/N` | `*/15` | every N from the start of range |
298
+ | `a-b/N` | `9-17/2` | every N within range |
299
+
300
+ `day_of_week` uses cron-style `0=Sunday`. No seconds field, no `@hourly`/`@daily` aliases, no `L`/`W` modifiers. Missed fires while the scheduler is stopped are **not** backfilled.
301
+
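A parser for this subset fits in a few lines. `parse_field` and `cron_dow` are hypothetical helpers written against the table above; rejecting a bare `N/step` is an assumption, since the table does not list that form.

```python
from datetime import date


def parse_field(expr: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field into the set of matching integers."""
    values: set[int] = set()
    for part in expr.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"invalid step in {part!r}")
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-", 1)
            start, end = int(a), int(b)
        else:
            if step_text:
                raise ValueError(f"step needs '*' or a range: {part!r}")
            start = end = int(rng)
        if not lo <= start <= end <= hi:
            raise ValueError(f"{part!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return values


def cron_dow(day: date) -> int:
    """Convert Python's Monday=0 weekday to cron's Sunday=0 numbering."""
    return (day.weekday() + 1) % 7


quarter_hours = parse_field("*/15", 0, 59)
office_hours = parse_field("9-17/2", 0, 23)
sunday = cron_dow(date(2024, 1, 7))  # 7 January 2024 was a Sunday
```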
302
+ ## Async
303
+
304
+ Async tasks need the asyncio backend. The runner orchestrates on asyncio either way — `run()` just wraps `asyncio.run(arun())`.
305
+
306
+ ```python
307
+ @task
308
+ async def fetch(url):
309
+     async with httpx.AsyncClient() as c:
310
+         return (await c.get(url)).json()
311
+
312
+ runner = TaskRunner()
313
+ h = runner.submit(fetch, "https://example.com")
314
+ await runner.arun()
315
+ print(await h) # handles are awaitable
316
+ runner.shutdown()
317
+ ```
318
+
319
+ Handles are awaitable; awaiting one yields the task's return value or raises its failure. Synchronous callers can use `handle.result(timeout=...)`, which blocks.
320
+
321
+ ## Errors
322
+
323
+ | Exception | Raised when |
324
+ |--------------------------|-------------------------------------------------------------------|
325
+ | `TaskError` | Base class. Catch this to swallow any pyWorkflowy-raised error. |
326
+ | `TaskTimeoutError` | A task exceeds its `timeout=` budget. **Not retried**. |
327
+ | `TaskCancelledError` | A task was cancelled via `handle.cancel()` / `runner.cancel_all()`. |
328
+ | `CycleError` | A submission would close a cycle in the dependency graph. |
329
+ | `DependencyFailedError` | A task with `on_dep_failure="fail"` had a failed dep. |
330
+ | `RetryExhaustedError` | All attempts failed; wraps the last exception via `__cause__`. |
331
+ | `CheckpointError` | Serialisation or I/O failed in a `Checkpointer`. |
332
+
333
+ Inside a task body, you can read `current_task()` for the current `TaskContext` (name, attempt number, cancel event).
334
+
335
+ ## Threads / Multiprocessing notes
336
+
337
+ - The thread pool is created lazily on first use and shut down when the runner is shut down. `current_task()` *does* work in the thread backend because pyWorkflowy binds the contextvar on entry.
338
+ - The process backend's workers run in **separate processes**, each with its own interpreter: module-level state is reimported, `current_task()` returns `None`, and the function reference is serialised via pickle. Top-level functions only; no lambdas, no nested defs.
339
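+ - On Windows and macOS the default start method is `spawn`, and from Python 3.14 the default on other POSIX platforms is `forkserver` rather than `fork`. The same caveat applies in both cases: workers do not inherit the parent's in-memory state, so every worker re-imports your code.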
340
+
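The "top-level functions only" rule follows from how `pickle` serialises functions: by importable name, not by value. A quick pre-flight check, with `is_picklable` as a hypothetical helper:

```python
import math
import pickle


def is_picklable(fn) -> bool:
    """Return True if `fn` can be sent to a worker process via pickle."""
    try:
        pickle.dumps(fn)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_nested():
    def nested(x):  # defined inside another function: not importable by name
        return x
    return nested


importable_ok = is_picklable(math.sqrt)     # importable by module and name
lambda_ok = is_picklable(lambda x: x + 1)   # "<lambda>" cannot be looked up
nested_ok = is_picklable(make_nested())
```

Running a check like this before submitting to a process pool surfaces the problem in the parent, where the traceback is easier to read.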
341
+ ## Development
342
+
343
+ ```bash
344
+ uv sync
345
+ uv run pytest
346
+ uv run ruff check .
347
+ uv run pyrefly check
348
+ ```