crazy-workers 1.1.0__tar.gz → 1.2.0__tar.gz

Files changed (50)
  1. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/PKG-INFO +172 -8
  2. crazy_workers-1.2.0/README.md +397 -0
  3. crazy_workers-1.2.0/crazy_workers/boot/__init__.py +14 -0
  4. crazy_workers-1.2.0/crazy_workers/boot/__main__.py +5 -0
  5. crazy_workers-1.2.0/crazy_workers/boot/base.py +83 -0
  6. crazy_workers-1.2.0/crazy_workers/boot/detect.py +14 -0
  7. crazy_workers-1.2.0/crazy_workers/boot/entry.py +18 -0
  8. crazy_workers-1.2.0/crazy_workers/boot/orchestrator.py +59 -0
  9. crazy_workers-1.2.0/crazy_workers/boot/systemd.py +94 -0
  10. crazy_workers-1.2.0/crazy_workers/boot/windows.py +47 -0
  11. crazy_workers-1.2.0/crazy_workers/cli/commands/__init__.py +7 -0
  12. crazy_workers-1.2.0/crazy_workers/cli/commands/status.py +82 -0
  13. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/cli/main.py +6 -11
  14. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/core/manager/__init__.py +38 -9
  15. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/core/manager/starter.py +14 -1
  16. crazy_workers-1.2.0/crazy_workers/database/storage.py +76 -0
  17. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers.egg-info/PKG-INFO +172 -8
  18. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers.egg-info/SOURCES.txt +9 -2
  19. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/pyproject.toml +1 -1
  20. crazy_workers-1.1.0/README.md +0 -233
  21. crazy_workers-1.1.0/crazy_workers/cli/commands/__init__.py +0 -8
  22. crazy_workers-1.1.0/crazy_workers/cli/commands/lister.py +0 -57
  23. crazy_workers-1.1.0/crazy_workers/cli/commands/restorer.py +0 -14
  24. crazy_workers-1.1.0/crazy_workers/database/storage.py +0 -56
  25. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/LICENSE +0 -0
  26. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/__init__.py +0 -0
  27. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/_bootstrap.py +0 -0
  28. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/cli/__init__.py +0 -0
  29. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/cli/commands/params.py +0 -0
  30. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/cli/commands/starter.py +0 -0
  31. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/cli/commands/stopper.py +0 -0
  32. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/cli/discovery.py +0 -0
  33. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/cli/ui.py +0 -0
  34. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/core/__init__.py +0 -0
  35. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/core/backend.py +0 -0
  36. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/core/engine.py +0 -0
  37. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/core/manager/lister.py +0 -0
  38. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/core/manager/recoverer.py +0 -0
  39. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/core/manager/stopper.py +0 -0
  40. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/core/recovery.py +0 -0
  41. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/database/__init__.py +0 -0
  42. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/database/schema.py +0 -0
  43. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/testing/__init__.py +0 -0
  44. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/testing/backends.py +0 -0
  45. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers/testing/polling.py +0 -0
  46. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers.egg-info/dependency_links.txt +0 -0
  47. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers.egg-info/entry_points.txt +0 -0
  48. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers.egg-info/requires.txt +0 -0
  49. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/crazy_workers.egg-info/top_level.txt +0 -0
  50. {crazy_workers-1.1.0 → crazy_workers-1.2.0}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: crazy-workers
3
- Version: 1.1.0
3
+ Version: 1.2.0
4
4
  Summary: A Python library for managing background worker processes with persistent state, automatic recovery, and a CLI.
5
5
  Author: GioVanni Colasanto
6
6
  License: MIT
@@ -43,8 +43,10 @@ A Python library for managing background worker processes with persistent state,
43
43
  ## Features
44
44
 
45
45
  - **Persistent State** — SQLite database tracks worker status, PIDs, and parameters across restarts.
46
+ - **Backend Integration** — Co-locate crazy_workers' tables in your project's database (pass a SQLAlchemy engine or URL), inject a shared `DATABASE_URL` into every worker, and recover workers automatically when the backend boots. See [Backend integration](#backend-integration).
46
47
  - **Process Management** — Start, stop, and monitor background Python scripts as independent OS processes.
47
48
  - **Automatic Recovery** — Detects crashed workers and restarts them on application boot.
49
+ - **Automatic Boot-Restore** — On Linux and Windows, starting a worker transparently installs a per-user OS hook (a systemd user unit / a logon Scheduled Task) that runs recovery after a machine reboot — no host application required. Opt out with `CRAZY_WORKERS_NO_BOOT`. See [Automatic boot-restore](#automatic-boot-restore).
48
50
  - **Child Process Control** — On stop, terminates unmanaged subprocesses while preserving independently-managed nested workers.
49
51
  - **CLI Interface** — Manage workers from the terminal with interactive prompts and auto-discovery (see [CLI.md](https://github.com/Vanni-broUser/crazy-workers/blob/main/CLI.md)).
50
52
  - **Security** — Worker types and keys are restricted to a safe identifier charset (`A-Z a-z 0-9 _ -`), with a defence-in-depth check that the resolved script path stays inside the workers directory. This blocks path traversal on both Unix and Windows (including drive-relative names like `c:evil`).
@@ -52,6 +54,7 @@ A Python library for managing background worker processes with persistent state,
52
54
  - **Zombie Protection** — Distinguishes active processes from zombies using `psutil`.
53
55
  - **PID-Reuse Safe** — Each worker is tagged with an identity token on its command line; recovery and stop confirm a PID still belongs to the worker before acting, so a recycled PID is never mistaken for (or worse, killed as) a live worker. Works on both Unix and Windows.
54
56
  - **Gunicorn-safe** — File-based lock prevents concurrent recovery runs across multiple workers.
57
+ - **Testable** — Drive your orchestration with a `FakeBackend` (no real processes) and import polling helpers for the few genuine end-to-end tests. See [Testing your app](#testing-your-app).
55
58
 
56
59
  ## Installation
57
60
 
@@ -114,22 +117,30 @@ manager.dispose() # releases DB connection; does NOT kill workers
114
117
  ### 3. Or from the CLI
115
118
 
116
119
  ```bash
117
- crazy-workers list
120
+ crazy-workers status
118
121
  crazy-workers start my_worker --key job_1 --params '{"duration": 30}'
119
122
  crazy-workers stop job_1
120
- crazy-workers restore
121
123
  ```
122
124
 
123
125
  See [CLI.md](https://github.com/Vanni-broUser/crazy-workers/blob/main/CLI.md) for full CLI documentation.
124
126
 
125
127
  ## API Reference
126
128
 
127
- ### `WorkerManager(workers_dir, create_dir=True)`
129
+ ### `WorkerManager(workers_dir, create_dir=True, ...)`
128
130
 
129
131
  | Parameter | Type | Default | Description |
130
132
  |-----------|------|---------|-------------|
131
133
  | `workers_dir` | `str` | `'workers'` | Directory containing worker `.py` scripts |
132
134
  | `create_dir` | `bool` | `True` | Create `workers_dir` and `.service/` if they don't exist |
135
+ | `backend` | `ProcessBackend` | `None` | Process backend; the default spawns real subprocesses (tests inject a fake) |
136
+ | `auto_boot` | `bool` | `True` | Install the per-user OS boot-restore hook on first start — see [Automatic boot-restore](#automatic-boot-restore) |
137
+ | `boot_provider` | `BootProvider` | `None` | Override the boot-restore mechanism (mainly a test seam) |
138
+ | `db_url` | `str` | `None` | SQLAlchemy URL for worker state; defaults to SQLite under `.service/` |
139
+ | `engine` | `Engine` | `None` | Reuse an existing SQLAlchemy engine so the tables live in your database; **not** disposed by crazy_workers |
140
+ | `worker_env` | `dict` | `None` | Environment variables injected into **every** spawned worker (e.g. `DATABASE_URL`) |
141
+ | `auto_recover` | `bool` | `True` | Recover dead-but-`RUNNING` workers when the manager is constructed |
142
+
143
+ See [Backend integration](#backend-integration) for `db_url` / `engine` / `worker_env` / `auto_recover`.
133
144
 
134
145
  ### `start_worker(worker_type, worker_key=None, parameters=None, env=None)`
135
146
 
@@ -156,6 +167,8 @@ Returns a list of worker dicts including RUNNING, STOPPED, CRASHED, and NEVER_ST
156
167
 
157
168
  Restarts any worker whose DB status is RUNNING but whose process is dead. Uses a file lock to prevent concurrent recovery. Returns a list of restarted keys.
158
169
 
170
+ > You rarely call this directly: it runs automatically when the manager is constructed (`auto_recover=True`) and after a machine reboot via the boot hook. It remains available as an explicit, idempotent trigger.
171
+
159
172
  ### `dispose()`
160
173
 
161
174
  Closes the database connection and clears internal process references. Does **not** kill background workers — they continue running independently.
@@ -171,13 +184,105 @@ params = json.loads(sys.argv[1]) if len(sys.argv) > 1 else {}
171
184
  # ... do work ...
172
185
  ```
173
186
 
187
+ A worker is a separate process, so it cannot be handed a live object (e.g. a DB
188
+ connection). Pass **configuration** instead: the manager's `worker_env` (and any
189
+ per-call `env`) is injected as environment variables, so a worker reads, say,
190
+ `os.environ['DATABASE_URL']` and opens its own connection. See
191
+ [`example_app/workers/db_writer.py`](https://github.com/Vanni-broUser/crazy-workers/blob/main/example_app/workers/db_writer.py).
192
+
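As an illustration of the paragraph above, a worker could gather its inputs as in the following minimal sketch. The helper name `load_worker_config` is hypothetical and not part of crazy_workers; only the `sys.argv[1]` JSON contract and the `DATABASE_URL` variable come from the README text.

```python
import json
import os
import sys


def load_worker_config(argv=None, environ=None):
    """Hypothetical helper: return (params, database_url) for a worker process."""
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ

    # Parameters arrive as a JSON string in argv[1]; default to an empty dict.
    params = json.loads(argv[1]) if len(argv) > 1 else {}

    # Configuration arrives through the environment (worker_env / per-call env).
    # .get() returns None when the variable was not injected.
    database_url = environ.get('DATABASE_URL')
    return params, database_url
```

Taking `argv` and `environ` as optional arguments keeps the helper testable without touching the real process state; a worker script would simply call `load_worker_config()` and open its own connection from the returned URL.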
193
+ ## Testing your app
194
+
195
+ Code that uses `WorkerManager` has two kinds of logic worth testing:
196
+ **orchestration** (which workers start, pairing, rollback, recovery) and the
197
+ **work itself** (does the worker actually do its job). `crazy_workers.testing`
198
+ makes both fast and non-flaky.
199
+
200
+ ### Orchestration — without launching a single process
201
+
202
+ `WorkerManager.for_testing()` wires a **FakeBackend**: spawning and termination
203
+ are faked, but the state machine (SQLite, recovery, validation) stays **real**
204
+ and runs in-process. The backend is exposed as `manager.test` for assertions.
205
+
206
+ ```python
207
+ from crazy_workers import WorkerManager
208
+
209
+ def test_starts_recorder_and_renamer():
210
+     manager = WorkerManager.for_testing('workers')  # FakeBackend, no processes
211
+
212
+     manager.start_worker('recorder', worker_key='cam1', parameters={'device': 'cam1'})
213
+     manager.start_worker('renamer', worker_key='renamer_cam1', parameters={'output_dir': '/data/cam1'})
214
+
215
+     assert manager.test.started_types == ['recorder', 'renamer']
216
+     assert manager.test.is_running('cam1')
217
+     assert manager.test.parameters_for('renamer_cam1') == {'output_dir': '/data/cam1'}
218
+     manager.dispose()
219
+
220
+
221
+ def test_recovery_restarts_a_crash():
222
+     manager = WorkerManager.for_testing('workers')
223
+     manager.start_worker('recorder', worker_key='cam1')
224
+
225
+     manager.test.crash('cam1')  # simulate an unexpected death
226
+     manager.recover_workers()  # the real recovery path runs in-process
227
+
228
+     assert manager.test.start_count('cam1') == 2
229
+     assert manager.test.is_running('cam1')
230
+     manager.dispose()
231
+ ```
232
+
233
+ `workers_dir` must still contain the `<type>.py` files (`start_worker` checks
234
+ the script exists), but the fake backend never executes them — empty files are
235
+ enough.
236
+
237
+ `manager.test` exposes:
238
+
239
+ | Member | Returns |
240
+ |---|---|
241
+ | `started_types` / `started_keys` | every spawn, in order (a restart appears twice) |
242
+ | `running_keys` | keys whose latest process is still "alive" |
243
+ | `is_running(key)` | bool |
244
+ | `start_count(key)` | how many times the key was (re)started |
245
+ | `parameters_for(key)` | parameters of the most recent spawn |
246
+ | `crash(key)` | simulate an unexpected death (without a stop) |
247
+
248
+ ### The real thing — polling helpers, not `sleep`
249
+
250
+ The few tests that *must* launch real processes should wait on conditions,
251
+ never fixed sleeps. `crazy_workers.testing` exposes the helpers used by the
252
+ library's own suite:
253
+
254
+ ```python
255
+ from crazy_workers.testing import wait_for_worker_status, wait_for_log, wait_for_pid_dead
256
+
257
+ manager = WorkerManager('workers')
258
+ ok, res = manager.start_worker('recorder', worker_key='cam1')
259
+
260
+ wait_for_worker_status(manager, 'cam1', 'RUNNING')
261
+ wait_for_log('workers/.service/logs/cam1.log', 'recording started')
262
+ # ... assert the worker actually did its job ...
263
+ manager.stop_worker('cam1')
264
+ wait_for_pid_dead(res['pid'])
265
+ ```
266
+
267
+ Available: `wait_for`, `wait_for_file`, `wait_for_log`, `wait_for_worker_status`,
268
+ `wait_for_worker_in_db`, `wait_for_worker_pid`, `wait_for_pid_dead`. Each raises
269
+ `AssertionError` with a useful message on timeout.
270
+
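To make the "wait on conditions, never fixed sleeps" idea concrete, here is a minimal sketch of a generic polling helper. It is not the implementation of `crazy_workers.testing.wait_for`, whose signature this diff does not show; the name `wait_until` and its parameters are assumptions chosen for illustration.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.05, message='condition not met'):
    """Hypothetical polling helper: return once condition() is truthy, else raise."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        # Check the deadline only after a failed attempt, so the condition
        # is always evaluated at least once.
        if time.monotonic() >= deadline:
            raise AssertionError(f'{message} (waited {timeout}s)')
        time.sleep(interval)
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout stable if the system clock is adjusted during a test run.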
271
+ > **What the fake covers — and what it doesn't.** `for_testing`/FakeBackend
272
+ > tests *orchestration*, not "the worker actually records / sends / converts".
273
+ > Keep a small number of real end-to-end tests for that, made stable with the
274
+ > polling helpers above, and move everything else — the bulk — into the fast,
275
+ > deterministic fake world.
276
+
174
277
  ## Project Structure
175
278
 
176
279
  ```
177
280
  crazy_workers/ # Library package
178
281
    core/ # WorkerManager, process engine, recovery lock
282
+   boot/ # Automatic per-user boot-restore (systemd user unit / scheduled task)
179
283
    cli/ # CLI entry point, commands, discovery
180
-   database/ # SQLAlchemy schema and SQLite storage
284
+   database/ # SQLAlchemy schema and pluggable storage (SQLite, or a shared engine/URL)
285
+   testing/ # FakeBackend + polling helpers for consumer test suites
181
286
  example_app/ # Flask demo application
182
287
    app.py
183
288
    workers/ # Example worker scripts
@@ -185,6 +290,7 @@ tests/
185
290
    core/ # Unit tests for core modules
186
291
    cli/ # Unit tests for CLI modules
187
292
    database/ # Unit tests for storage layer
293
+   testing/ # Tests for the FakeBackend and polling helpers
188
294
    integration/ # Full-stack integration tests (real processes)
189
295
    app/ # Tests for the example Flask app
190
296
  ```
@@ -196,9 +302,16 @@ tests/
196
302
  ```python
197
303
  from crazy_workers import WorkerManager
198
304
 
199
- def create_app():
305
+ def create_app(db_engine, db_url):
200
306
      app = Flask(__name__)
201
-     manager = WorkerManager('workers')
307
+
308
+     manager = WorkerManager(
309
+         'workers',
310
+         engine=db_engine,  # crazy_workers' tables live in YOUR database
311
+         worker_env={'DATABASE_URL': db_url},  # injected into every worker
312
+         # auto_recover=True (default): when the app boots, workers left RUNNING
313
+         # with a dead PID are restored automatically — no explicit call needed.
314
+     )
202
315
 
203
316
      @app.route('/workers/start', methods=['POST'])
204
317
      def start():
@@ -210,12 +323,31 @@ def create_app():
210
323
          )
211
324
          return (jsonify(result), 200) if success else (jsonify({'error': result}), 400)
212
325
 
213
-     manager.recover_workers()  # restart any crashed workers on boot
214
326
      return app
215
327
  ```
216
328
 
217
329
  See [`example_app/app.py`](https://github.com/Vanni-broUser/crazy-workers/blob/main/example_app/app.py) for a complete example.
218
330
 
331
+ ## Backend integration
332
+
333
+ When crazy_workers runs inside a backend, three options let it cooperate with
334
+ the project instead of living off to the side:
335
+
336
+ - **Co-locate its tables in your database.** Pass an existing SQLAlchemy
337
+ `engine` (or a `db_url`) to `WorkerManager`. crazy_workers creates its own
338
+ `workers` table inside your database and inherits its persistence and backups
339
+ — so state survives even if the process/container is recreated. A shared
340
+ engine is never disposed by crazy_workers; its owner manages it.
341
+ - **Give workers the connection they need.** A worker is a separate process, so
342
+ it can't receive a live DB connection — pass the *configuration* instead.
343
+ `worker_env={'DATABASE_URL': ...}` is injected into every spawned worker
344
+ (overridable per call via `start_worker(..., env=...)`); the worker opens its
345
+ own connection from it.
346
+ - **Recovery on construction.** `auto_recover=True` (default) restores
347
+ dead-but-`RUNNING` workers when the manager is built, so a restarting backend
348
+ brings its workers back with no explicit call. The CLI and boot entrypoint
349
+ set it to `False` (management/one-shot, not supervision).
350
+
219
351
  ## Gunicorn / Multi-Process Servers
220
352
 
221
353
  When using a pre-fork server like Gunicorn:
@@ -223,6 +355,38 @@ When using a pre-fork server like Gunicorn:
223
355
  - **Recovery is atomic** — a file lock (`.service/workers.db.recovery.lock`) ensures `recover_workers()` runs once even when multiple workers boot simultaneously.
224
356
  - **Workers outlive their parent** — if a Gunicorn worker is recycled, background processes keep running. The next recovery cycle re-attaches or restarts them.
225
357
 
358
+ ## Automatic boot-restore
359
+
360
+ Starting a worker transparently installs a **per-user OS hook** for its workers
361
+ directory, so workers come back after a reboot without any host application
362
+ running. The hook calls the internal entrypoint `python -m crazy_workers.boot
363
+ --workers-dir <dir>`, which runs `recover_workers()`. The install is
364
+ best-effort and happens at most once per directory (a marker lives in
365
+ `.service/boot.json`); a failure never blocks the worker from starting and is
366
+ reported by `crazy-workers status`.
367
+
368
+ | Platform | Mechanism | When it runs |
369
+ |----------|-----------|--------------|
370
+ | Linux | systemd **user** unit (`~/.config/systemd/user`) | at user login — or at true boot if **lingering** is enabled |
371
+ | Windows | logon Scheduled Task (`schtasks /SC ONLOGON`) | at user logon |
372
+
373
+ **Unattended boot — the honest caveat.** The recovery itself is durable across
374
+ consecutive reboots (a worker left `RUNNING` with a dead PID is restarted, and
375
+ this is PID-reuse safe). But running it *without any login* depends on the OS:
376
+
377
+ - **Linux:** enable lingering once, as root — `loginctl enable-linger <user>` —
378
+ so the user's systemd starts at boot (the same model as the Docker daemon
379
+ restarting containers). Without it, restore runs at the next login.
380
+ - **Windows:** a user task fires at logon; true pre-logon start needs autologon
381
+ or an administrator-installed task.
382
+
383
+ `crazy-workers status` always reports whether the hook runs **at boot** or
384
+ **at login**, so this is never silently wrong.
385
+
386
+ **Opting out.** Set `CRAZY_WORKERS_NO_BOOT` (any non-empty value) to disable the
387
+ automatic install entirely — useful in containers, CI, or when you manage boot
388
+ yourself. You can also pass `WorkerManager(..., auto_boot=False)`.
389
+
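The opt-out rule described above ("any non-empty value") can be sketched as a small predicate. The function name `boot_install_enabled` is hypothetical; how crazy_workers actually combines the environment variable with `auto_boot` is not shown in this part of the diff, so treat the precedence below as an assumption.

```python
import os


def boot_install_enabled(auto_boot=True, environ=None):
    """Hypothetical check: should the boot-restore hook be installed?"""
    environ = os.environ if environ is None else environ
    # Any non-empty value of CRAZY_WORKERS_NO_BOOT disables the install;
    # an unset or empty variable leaves the decision to auto_boot.
    if environ.get('CRAZY_WORKERS_NO_BOOT', ''):
        return False
    return bool(auto_boot)
```

Under this reading, `CRAZY_WORKERS_NO_BOOT=0` still opts out, because the value is non-empty.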
226
390
  ## Logs
227
391
 
228
392
  Each worker's stdout/stderr is appended to `.service/logs/<worker_key>.log`. These files are written directly by the worker process, so the library does **not** rotate them — they grow until you act. For long-lived deployments, rotate them externally (e.g. `logrotate` with `copytruncate`) or have your worker script configure its own `logging.handlers.RotatingFileHandler` instead of writing to stdout/stderr.
@@ -0,0 +1,397 @@
1
+ # Crazy Workers
2
+
3
+ A Python library for managing background worker processes with persistent state, automatic crash recovery, and a built-in CLI.
4
+
5
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/)
6
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
7
+
8
+ ## Features
9
+
10
+ - **Persistent State** — SQLite database tracks worker status, PIDs, and parameters across restarts.
11
+ - **Backend Integration** — Co-locate crazy_workers' tables in your project's database (pass a SQLAlchemy engine or URL), inject a shared `DATABASE_URL` into every worker, and recover workers automatically when the backend boots. See [Backend integration](#backend-integration).
12
+ - **Process Management** — Start, stop, and monitor background Python scripts as independent OS processes.
13
+ - **Automatic Recovery** — Detects crashed workers and restarts them on application boot.
14
+ - **Automatic Boot-Restore** — On Linux and Windows, starting a worker transparently installs a per-user OS hook (a systemd user unit / a logon Scheduled Task) that runs recovery after a machine reboot — no host application required. Opt out with `CRAZY_WORKERS_NO_BOOT`. See [Automatic boot-restore](#automatic-boot-restore).
15
+ - **Child Process Control** — On stop, terminates unmanaged subprocesses while preserving independently-managed nested workers.
16
+ - **CLI Interface** — Manage workers from the terminal with interactive prompts and auto-discovery (see [CLI.md](https://github.com/Vanni-broUser/crazy-workers/blob/main/CLI.md)).
17
+ - **Security** — Worker types and keys are restricted to a safe identifier charset (`A-Z a-z 0-9 _ -`), with a defence-in-depth check that the resolved script path stays inside the workers directory. This blocks path traversal on both Unix and Windows (including drive-relative names like `c:evil`).
18
+ - **Observability** — Per-worker file logging; all service files (DB, lock, logs) live in a `.service/` folder inside your workers directory.
19
+ - **Zombie Protection** — Distinguishes active processes from zombies using `psutil`.
20
+ - **PID-Reuse Safe** — Each worker is tagged with an identity token on its command line; recovery and stop confirm a PID still belongs to the worker before acting, so a recycled PID is never mistaken for (or worse, killed as) a live worker. Works on both Unix and Windows.
21
+ - **Gunicorn-safe** — File-based lock prevents concurrent recovery runs across multiple workers.
22
+ - **Testable** — Drive your orchestration with a `FakeBackend` (no real processes) and import polling helpers for the few genuine end-to-end tests. See [Testing your app](#testing-your-app).
23
+
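The two-layer check described in the Security bullet above could look like the following sketch. The function name `resolve_worker_script` and the exact regular expression are assumptions for illustration; the package's own validation code is not part of this diff.

```python
import re
from pathlib import Path

# Assumed charset, taken from the README text: A-Z a-z 0-9 _ -
_SAFE_NAME = re.compile(r'[A-Za-z0-9_-]+')


def resolve_worker_script(workers_dir: str, worker_type: str) -> Path:
    """Hypothetical helper: map a worker type to its script path, or raise ValueError."""
    # First gate: reject anything outside the safe identifier charset.
    # This already rules out '/', '\\', '.', and ':' (so '../x' and 'c:evil' fail here).
    if not _SAFE_NAME.fullmatch(worker_type):
        raise ValueError(f'invalid worker type: {worker_type!r}')

    base = Path(workers_dir).resolve()
    script = (base / f'{worker_type}.py').resolve()

    # Second gate (defence in depth): the resolved path must stay inside workers_dir.
    if not script.is_relative_to(base):
        raise ValueError(f'script escapes workers directory: {script}')
    return script
```

The second gate looks redundant once the charset check passes, but it still catches cases the name alone cannot reveal, such as a symlinked script that resolves outside the workers directory.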
24
+ ## Installation
25
+
26
+ ```bash
27
+ pip install crazy-workers
28
+ ```
29
+
30
+ Or from source:
31
+
32
+ ```bash
33
+ git clone https://github.com/Vanni-broUser/crazy-workers
34
+ cd crazy-workers
35
+ pip install .
36
+ ```
37
+
38
+ ## Quick Start
39
+
40
+ ### 1. Create a worker script
41
+
42
+ ```python
43
+ # workers/my_worker.py
44
+ import json, sys, time
45
+
46
+ params = json.loads(sys.argv[1]) if len(sys.argv) > 1 else {}
47
+ duration = params.get('duration', 60)
48
+
49
+ for _ in range(duration):
50
+     time.sleep(1)
51
+ ```
52
+
53
+ ### 2. Manage it from Python
54
+
55
+ ```python
56
+ from crazy_workers import WorkerManager
57
+
58
+ manager = WorkerManager('workers')
59
+
60
+ # Start
61
+ success, result = manager.start_worker(
62
+     'my_worker',
63
+     worker_key='job_1',
64
+     parameters={'duration': 30},
65
+ )
66
+ print(result['pid']) # OS process ID
67
+ print(result['status']) # 'RUNNING'
68
+
69
+ # List
70
+ for w in manager.list_workers():
71
+     print(w['worker_key'], w['status'])
72
+
73
+ # Stop
74
+ manager.stop_worker('job_1')
75
+
76
+ # Recover crashed workers (call on app startup)
77
+ restarted = manager.recover_workers()
78
+
79
+ manager.dispose() # releases DB connection; does NOT kill workers
80
+ ```
81
+
82
+ ### 3. Or from the CLI
83
+
84
+ ```bash
85
+ crazy-workers status
86
+ crazy-workers start my_worker --key job_1 --params '{"duration": 30}'
87
+ crazy-workers stop job_1
88
+ ```
89
+
90
+ See [CLI.md](https://github.com/Vanni-broUser/crazy-workers/blob/main/CLI.md) for full CLI documentation.
91
+
92
+ ## API Reference
93
+
94
+ ### `WorkerManager(workers_dir, create_dir=True, ...)`
95
+
96
+ | Parameter | Type | Default | Description |
97
+ |-----------|------|---------|-------------|
98
+ | `workers_dir` | `str` | `'workers'` | Directory containing worker `.py` scripts |
99
+ | `create_dir` | `bool` | `True` | Create `workers_dir` and `.service/` if they don't exist |
100
+ | `backend` | `ProcessBackend` | `None` | Process backend; the default spawns real subprocesses (tests inject a fake) |
101
+ | `auto_boot` | `bool` | `True` | Install the per-user OS boot-restore hook on first start — see [Automatic boot-restore](#automatic-boot-restore) |
102
+ | `boot_provider` | `BootProvider` | `None` | Override the boot-restore mechanism (mainly a test seam) |
103
+ | `db_url` | `str` | `None` | SQLAlchemy URL for worker state; defaults to SQLite under `.service/` |
104
+ | `engine` | `Engine` | `None` | Reuse an existing SQLAlchemy engine so the tables live in your database; **not** disposed by crazy_workers |
105
+ | `worker_env` | `dict` | `None` | Environment variables injected into **every** spawned worker (e.g. `DATABASE_URL`) |
106
+ | `auto_recover` | `bool` | `True` | Recover dead-but-`RUNNING` workers when the manager is constructed |
107
+
108
+ See [Backend integration](#backend-integration) for `db_url` / `engine` / `worker_env` / `auto_recover`.
109
+
110
+ ### `start_worker(worker_type, worker_key=None, parameters=None, env=None)`
111
+
112
+ | Parameter | Type | Default | Description |
113
+ |-----------|------|---------|-------------|
114
+ | `worker_type` | `str` | — | Filename (without `.py`) of the worker script |
115
+ | `worker_key` | `str` | `worker_type` | Unique identifier; allows multiple instances of the same type |
116
+ | `parameters` | `dict` | `{}` | JSON-serializable dict passed as `sys.argv[1]` to the worker |
117
+ | `env` | `dict` | `None` | Extra environment variables injected into the worker process |
118
+
119
+ Returns `(bool, dict | str)` — `(True, worker_dict)` on success, `(False, error_message)` on failure.
120
+
121
+ > **Note on `RUNNING`:** success means the worker was *spawned* and survived a brief startup grace period that catches immediate launch failures (bad import, missing module). It does **not** guarantee the worker will run to completion — a worker that fails later is still reported `RUNNING` until the next `list_workers()` / `recover_workers()` reconciles its state.
122
+
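The "startup grace period" in the note above can be pictured with this minimal sketch. It is an assumption about the general technique, not the library's code: the name `spawn_with_grace`, the default grace length, and the use of `subprocess.Popen` are all illustrative.

```python
import subprocess
import sys
import time


def spawn_with_grace(args, grace=0.5, interval=0.05):
    """Hypothetical sketch: start a process and report whether it survived `grace` seconds."""
    proc = subprocess.Popen(args)
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        # poll() returns None while the process is alive, else its exit code.
        if proc.poll() is not None:
            return False, proc
        time.sleep(interval)
    return proc.poll() is None, proc


# Example call: a script that fails on import exits quickly and is caught by the grace check.
# ok, proc = spawn_with_grace([sys.executable, '-c', 'import no_such_module'])
```

A process that passes this check can still fail a moment later, which is why the README says a `RUNNING` result is only reconciled on the next `list_workers()` / `recover_workers()`.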
123
+ ### `stop_worker(worker_key)`
124
+
125
+ Gracefully terminates the worker (SIGTERM → SIGKILL after timeout). Returns `(bool, str)`.
126
+
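The "SIGTERM, then SIGKILL after timeout" escalation mentioned above is a common pattern; the sketch below shows it with the standard library. It assumes a `subprocess.Popen` handle and a hypothetical function name `stop_process`, whereas crazy_workers tracks workers by PID and also handles child processes, which this sketch does not attempt.

```python
import subprocess


def stop_process(proc, timeout=5.0):
    """Hypothetical sketch: ask politely first, force only if the process ignores us."""
    proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()  # reap the process so it does not linger as a zombie
    return proc.returncode
```

Giving the worker a window to handle SIGTERM lets it flush files or close connections before it is forced down.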
127
+ ### `list_workers()`
128
+
129
+ Returns a list of worker dicts including RUNNING, STOPPED, CRASHED, and NEVER_STARTED (filesystem-discovered) workers.
130
+
131
+ ### `recover_workers()`
132
+
133
+ Restarts any worker whose DB status is RUNNING but whose process is dead. Uses a file lock to prevent concurrent recovery. Returns a list of restarted keys.
134
+
135
+ > You rarely call this directly: it runs automatically when the manager is constructed (`auto_recover=True`) and after a machine reboot via the boot hook. It remains available as an explicit, idempotent trigger.
136
+
137
+ ### `dispose()`
138
+
139
+ Closes the database connection and clears internal process references. Does **not** kill background workers — they continue running independently.
140
+
141
+ ## Worker Script Contract
142
+
143
+ A worker receives its parameters as a JSON string in `sys.argv[1]`:
144
+
145
+ ```python
146
+ import json, sys
147
+
148
+ params = json.loads(sys.argv[1]) if len(sys.argv) > 1 else {}
149
+ # ... do work ...
150
+ ```
151
+
152
+ A worker is a separate process, so it cannot be handed a live object (e.g. a DB
153
+ connection). Pass **configuration** instead: the manager's `worker_env` (and any
154
+ per-call `env`) is injected as environment variables, so a worker reads, say,
155
+ `os.environ['DATABASE_URL']` and opens its own connection. See
156
+ [`example_app/workers/db_writer.py`](https://github.com/Vanni-broUser/crazy-workers/blob/main/example_app/workers/db_writer.py).
157
+
158
+ ## Testing your app
159
+
160
+ Code that uses `WorkerManager` has two kinds of logic worth testing:
161
+ **orchestration** (which workers start, pairing, rollback, recovery) and the
162
+ **work itself** (does the worker actually do its job). `crazy_workers.testing`
163
+ makes both fast and non-flaky.
164
+
165
+ ### Orchestration — without launching a single process
166
+
167
+ `WorkerManager.for_testing()` wires a **FakeBackend**: spawning and termination
168
+ are faked, but the state machine (SQLite, recovery, validation) stays **real**
169
+ and runs in-process. The backend is exposed as `manager.test` for assertions.
170
+
171
+ ```python
172
+ from crazy_workers import WorkerManager
173
+
174
+ def test_starts_recorder_and_renamer():
175
+     manager = WorkerManager.for_testing('workers')  # FakeBackend, no processes
176
+
177
+     manager.start_worker('recorder', worker_key='cam1', parameters={'device': 'cam1'})
178
+     manager.start_worker('renamer', worker_key='renamer_cam1', parameters={'output_dir': '/data/cam1'})
179
+
180
+     assert manager.test.started_types == ['recorder', 'renamer']
181
+     assert manager.test.is_running('cam1')
182
+     assert manager.test.parameters_for('renamer_cam1') == {'output_dir': '/data/cam1'}
183
+     manager.dispose()
184
+
185
+
186
+ def test_recovery_restarts_a_crash():
187
+     manager = WorkerManager.for_testing('workers')
188
+     manager.start_worker('recorder', worker_key='cam1')
189
+
190
+     manager.test.crash('cam1')  # simulate an unexpected death
191
+     manager.recover_workers()  # the real recovery path runs in-process
192
+
193
+     assert manager.test.start_count('cam1') == 2
194
+     assert manager.test.is_running('cam1')
195
+     manager.dispose()
196
+ ```
197
+
198
+ `workers_dir` must still contain the `<type>.py` files (`start_worker` checks
199
+ the script exists), but the fake backend never executes them — empty files are
200
+ enough.
201
+
202
+ `manager.test` exposes:
203
+
204
+ | Member | Returns |
205
+ |---|---|
206
+ | `started_types` / `started_keys` | every spawn, in order (a restart appears twice) |
207
+ | `running_keys` | keys whose latest process is still "alive" |
208
+ | `is_running(key)` | bool |
209
+ | `start_count(key)` | how many times the key was (re)started |
210
+ | `parameters_for(key)` | parameters of the most recent spawn |
211
+ | `crash(key)` | simulate an unexpected death (without a stop) |
212
+
213
+ ### The real thing — polling helpers, not `sleep`
214
+
215
+ The few tests that *must* launch real processes should wait on conditions,
216
+ never fixed sleeps. `crazy_workers.testing` exposes the helpers used by the
217
+ library's own suite:
218
+
219
+ ```python
220
+ from crazy_workers.testing import wait_for_worker_status, wait_for_log, wait_for_pid_dead
221
+
222
+ manager = WorkerManager('workers')
223
+ ok, res = manager.start_worker('recorder', worker_key='cam1')
224
+
225
+ wait_for_worker_status(manager, 'cam1', 'RUNNING')
226
+ wait_for_log('workers/.service/logs/cam1.log', 'recording started')
227
+ # ... assert the worker actually did its job ...
228
+ manager.stop_worker('cam1')
229
+ wait_for_pid_dead(res['pid'])
230
+ ```
231
+
232
+ Available: `wait_for`, `wait_for_file`, `wait_for_log`, `wait_for_worker_status`,
233
+ `wait_for_worker_in_db`, `wait_for_worker_pid`, `wait_for_pid_dead`. Each raises
234
+ `AssertionError` with a useful message on timeout.
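
The condition-polling idea behind these helpers can be sketched in a few lines. This is only an illustration, not the library's implementation: `poll_until`, its parameters, and its default timings are hypothetical names chosen for this example.

```python
import time


def poll_until(condition, timeout=5.0, interval=0.05, message='condition not met'):
    """Call `condition` repeatedly until it returns a truthy value.

    Hypothetical helper for illustration; not part of crazy_workers.
    Raises AssertionError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise AssertionError(f'{message} (waited {timeout}s)')
        # Sleep briefly between checks instead of one long fixed sleep.
        time.sleep(interval)
```

A test built this way returns as soon as the condition holds and fails with a readable message otherwise, which is why it tends to be both faster and less flaky than a fixed `sleep`.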
235
+
236
+ > **What the fake covers — and what it doesn't.** `for_testing`/FakeBackend
237
+ > tests *orchestration*, not "the worker actually records / sends / converts".
238
+ > Keep a small number of real end-to-end tests for that, made stable with the
239
+ > polling helpers above, and move everything else — the bulk — into the fast,
240
+ > deterministic fake world.
241
+
242
+ ## Project Structure
243
+
244
+ ```
245
+ crazy_workers/          # Library package
+   core/                 # WorkerManager, process engine, recovery lock
+   boot/                 # Automatic per-user boot-restore (systemd user unit / scheduled task)
+   cli/                  # CLI entry point, commands, discovery
+   database/             # SQLAlchemy schema and pluggable storage (SQLite, or a shared engine/URL)
+   testing/              # FakeBackend + polling helpers for consumer test suites
+ example_app/            # Flask demo application
+   app.py
+   workers/              # Example worker scripts
+ tests/
+   core/                 # Unit tests for core modules
+   cli/                  # Unit tests for CLI modules
+   database/             # Unit tests for storage layer
+   testing/              # Tests for the FakeBackend and polling helpers
+   integration/          # Full-stack integration tests (real processes)
+   app/                  # Tests for the example Flask app
261
+ ```
262
+
263
+ ## Flask Integration
264
+
265
+ > ⚠️ **Security:** `start_worker()` runs the worker script named by the caller. Exposing it over HTTP makes it a **privileged operation** — anyone who can reach the route can launch any script in your workers directory. Put such routes behind authentication, and prefer validating `worker_type` against a known allow-list of expected workers. The example below is a minimal demo with **no auth**.
266
+
267
+ ```python
268
+ from flask import Flask, jsonify, request
+
+ from crazy_workers import WorkerManager
+
+ def create_app(db_engine, db_url):
+     app = Flask(__name__)
+
+     manager = WorkerManager(
+         'workers',
+         engine=db_engine,  # crazy_workers' tables live in YOUR database
+         worker_env={'DATABASE_URL': db_url},  # injected into every worker
+         # auto_recover=True (default): when the app boots, workers left RUNNING
+         # with a dead PID are restored automatically; no explicit call needed.
+     )
+
+     @app.route('/workers/start', methods=['POST'])
+     def start():
+         data = request.json
+         success, result = manager.start_worker(
+             data['worker_type'],
+             worker_key=data.get('worker_key'),
+             parameters=data.get('parameters', {}),
+         )
+         return (jsonify(result), 200) if success else (jsonify({'error': result}), 400)
+
+     return app
292
+ ```
293
+
294
+ See [`example_app/app.py`](https://github.com/Vanni-broUser/crazy-workers/blob/main/example_app/app.py) for a complete example.
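
The allow-list advice from the security note can be sketched as a small validation step that runs before `start_worker`. The names `ALLOWED_WORKER_TYPES` and `validate_worker_type` are hypothetical, and the two worker types are simply the ones used in this README's examples.

```python
# Hypothetical allow-list: only these worker types may be started over HTTP.
ALLOWED_WORKER_TYPES = frozenset({'recorder', 'renamer'})


def validate_worker_type(payload):
    """Return the requested worker type, or raise ValueError if it is not allowed."""
    if not isinstance(payload, dict):
        raise ValueError('request body must be a JSON object')
    worker_type = payload.get('worker_type')
    # Reject missing, non-string, and unlisted values alike.
    if not isinstance(worker_type, str) or worker_type not in ALLOWED_WORKER_TYPES:
        raise ValueError(f'unknown worker type: {worker_type!r}')
    return worker_type
```

In a route, one option is to catch the `ValueError` and return a 400 response before the manager is ever called. This complements authentication; it does not replace it.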
295
+
296
+ ## Backend integration
297
+
298
+ When crazy_workers runs inside a backend, three options let it cooperate with
299
+ the project instead of living off to the side:
300
+
301
+ - **Co-locate its tables in your database.** Pass an existing SQLAlchemy
302
+ `engine` (or a `db_url`) to `WorkerManager`. crazy_workers creates its own
303
+ `workers` table inside your database and inherits its persistence and backups
304
+ — so state survives even if the process/container is recreated. A shared
305
+ engine is never disposed by crazy_workers; its owner manages it.
306
+ - **Give workers the connection they need.** A worker is a separate process, so
307
+ it can't receive a live DB connection — pass the *configuration* instead.
308
+ `worker_env={'DATABASE_URL': ...}` is injected into every spawned worker
309
+ (overridable per call via `start_worker(..., env=...)`); the worker opens its
310
+ own connection from it.
311
+ - **Recovery on construction.** `auto_recover=True` (default) restores
312
+ dead-but-`RUNNING` workers when the manager is built, so a restarting backend
313
+ brings its workers back with no explicit call. The CLI and boot entrypoint
314
+ set it to `False` (management/one-shot, not supervision).
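
On the worker side, the second point amounts to reading the injected configuration from the environment and opening a connection from it. A minimal sketch, assuming the `DATABASE_URL` variable name used in the example above; `database_url_from_env` is a hypothetical helper, and how the worker then connects (SQLAlchemy, a plain driver, and so on) is left to the script.

```python
import os


def database_url_from_env(environ=None):
    """Return the DATABASE_URL injected by the parent, failing loudly if absent.

    Hypothetical helper for a worker script; not part of crazy_workers.
    """
    environ = os.environ if environ is None else environ
    url = environ.get('DATABASE_URL')
    if not url:
        raise RuntimeError('DATABASE_URL is not set; was worker_env passed to WorkerManager?')
    return url
```

Failing at startup with a clear message is usually easier to diagnose than a connection error deep inside the worker's main loop.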
315
+
316
+ ## Gunicorn / Multi-Process Servers
317
+
318
+ When using a pre-fork server like Gunicorn:
319
+
320
+ - **Recovery is atomic** — a file lock (`.service/workers.db.recovery.lock`) ensures `recover_workers()` runs once even when multiple workers boot simultaneously.
321
+ - **Workers outlive their parent** — if a Gunicorn worker is recycled, background processes keep running. The next recovery cycle re-attaches or restarts them.
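
The general idea of guarding a recovery pass with a lock file can be sketched with an atomic file creation. This is an illustration of the technique only: `run_once` is a hypothetical name, and crazy_workers' own lock may use a different mechanism (for example an OS-level file lock).

```python
import os


def run_once(lock_path, action):
    """Run `action` only in the process that manages to create `lock_path`.

    os.O_CREAT | os.O_EXCL makes creation atomic, so among callers racing
    on the same path exactly one succeeds and the others skip the action.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another process currently holds the lock
    try:
        action()
        return True
    finally:
        # Release the lock so a later pass can run again.
        os.close(fd)
        os.remove(lock_path)
```

One limitation of this simple form: if the process is killed between creating and removing the file, the stale lock has to be cleaned up by hand, which is one reason real implementations often prefer locks that the OS releases automatically.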
322
+
323
+ ## Automatic boot-restore
324
+
325
+ Starting a worker transparently installs a **per-user OS hook** for its workers
326
+ directory, so workers come back after a reboot without any host application
327
+ running. The hook calls the internal entrypoint `python -m crazy_workers.boot
328
+ --workers-dir <dir>`, which runs `recover_workers()`. The install is
329
+ best-effort and happens at most once per directory (a marker lives in
330
+ `.service/boot.json`); a failure never blocks the worker from starting and is
331
+ reported by `crazy-workers status`.
332
+
333
+ | Platform | Mechanism | When it runs |
334
+ |----------|-----------|--------------|
335
+ | Linux | systemd **user** unit (`~/.config/systemd/user`) | at user login — or at true boot if **lingering** is enabled |
336
+ | Windows | logon Scheduled Task (`schtasks /SC ONLOGON`) | at user logon |
337
+
338
+ **Unattended boot — the honest caveat.** The recovery itself is durable across
339
+ consecutive reboots (a worker left `RUNNING` with a dead PID is restarted, and
340
+ this is PID-reuse safe). But running it *without any login* depends on the OS:
341
+
342
+ - **Linux:** enable lingering once, as root — `loginctl enable-linger <user>` —
343
+ so the user's systemd starts at boot (the same model as the Docker daemon
344
+ restarting containers). Without it, restore runs at the next login.
345
+ - **Windows:** a user task fires at logon; true pre-logon start needs autologon
346
+ or an administrator-installed task.
347
+
348
+ `crazy-workers status` always reports whether the hook runs **at boot** or
349
+ **at login**, so this is never silently wrong.
350
+
351
+ **Opting out.** Set `CRAZY_WORKERS_NO_BOOT` (any non-empty value) to disable the
352
+ automatic install entirely — useful in containers, CI, or when you manage boot
353
+ yourself. You can also pass `WorkerManager(..., auto_boot=False)`.
354
+
355
+ ## Logs
356
+
357
+ Each worker's stdout/stderr is appended to `.service/logs/<worker_key>.log`. These files are written directly by the worker process, so the library does **not** rotate them — they grow until you act. For long-lived deployments, rotate them externally (e.g. `logrotate` with `copytruncate`) or have your worker script configure its own `logging.handlers.RotatingFileHandler` instead of writing to stdout/stderr.
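
For the second option, a worker script can set up its own rotating log with the standard library. A minimal sketch: `build_worker_logger`, the logger name, and the size limits are hypothetical choices for this example, and the log path is whatever your worker decides to use.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_worker_logger(log_path, max_bytes=1_000_000, backup_count=3):
    """Return a logger that rolls `log_path` over before it grows past `max_bytes`.

    Up to `backup_count` old files are kept as log_path.1, log_path.2, ...
    """
    logger = logging.getLogger('worker')
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(log_path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    return logger
```

Call it once at the start of the worker script and log through the returned logger rather than `print`, so the output goes to the rotated file instead of stdout/stderr.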
358
+
359
+ ## Development
360
+
361
+ ### Setup
362
+
363
+ ```bash
364
+ git clone https://github.com/Vanni-broUser/crazy-workers
365
+ cd crazy-workers
366
+ pip install -e ".[dev]"
367
+ ```
368
+
369
+ ### Commands
370
+
371
+ ```bash
372
+ # Lint and format
373
+ ruff check . --fix && ruff format .
374
+
375
+ # Run tests
376
+ pytest
377
+
378
+ # Run tests with coverage
379
+ coverage run -m pytest && coverage report
380
+ ```
381
+
382
+ ### Standards
383
+
384
+ See [AI.md](https://github.com/Vanni-broUser/crazy-workers/blob/main/AI.md) for the full coding and testing standards used in this project.
385
+
386
+ ## Support the project ❤️
387
+
388
+ crazy-workers is free and open-source (MIT). If it saves you time or powers your work,
389
+ consider supporting its development:
390
+
391
+ - **GitHub Sponsors** — recurring or one-time, 0% platform fee: https://github.com/sponsors/Vanni-broUser
392
+ - **Stripe** — one-time card donation via secure checkout: _Stripe Payment Link (coming soon)_
393
+ - **⭐ Star the repo** — free, and it really helps visibility.
394
+
395
+ > The Stripe link is configured via a [Payment Link](https://stripe.com/docs/payment-links).
396
+ > Replace the placeholder above (and in [`.github/FUNDING.yml`](.github/FUNDING.yml)) with your real
397
+ > `https://buy.stripe.com/...` URL once created.