chrono_forge 0.8.0 → 0.9.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +39 -0
- data/README.md +121 -9
- data/lib/chrono_forge/cleanup.rb +154 -0
- data/lib/chrono_forge/cleanup_job.rb +30 -0
- data/lib/chrono_forge/error_log.rb +2 -0
- data/lib/chrono_forge/executor/context.rb +26 -8
- data/lib/chrono_forge/executor/execution_tracker.rb +43 -4
- data/lib/chrono_forge/executor/lock_strategy.rb +9 -3
- data/lib/chrono_forge/executor/methods/continue_if.rb +3 -5
- data/lib/chrono_forge/executor/methods/durably_execute.rb +3 -5
- data/lib/chrono_forge/executor/methods/durably_repeat.rb +4 -9
- data/lib/chrono_forge/executor/methods/wait.rb +2 -4
- data/lib/chrono_forge/executor/methods/wait_until.rb +8 -6
- data/lib/chrono_forge/executor/methods/workflow_states.rb +15 -24
- data/lib/chrono_forge/executor.rb +55 -12
- data/lib/chrono_forge/version.rb +1 -1
- data/lib/chrono_forge/workflow.rb +26 -3
- data/lib/generators/chrono_forge/install/USAGE +5 -3
- data/lib/generators/chrono_forge/install/install_generator.rb +6 -8
- data/lib/generators/chrono_forge/migration_actions.rb +39 -0
- data/lib/generators/chrono_forge/templates/add_chrono_forge_error_log_step_context.rb +13 -0
- data/lib/generators/chrono_forge/templates/add_chrono_forge_workflow_state_index.rb +33 -0
- data/lib/generators/chrono_forge/upgrade/USAGE +13 -0
- data/lib/generators/chrono_forge/upgrade/upgrade_generator.rb +28 -0
- metadata +10 -3
- /data/lib/generators/chrono_forge/{install/templates → templates}/install_chrono_forge.rb +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1d7b16cc9e00eb7cc95b21a13331c9bc2dffdfcbd4f4e060c03e876f261fa73c
+  data.tar.gz: 4f9b0b28f7f69f518898ffc1e591087096ac71d84547d4c185c07cc6f0a11643
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 208180ae5d6fbe4b3ad30b05f1a60eb231f9bd3135304211ff7861f50e546eb32815ab5e8307e9c910cd1c5a7bdf951777635304ed7c629a7a62f8740df21c85
+  data.tar.gz: e3fac31dc46d1b126e8de982b02fe70d37ac7ffd6233ec643f20f93296237029640e67bb004cfaf7e71d5f60429cbc4702d4efe40951c48915e3bc08e561ec51
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,44 @@
 ## [Unreleased]
 
+## [0.9.0] - 2026-06-03
+
+### Added
+
+- `ChronoForge::Cleanup` and `ChronoForge::CleanupJob` — a schedulable, batched cleanup that deletes old terminal (completed/failed) workflows and their logs, and (opt-in) prunes the unbounded repetition logs that long-lived `durably_repeat` tasks accumulate. Repetition pruning is frontier-safe: only terminal repetitions scheduled strictly before the coordination log's `last_execution_at` are removed, so catch-up is never disrupted. Retention is configurable per terminal state (`completed_older_than` / `failed_older_than`).
+- `chrono_forge:upgrade` generator that installs additive migrations existing apps are missing (idempotent — re-running either generator skips migrations that already exist).
+- Composite `[state, completed_at]` index on `chrono_forge_workflows` (separate, strong_migrations-safe migration: built `CONCURRENTLY` on PostgreSQL, `if_not_exists`) to keep monitoring and cleanup scans efficient.
+- Validation of user-supplied step names: a name/method/condition containing the reserved `$` separator now raises `ChronoForge::Executor::InvalidStepName`.
+- `step_name` and `attempt` columns on `chrono_forge_error_logs` (additive migration), populated by error tracking so each error is attributable to the step and attempt it came from and can be ordered/correlated when tailing a workflow.
+- Record-level re-execution: `ChronoForge::Workflow#retry_now` / `#retry_later` (plus `#retryable?`), so a failed/stalled workflow can be re-run straight from its record (e.g. `ChronoForge::Workflow.failed.find_each(&:retry_later)`) without constantizing the job class or re-passing the key. `retry_later` validates retryability up front and raises `WorkflowNotRetryableError` immediately instead of enqueuing a job that would fail in the worker.
+
+### Changed
+
+- **Performance:** execution-log and workflow lookups are now SELECT-first instead of INSERT-first, eliminating an `INSERT`-that-fails-on-the-unique-index (plus a burned sequence value) for every already-completed step on every replay.
+- **Performance:** `LockStrategy.release_lock` reads only the lock owner column instead of reloading the full workflow row (which dragged the large JSON `context`/`kwargs`/`options` into memory on every resume).
+- **Performance:** workflow completion/failure persist their execution log in a single `UPDATE` instead of two.
+- **Performance:** `Context` deep-copies values via `as_json` instead of a `JSON.parse(JSON.generate(...))` round-trip.
+- Error-log context snapshots are now bounded: all keys are kept, but once a 64 KB total budget is reached the remaining values are replaced with an `<<omitted>>` marker, so repeated error logging no longer duplicates large context blobs.
+- Workflow retention is measured from when a workflow became terminal (`completed_at` for completed, `updated_at` for failed), not from `created_at` — long-running workflows that only just finished are retained for the full window.
+
+### Breaking
+
+- The per-value `Context` size limit is reduced from 64 KB to **16 KB** and is now measured in **bytes** (previously characters, and `String`-only). `Hash` and `Array` values are now size-validated too. Context is intended for small working state; store large payloads elsewhere and keep a reference. Existing workflows that *write* values larger than 16 KB will raise `ChronoForge::Executor::Context::ValidationError`; already-stored values are unaffected when read.
+
+### Fixed
+
+- A failed step no longer logs its terminal failure twice. Previously the step logged the underlying error and `perform` re-logged the `ExecutionFailedError` control-flow wrapper, producing a duplicate row. The wrapper is no longer logged; `wait_until` timeouts (which had no step-level log) are now logged at the step instead.
+
+### Removed
+
+- Dead `serialize :metadata` declaration on `ChronoForge::Workflow` (the table has no `metadata` column).
+
+### Upgrading
+
+```bash
+rails generate chrono_forge:upgrade
+rails db:migrate
+```
+
 ## [0.1.0] - 2024-12-21
 
 - Initial release
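The step-name validation listed under "Added" can be pictured with a small sketch. The `$` separator and the `InvalidStepName` error come from the changelog, and `validate_step_name_segment!` is the name called in the executor method diffs, but its body is not part of this diff, so the implementation below is an assumption. `step_name_for` is a hypothetical helper added only for illustration.

```ruby
# Hypothetical sketch: the real method body is not shown in this diff.
class InvalidStepName < StandardError; end

SEPARATOR = "$"

# Rejects any user-supplied segment that contains the reserved separator.
def validate_step_name_segment!(segment)
  text = segment.to_s
  return text unless text.include?(SEPARATOR)

  raise InvalidStepName, "step name segment #{text.inspect} must not contain #{SEPARATOR.inspect}"
end

# Hypothetical helper: builds "<kind>$<segment>" the way the step methods do.
def step_name_for(kind, segment)
  "#{kind}#{SEPARATOR}#{validate_step_name_segment!(segment)}"
end

step_name_for("durably_execute", :charge_card) # => "durably_execute$charge_card"
```

A check like this matters because repetition logs are told apart from coordination logs by counting `$` segments, so a user-supplied `$` would make the two ambiguous.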
data/README.md
CHANGED
@@ -19,6 +19,7 @@ ChronoForge provides a powerful solution for handling long-running processes, ma
 - **Wait States**: Support for time-based waits and condition-based waiting
 - **Database-Backed**: All workflow state is persisted to ensure durability
 - **ActiveJob Integration**: Compatible with all ActiveJob backends, though database-backed processors (like Solid Queue) provide the most reliable experience for long-running workflows
+- **Retention & Cleanup**: A schedulable job to prune finished workflows and the unbounded logs that periodic tasks accumulate (see [Cleanup & Retention](#-cleanup--retention))
 
 ## 📦 Installation
 
@@ -47,6 +48,21 @@ $ rails generate chrono_forge:install
 $ rails db:migrate
 ```
 
+### Upgrading
+
+When upgrading ChronoForge in an application that was installed with an earlier
+version, run the upgrade generator to pick up any additive schema changes, then
+migrate:
+
+```bash
+$ rails generate chrono_forge:upgrade
+$ rails db:migrate
+```
+
+The upgrade migration is idempotent (`if_not_exists`), so it is safe to run even
+if your schema already has the index. Fresh installs get the index from the
+install migration and do **not** need to run the upgrade.
+
 ## 📋 Usage
 
 ### Creating and Executing Workflows
@@ -431,6 +447,18 @@ end
 
 The context supports serializable Ruby objects (Hash, Array, String, Integer, Float, Boolean, and nil) and validates types automatically.
 
+Hash and Array values are stored as JSON, which has no symbols — so **symbol keys inside a stored hash come back as strings**:
+
+```ruby
+context[:totals] = { paid: 5, pending: 2 }
+context[:totals] # => { "paid" => 5, "pending" => 2 }
+context[:totals]["paid"] # => 5 (not context[:totals][:paid])
+```
+
+(The top-level context key itself is interchangeable — `context[:totals]` and `context["totals"]` refer to the same entry.)
+
+Context is meant for **small working state** — ids, flags, timestamps, and small structures used to coordinate steps. Each value is capped at **16 KB** (a `ChronoForge::Executor::Context::ValidationError` is raised above that). Store large payloads (documents, uploads, API responses) in their own storage and keep just a reference (an id or key) in the context.
+
 ### 🛡️ Error Handling
 
 ChronoForge automatically tracks errors and provides configurable retry capabilities:
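The symbol-to-string key behavior described in the README hunk above is ordinary JSON behavior and can be reproduced without ChronoForge. The sketch below uses only Ruby's standard `json` library as a stand-in for the JSON column; it does not call the gem's `Context` class.

```ruby
require "json"

# Round-trip a hash through JSON, roughly what a JSON column does on save and load.
totals = { paid: 5, pending: 2 }
stored = JSON.parse(JSON.generate(totals))

stored["paid"] # => 5
stored[:paid]  # => nil, because the symbol key did not survive the round trip

# If symbol keys are preferred after reading, convert explicitly.
symbolized = stored.transform_keys(&:to_sym)
symbolized[:paid] # => 5
```

Reading with string keys everywhere is the simpler convention; `transform_keys` is only worth it when the value is handed to code that expects symbols.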
@@ -581,20 +609,32 @@ stateDiagram-v2
 
 #### Recovering Stalled/Failed Workflows
 
+Re-execute a failed or stalled workflow directly from its record — no need to
+constantize the job class or re-pass the key. Execution resumes via replay, so
+completed steps are skipped and it picks up at the step that failed:
+
 ```ruby
 workflow = ChronoForge::Workflow.find_by(key: "order-123")
 
-
-
-
-  # Retry immediately
-  job_class.retry_now(workflow.key)
-
-  # Or retry asynchronously
-  job_class.retry_later(workflow.key)
-end
+workflow.retry_later # re-run asynchronously (the common case)
+workflow.retry_now # re-run inline (console/debugging)
 ```
 
+Only `stalled` or `failed` workflows are retryable. `retryable?` lets you check
+first, and both methods **validate up front** — calling `retry_later`
+on a non-retryable workflow raises `ChronoForge::Executor::WorkflowNotRetryableError`
+immediately rather than enqueuing a job that would fail in the worker:
+
+```ruby
+workflow.retryable? # => true/false
+
+# Bulk re-run everything that failed:
+ChronoForge::Workflow.failed.find_each(&:retry_later)
+```
+
+The class-level form (`MyWorkflow.retry_now(key)` / `retry_later(key)`) still
+works if you have the class and key rather than the record.
+
 #### Monitoring Running Workflows
 
 Long-running workflows might indicate issues:
@@ -614,6 +654,78 @@ long_running.each do |workflow|
 end
 ```
 
+## 🧹 Cleanup & Retention
+
+ChronoForge keeps every workflow and execution-log row indefinitely so that
+replays remain idempotent. Over time two things grow without bound:
+
+1. **Terminal workflows** (`completed` / `failed`) that are no longer needed.
+2. **`durably_repeat` repetition logs** — one row per scheduled execution. A
+   long-lived periodic workflow never reaches a terminal state, so its
+   repetition logs accumulate indefinitely. Past repetitions (those behind the
+   task's current frontier) are never read again, since each resume recomputes
+   the next execution from the coordination log — so they are safe to prune (see
+   the safety note below).
+
+`ChronoForge::Cleanup` reclaims both. It is **not** run automatically — schedule
+it from your own scheduler so you stay in control of retention:
+
+```ruby
+ChronoForge::Cleanup.run(
+  older_than: 90.days, # default retention for terminal workflows (+ cascades their logs)
+  completed_older_than: 30.days, # optional: retention for completed workflows (defaults to older_than)
+  failed_older_than: 180.days, # optional: keep failures longer for debugging (defaults to older_than)
+  prune_repetition_logs_older_than: 30.days, # opt-in: prune old durably_repeat logs from still-active workflows
+  batch_size: 1_000 # rows deleted per batch
+)
+# => { workflows: 12, execution_logs: 84, error_logs: 3, repetition_logs: 240 }
+```
+
+Notes:
+
+- `running`, `idle`, and `stalled` workflows are **never** deleted.
+- `completed_older_than` / `failed_older_than` let you keep failed workflows
+  around longer than completed ones; both default to `older_than`.
+- `prune_repetition_logs_older_than` is opt-in (defaults to `nil`); when unset,
+  repetition logs are only removed as part of deleting their parent workflow.
+  Pruning is deliberately conservative: it only removes terminal repetition logs
+  that are both older than the window **and** scheduled strictly before the
+  periodic task's current frontier (the coordination log's `last_execution_at`).
+  Anything at or after the frontier is kept so `durably_repeat`'s catch-up
+  mechanism is never disrupted — so the window is purely a retention preference
+  and is safe even for yearly schedules.
+- Workflow retention is measured from when a workflow became terminal, not when
+  it was created — a long-running workflow that only just finished is kept for
+  the full window. Completed workflows use `completed_at` (immutable); failed
+  workflows use `updated_at` (they have no `completed_at`).
+- The composite `[state, completed_at]` index added in this version keeps these
+  scans efficient — run `chrono_forge:upgrade` if you installed an earlier
+  version.
+
+A ready-made job is bundled so you can schedule it with any recurring-job
+mechanism (Solid Queue recurring tasks, sidekiq-cron, GoodJob cron, the
+`whenever` gem, ...):
+
+```ruby
+ChronoForge::CleanupJob.perform_later(
+  older_than_days: 90,
+  failed_older_than_days: 180,
+  prune_repetition_logs_older_than_days: 30
+)
+```
+
+The job takes plain day counts (not `Duration` objects) so it can be driven from
+a config file. For example, with Solid Queue's recurring tasks
+(`config/recurring.yml`):
+
+```yaml
+production:
+  chrono_forge_cleanup:
+    class: ChronoForge::CleanupJob
+    args: { older_than_days: 90, prune_repetition_logs_older_than_days: 30 }
+    schedule: every day at 3am
+```
+
 ## 🚀 Development
 
 After checking out the repo, run:
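The retention rules in the README notes (per-state windows that default to `older_than`, `completed_at` for completed workflows, `updated_at` for failed ones, and no deletion of active states) can be sketched in plain Ruby. This is an in-memory illustration only: `WorkflowRow`, `expired?`, and the `DAY` constant are hypothetical names, and plain seconds stand in for `ActiveSupport::Duration`.

```ruby
DAY = 24 * 60 * 60 # seconds; stands in for ActiveSupport durations such as 90.days

# Hypothetical in-memory stand-in for a chrono_forge_workflows row.
WorkflowRow = Struct.new(:state, :completed_at, :updated_at, keyword_init: true)

# True when the row falls outside its retention window, measured from the
# moment it became terminal rather than from when it was created.
def expired?(row, now:, older_than: 90 * DAY, completed_older_than: nil, failed_older_than: nil)
  case row.state
  when :completed
    row.completed_at <= now - (completed_older_than || older_than)
  when :failed
    row.updated_at <= now - (failed_older_than || older_than)
  else
    false # running, idle and stalled workflows are never deleted
  end
end

now = Time.utc(2026, 6, 3)
finished = WorkflowRow.new(state: :completed, completed_at: now - 40 * DAY, updated_at: now - 40 * DAY)

expired?(finished, now: now)                                 # => false (40 days < 90-day default)
expired?(finished, now: now, completed_older_than: 30 * DAY) # => true
```

Keeping the two windows separate is what allows failures to be retained longer for debugging while completed workflows are reclaimed sooner.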
data/lib/chrono_forge/cleanup.rb
ADDED
@@ -0,0 +1,154 @@
+module ChronoForge
+  # Reclaims storage from finished workflows and the unbounded execution-log
+  # rows that periodic tasks (durably_repeat) accumulate.
+  #
+  # ChronoForge keeps every workflow and execution-log row indefinitely so that
+  # replays stay idempotent. Two things grow without bound over time:
+  #
+  #   1. Terminal workflows (completed/failed) that are no longer needed.
+  #   2. durably_repeat repetition logs — one row per scheduled execution. A
+  #      long-lived periodic workflow never reaches a terminal state, so its
+  #      repetition logs accumulate forever.
+  #
+  # This is not run automatically — schedule it from your own scheduler (cron,
+  # Solid Queue recurring tasks, sidekiq-cron, GoodJob cron, the `whenever`
+  # gem, ...). See ChronoForge::CleanupJob for a ready-made job, e.g.:
+  #
+  #   ChronoForge::Cleanup.run(
+  #     older_than: 90.days, # default retention for terminal workflows
+  #     failed_older_than: 180.days, # keep failures longer for debugging
+  #     prune_repetition_logs_older_than: 30.days # opt in to periodic-log pruning
+  #   )
+  #
+  # == Workflow retention is measured from when a workflow became terminal
+  #
+  # Retention is measured from the terminal transition, not created_at: a
+  # long-running workflow may have been created long ago but only just finished.
+  # Completed workflows use the immutable completed_at; failed workflows have no
+  # completed_at, so they use updated_at (the failed! transition, after which
+  # nothing touches the row — release_lock/context use update_columns/
+  # update_column, which do not bump it).
+  #
+  # == Repetition-log pruning safety
+  #
+  # Pruning periodic logs is opt-in and deliberately conservative. A repetition
+  # log is removed only when its scheduled time is BOTH older than the retention
+  # window AND strictly before the periodic task's current frontier (the
+  # coordination log's last_execution_at). Everything at or after the frontier is
+  # kept, because durably_repeat's catch-up mechanism may still need it: the next
+  # execution is computed as last_execution_at + every, so anything at/after the
+  # frontier can still be revisited, while anything strictly before it never is.
+  # Both checks use the scheduled time embedded in the step name rather than
+  # created_at, which is misleading for catch-up rows created long after the
+  # occurrence they represent. A task that has not executed yet (no frontier) is
+  # never pruned.
+  class Cleanup
+    DEFAULT_RETENTION = 90.days
+    DEFAULT_BATCH_SIZE = 1_000
+    TERMINAL_LOG_STATES = %i[completed failed].freeze
+
+    # @param older_than [ActiveSupport::Duration] default retention for terminal
+    #   workflows; used for any state without a specific override.
+    # @param completed_older_than [ActiveSupport::Duration, nil] retention for
+    #   completed workflows. Defaults to older_than.
+    # @param failed_older_than [ActiveSupport::Duration, nil] retention for
+    #   failed workflows. Defaults to older_than.
+    # @param prune_repetition_logs_older_than [ActiveSupport::Duration, nil]
+    #   when set, also prune old terminal durably_repeat repetition logs from
+    #   still-active workflows (see safety notes above). nil disables it.
+    # @param batch_size [Integer] rows per delete batch.
+    # @return [Hash] counts of deleted rows by category.
+    def self.run(**)
+      new(**).run
+    end
+
+    def initialize(older_than: DEFAULT_RETENTION, completed_older_than: nil, failed_older_than: nil,
+      prune_repetition_logs_older_than: nil, batch_size: DEFAULT_BATCH_SIZE)
+      @completed_older_than = completed_older_than || older_than
+      @failed_older_than = failed_older_than || older_than
+      @prune_repetition_logs_older_than = prune_repetition_logs_older_than
+      @batch_size = batch_size
+    end
+
+    def run
+      result = {workflows: 0, execution_logs: 0, error_logs: 0, repetition_logs: 0}
+
+      # Completed workflows use the immutable completed_at; failed workflows
+      # have no completed_at, so they fall back to updated_at.
+      delete_terminal_workflows(:completed, :completed_at, @completed_older_than, result)
+      delete_terminal_workflows(:failed, :updated_at, @failed_older_than, result)
+      prune_repetition_logs(result) if @prune_repetition_logs_older_than
+
+      result
+    end
+
+    private
+
+    def delete_terminal_workflows(state, timestamp_column, older_than, result)
+      cutoff = older_than.ago
+
+      Workflow.where(state: state)
+        .where(timestamp_column => ..cutoff)
+        .in_batches(of: @batch_size) do |batch|
+        ids = batch.ids
+        next if ids.empty?
+
+        # Delete dependent rows in bulk rather than relying on row-by-row
+        # dependent: :destroy callbacks.
+        result[:execution_logs] += ExecutionLog.where(workflow_id: ids).delete_all
+        result[:error_logs] += ErrorLog.where(workflow_id: ids).delete_all
+        result[:workflows] += Workflow.where(id: ids).delete_all
+      end
+    end
+
+    def prune_repetition_logs(result)
+      cutoff = @prune_repetition_logs_older_than.ago.to_i
+
+      coordination_logs.find_each do |coordination_log|
+        frontier = coordination_frontier(coordination_log)
+        next unless frontier
+
+        # Repetition logs are "<coordination step_name>$<scheduled_at_unix>".
+        # Match the prefix exactly in Ruby rather than via SQL LIKE: the step
+        # name contains "_", a LIKE wildcard, so a LIKE pattern would need
+        # escaping that is not portable across adapters.
+        prefix = "#{coordination_log.step_name}$"
+
+        # Scan in batches so a periodic workflow with a large backlog of
+        # repetition logs (exactly the case cleanup exists to fix) never loads
+        # them all into memory at once. Batching by primary key and only
+        # deleting rows within the current batch keeps the cursor valid.
+        ExecutionLog
+          .where(workflow_id: coordination_log.workflow_id, state: TERMINAL_LOG_STATES)
+          .in_batches(of: @batch_size) do |batch|
+          prunable_ids = batch.pluck(:id, :step_name).filter_map do |id, step_name|
+            next unless step_name.start_with?(prefix)
+
+            scheduled_at = step_name.delete_prefix(prefix).to_i
+            id if scheduled_at < frontier && scheduled_at < cutoff
+          end
+
+          result[:repetition_logs] += ExecutionLog.where(id: prunable_ids).delete_all if prunable_ids.any?
+        end
+      end
+    end
+
+    # Coordination logs are "durably_repeat$<name>" — exactly one "$" segment
+    # after the prefix. Repetition logs add a second "$<timestamp>" segment.
+    def coordination_logs
+      ExecutionLog
+        .where("step_name LIKE ?", "durably_repeat$%")
+        .where.not("step_name LIKE ?", "durably_repeat$%$%")
+        .order(:id)
+    end
+
+    def coordination_frontier(coordination_log)
+      last_execution_at = coordination_log.metadata && coordination_log.metadata["last_execution_at"]
+      return unless last_execution_at
+
+      Time.parse(last_execution_at).to_i
+    rescue ArgumentError, TypeError
+      nil
+    end
+  end
+end
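The frontier rule in `prune_repetition_logs` can be exercised without a database. The sketch below lifts the same prefix match and the same two comparisons into a pure function over `[id, step_name]` pairs. The method name `prunable_ids`, its keyword arguments, and the sample rows are illustrative and not part of the gem.

```ruby
# Pure-Ruby sketch of the pruning rule; rows are hypothetical [id, step_name] pairs.
def prunable_ids(rows, coordination_step_name:, frontier:, cutoff:)
  prefix = "#{coordination_step_name}$"

  rows.filter_map do |id, step_name|
    # Exact prefix match, so "durably_repeat$sync_all$..." is not mistaken
    # for a repetition of "durably_repeat$sync".
    next unless step_name.start_with?(prefix)

    scheduled_at = step_name.delete_prefix(prefix).to_i
    # Prune only when strictly before BOTH the frontier and the retention cutoff.
    id if scheduled_at < frontier && scheduled_at < cutoff
  end
end

rows = [
  [1, "durably_repeat$sync$100"],
  [2, "durably_repeat$sync$200"],
  [3, "durably_repeat$sync$300"],    # at the frontier: kept
  [4, "durably_repeat$sync_all$50"], # different task: ignored
  [5, "durably_execute$charge"]      # not a repetition log: ignored
]

prunable_ids(rows, coordination_step_name: "durably_repeat$sync", frontier: 300, cutoff: 250)
# => [1, 2]
```

Because both comparisons are strict, a retention window shorter than the schedule interval still cannot remove the row at the frontier.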
data/lib/chrono_forge/cleanup_job.rb
ADDED
@@ -0,0 +1,30 @@
+module ChronoForge
+  # ActiveJob wrapper around {Cleanup} so the cleanup can be enqueued and
+  # scheduled with any recurring-job mechanism (Solid Queue recurring tasks,
+  # sidekiq-cron, GoodJob cron, ...).
+  #
+  # Arguments are plain scalars (day counts) rather than ActiveSupport::Duration
+  # objects so the job can be configured from YAML/cron config files, which can
+  # only carry primitive values:
+  #
+  #   ChronoForge::CleanupJob.perform_later(
+  #     older_than_days: 90,
+  #     failed_older_than_days: 180,
+  #     prune_repetition_logs_older_than_days: 30
+  #   )
+  class CleanupJob < ActiveJob::Base
+    def perform(older_than_days: nil, completed_older_than_days: nil, failed_older_than_days: nil,
+      prune_repetition_logs_older_than_days: nil, batch_size: nil)
+      options = {}
+      options[:older_than] = older_than_days.to_i.days if older_than_days
+      options[:completed_older_than] = completed_older_than_days.to_i.days if completed_older_than_days
+      options[:failed_older_than] = failed_older_than_days.to_i.days if failed_older_than_days
+      if prune_repetition_logs_older_than_days
+        options[:prune_repetition_logs_older_than] = prune_repetition_logs_older_than_days.to_i.days
+      end
+      options[:batch_size] = batch_size.to_i if batch_size
+
+      Cleanup.run(**options)
+    end
+  end
+end
data/lib/chrono_forge/error_log.rb
CHANGED
@@ -5,10 +5,12 @@
 # Table name: chrono_forge_error_logs
 #
 #  id            :integer          not null, primary key
+#  attempt       :integer
 #  backtrace     :text
 #  context       :json
 #  error_class   :string
 #  error_message :text
+#  step_name     :string
 #  created_at    :datetime         not null
 #  updated_at    :datetime         not null
 #  workflow_id   :integer          not null
data/lib/chrono_forge/executor/context.rb
CHANGED
@@ -14,6 +14,17 @@ module ChronoForge
         Array
       ]
 
+      # Maximum serialized byte size of a single context value. Applies to the
+      # variable-length types (String, Hash, Array); scalars are unbounded in
+      # practice. Measured in bytes (not characters) since that is what is
+      # actually stored and what matters for write/storage cost.
+      #
+      # Context is meant to hold small working state (ids, flags, timestamps,
+      # small structures) — not documents or payloads, which belong in their own
+      # storage and can be referenced from context by id. 16 KB per value is
+      # already generous for that (hundreds of ids / dozens of records).
+      MAX_VALUE_BYTESIZE = 16.kilobytes
+
       def initialize(workflow)
         @workflow = workflow
         @context = workflow.context || {}
@@ -67,7 +78,11 @@ module ChronoForge
 
         @context[key.to_s] =
           if value.is_a?(Hash) || value.is_a?(Array)
-
+            # as_json returns a fresh JSON-compatible structure with string keys
+            # — the same normalization the JSON column would apply on save and a
+            # deep copy that protects the store from later mutation of the
+            # source — without the cost of serializing to a string and reparsing.
+            value.as_json
           else
             value
           end
@@ -84,16 +99,19 @@ module ChronoForge
           raise ValidationError, "Unsupported context value type: #{value.inspect}"
         end
 
-
-        if
-          raise ValidationError, "Context value too large"
+        byte_size = value_byte_size(value)
+        if byte_size && byte_size > MAX_VALUE_BYTESIZE
+          raise ValidationError, "Context value too large (#{byte_size} bytes, max #{MAX_VALUE_BYTESIZE})"
         end
       end
 
-
-
-
-
+      # Serialized byte size for the variable-length types; nil for scalars,
+      # which need no size constraint.
+      def value_byte_size(value)
+        case value
+        when String then value.bytesize
+        when Hash, Array then value.to_json.bytesize
+        end
       end
     end
   end
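The switch from characters to bytes in the size check matters for multibyte text. The sketch below mirrors `value_byte_size` using only the standard library, with `JSON.generate` in place of ActiveSupport's `to_json` and a literal `16 * 1024` in place of `16.kilobytes`. The `too_large?` helper is a hypothetical name added for illustration.

```ruby
require "json"

MAX_VALUE_BYTESIZE = 16 * 1024 # 16 KB, mirroring 16.kilobytes

# Serialized byte size for variable-length values; nil for scalars.
def value_byte_size(value)
  case value
  when String then value.bytesize
  when Hash, Array then JSON.generate(value).bytesize
  end
end

# Hypothetical helper: true when a value would exceed the per-value limit.
def too_large?(value)
  size = value_byte_size(value)
  !size.nil? && size > MAX_VALUE_BYTESIZE
end

ascii = "a" * 10_000          # 10,000 characters, 10,000 bytes
multibyte = "\u00e9" * 10_000 # 10,000 characters, 20,000 bytes in UTF-8

too_large?(ascii)     # => false
too_large?(multibyte) # => true, although both strings have the same length
too_large?(42)        # => false, scalars are not size-checked
```

A character-based check would treat the two strings identically, which is why a value that passed under 0.8.0 can fail under 0.9.0 even before the lower limit is taken into account.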
data/lib/chrono_forge/executor/execution_tracker.rb
CHANGED
@@ -1,16 +1,55 @@
 module ChronoForge
   module Executor
     class ExecutionTracker
-
-
+      # Total budget for the context snapshot copied into each error log.
+      # Transient errors can be logged repeatedly (one row per retry), and the
+      # full context always remains on the workflow itself, so the error copy
+      # only needs to be a bounded diagnostic breadcrumb. Keys are preserved;
+      # values are kept until the running total would exceed this budget, after
+      # which each remaining value is replaced by OMITTED_VALUE. Per-value size
+      # is already bounded by Context validation, so no per-value truncation is
+      # needed here — a single value larger than the budget is simply replaced.
+      MAX_CONTEXT_BYTESIZE = 64.kilobytes
+
+      # Placeholder stored in place of a value that didn't fit the budget.
+      OMITTED_VALUE = "<<omitted>>"
+
+      # @param execution_log [ExecutionLog, nil] the step the error occurred in,
+      #   if any. Its step_name and attempt count are recorded on the error log
+      #   so errors can be attributed to a step and ordered within the workflow.
+      # @param attempt [Integer, nil] explicit attempt number for errors not tied
+      #   to a step (e.g. a workflow-level failure). Falls back to the execution
+      #   log's attempt count.
+      def self.track_error(workflow, error, execution_log: nil, attempt: nil)
         ErrorLog.create!(
           workflow: workflow,
+          step_name: execution_log&.step_name,
+          attempt: attempt || execution_log&.attempts,
           error_class: error.class.name,
           error_message: error.message,
-          backtrace: error.backtrace
-          context: workflow.context
+          backtrace: error.backtrace&.join("\n"),
+          context: error_context(workflow.context)
         )
       end
+
+      def self.error_context(context)
+        remaining = MAX_CONTEXT_BYTESIZE
+
+        context.each_with_object({}) do |(key, value), kept|
+          size = value.to_json.bytesize
+          if size <= remaining
+            kept[key] = value
+            remaining -= size
+          else
+            kept[key] = OMITTED_VALUE
+          end
+        end
+      rescue
+        # If the context cannot be traversed/serialized, fail safe to a marker
+        # rather than risk persisting something unbounded or unserializable.
+        {"_truncated" => true}
+      end
+      private_class_method :error_context
     end
   end
 end
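The budget logic in `error_context` is easier to follow with a small budget. The sketch below applies the same greedy rule in plain Ruby, with the budget passed as a parameter (the diff uses 64 KB) and `JSON.generate` in place of ActiveSupport's `to_json`. The name `bounded_snapshot` and the sample context are illustrative only.

```ruby
require "json"

OMITTED_VALUE = "<<omitted>>"

# Same greedy rule as error_context, with the budget as a parameter so that a
# small value can be used for demonstration.
def bounded_snapshot(context, budget:)
  remaining = budget

  context.each_with_object({}) do |(key, value), kept|
    size = JSON.generate(value).bytesize
    if size <= remaining
      kept[key] = value
      remaining -= size
    else
      kept[key] = OMITTED_VALUE
    end
  end
end

context = { "order_id" => 42, "blob" => "x" * 100, "flag" => true }

bounded_snapshot(context, budget: 20)
# => { "order_id" => 42, "blob" => "<<omitted>>", "flag" => true }
```

One detail the example makes visible: with the rule as written, a later value that still fits the remaining budget (`"flag"` here) is kept even after an earlier, larger value was omitted, so omission is decided per value rather than cutting off everything after the first miss.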
@@ -34,11 +34,17 @@ module ChronoForge
       end

       def release_lock(job_id, workflow, force: false)
-
-
+        # Read only the lock owner from the DB rather than reloading the whole
+        # row (which would drag the heavy context/kwargs/options JSON into memory
+        # on every resume) just to verify ownership. The in-memory state is
+        # already accurate here: acquire_lock set it to :running, and a
+        # completed/failed workflow had its state updated on this same instance.
+        current_locked_by = workflow.class.where(id: workflow.id).pick(:locked_by)
+
+        if !force && current_locked_by != job_id
           raise LongRunningConcurrentExecutionError,
             "ChronoForge:#{self.class}(#{workflow.key}) job(#{job_id}) executed longer than specified max_duration, " \
-            "allowed job(#{
+            "allowed job(#{current_locked_by}) to acquire the lock."
         end

         columns = {locked_at: nil, locked_by: nil}
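The ownership check in `release_lock` can be illustrated without a database. This is a rough sketch under stated assumptions: the `LOCK_OWNERS` hash is a hypothetical stand-in for the single-column `pick(:locked_by)` read, and the error message is shortened.

```ruby
class LongRunningConcurrentExecutionError < StandardError; end

# Hypothetical in-memory stand-in for the workflows table: id => locked_by.
LOCK_OWNERS = {1 => "job-b", 2 => "job-a"}

# Release only if this job still owns the lock, unless force is set.
def release_lock(job_id, workflow_id, force: false)
  current_locked_by = LOCK_OWNERS[workflow_id]

  if !force && current_locked_by != job_id
    raise LongRunningConcurrentExecutionError,
      "job(#{job_id}) ran past max_duration, allowed job(#{current_locked_by}) to acquire the lock."
  end

  LOCK_OWNERS[workflow_id] = nil
  true
end

release_lock("job-a", 2) # owner matches, so the lock is cleared

begin
  release_lock("job-a", 1) # job-b took the lock over in the meantime
rescue LongRunningConcurrentExecutionError => e
  puts e.message
end

release_lock("job-a", 1, force: true) # a forced release skips the ownership check
```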
@@ -96,13 +96,11 @@ module ChronoForge
         # - User actions or form submissions
         #
         def continue_if(condition, name: nil)
+          validate_step_name_segment!(name || condition)
           step_name = "continue_if$#{name || condition}"

           # Find or create execution log
-          execution_log =
-            workflow: @workflow,
-            step_name: step_name
-          ) do |log|
+          execution_log = find_or_create_execution_log!(step_name) do |log|
             log.started_at = Time.current
             log.metadata = {
               condition: condition.to_s,
@@ -128,7 +126,7 @@ module ChronoForge
           rescue => e
             # Log the error and fail the execution
             Rails.logger.error { "Error evaluating condition #{condition}: #{e.message}" }
-            self.class::ExecutionTracker.track_error(workflow, e)
+            self.class::ExecutionTracker.track_error(workflow, e, execution_log: execution_log)

             execution_log.update!(
               state: :failed,
@@ -60,12 +60,10 @@ module ChronoForge
         # - Enables monitoring and debugging of execution history
         #
         def durably_execute(method, max_attempts: 3, name: nil)
+          validate_step_name_segment!(name || method)
           step_name = "durably_execute$#{name || method}"
           # Find or create execution log
-          execution_log =
-            workflow: @workflow,
-            step_name: step_name
-          ) do |log|
+          execution_log = find_or_create_execution_log!(step_name) do |log|
             log.started_at = Time.current
           end

@@ -96,7 +94,7 @@ module ChronoForge
           rescue => e
             # Log the error
             Rails.logger.error { "Error while durably executing #{method}: #{e.message}" }
-            self.class::ExecutionTracker.track_error(workflow, e)
+            self.class::ExecutionTracker.track_error(workflow, e, execution_log: execution_log)

             # Optional retry logic
             if execution_log.attempts < max_attempts
@@ -101,13 +101,11 @@ module ChronoForge
         # - Repetition logs: `durably_repeat$#{name}$#{timestamp}` - tracks individual executions
         #
         def durably_repeat(method, every:, till:, start_at: nil, max_attempts: 3, timeout: 1.hour, on_error: :continue, name: nil)
+          validate_step_name_segment!(name || method)
           step_name = "durably_repeat$#{name || method}"

           # Get or create the main coordination log for this periodic task
-          coordination_log =
-            workflow: @workflow,
-            step_name: step_name
-          ) do |log|
+          coordination_log = find_or_create_execution_log!(step_name) do |log|
             log.started_at = Time.current
             log.metadata = {last_execution_at: nil}
           end
@@ -157,10 +155,7 @@ module ChronoForge
           step_name = "#{coordination_log.step_name}$#{next_execution_at.to_i}"

           # Create execution log for this specific repetition
-          repetition_log =
-            workflow: @workflow,
-            step_name: step_name
-          ) do |log|
+          repetition_log = find_or_create_execution_log!(step_name) do |log|
             log.started_at = Time.current
             log.metadata = {
               scheduled_for: next_execution_at,
@@ -225,7 +220,7 @@ module ChronoForge
           rescue => e
             # Log the error
             Rails.logger.error { "Error in periodic task #{method}: #{e.message}" }
-            self.class::ExecutionTracker.track_error(@workflow, e)
+            self.class::ExecutionTracker.track_error(@workflow, e, execution_log: repetition_log)

             # Handle retry logic for this specific repetition
             if repetition_log.attempts < max_attempts
@@ -73,12 +73,10 @@ module ChronoForge
         # - Marks as completed when wait period has elapsed
         #
         def wait(duration, name)
+          validate_step_name_segment!(name)
           step_name = "wait$#{name}"
           # Find or create execution log
-          execution_log =
-            workflow: @workflow,
-            step_name: step_name
-          ) do |log|
+          execution_log = find_or_create_execution_log!(step_name) do |log|
             log.started_at = Time.current
             log.metadata = {
               wait_until: duration.from_now
@@ -86,12 +86,10 @@ module ChronoForge
         # - Records final result (true for success, :timed_out for timeout)
         #
         def wait_until(condition, timeout: 1.hour, check_interval: 15.minutes, retry_on: [])
+          validate_step_name_segment!(condition)
           step_name = "wait_until$#{condition}"
           # Find or create execution log
-          execution_log =
-            workflow: @workflow,
-            step_name: step_name
-          ) do |log|
+          execution_log = find_or_create_execution_log!(step_name) do |log|
             log.started_at = Time.current
             log.metadata = {
               timeout_at: timeout.from_now,
@@ -117,7 +115,7 @@ module ChronoForge
           rescue => e
             # Log the error
             Rails.logger.error { "Error evaluating condition #{condition}: #{e.message}" }
-            self.class::ExecutionTracker.track_error(workflow, e)
+            self.class::ExecutionTracker.track_error(workflow, e, execution_log: execution_log)

             # Optional retry logic
             if retry_on.include?(e.class)
@@ -162,7 +160,11 @@ module ChronoForge
               metadata: metadata.merge("result" => :timed_out)
             )
             Rails.logger.warn { "Timeout reached for condition '#{condition}'." }
-
+            # Log here (with step context) rather than relying on the workflow-level
+            # rescue, which no longer logs ExecutionFailedError.
+            error = WaitConditionNotMet.new("Condition '#{condition}' not met within timeout period")
+            self.class::ExecutionTracker.track_error(workflow, error, execution_log: execution_log)
+            raise error
           end

           # Reschedule with delay
@@ -49,24 +49,20 @@ module ChronoForge
         #
         def complete_workflow!
           # Create an execution log for workflow completion
-          execution_log =
-            workflow: workflow,
-            step_name: "$workflow_completion$"
-          ) do |log|
+          execution_log = find_or_create_execution_log!("$workflow_completion$") do |log|
             log.started_at = Time.current
           end

           begin
-            execution_log.update!(
-              attempts: execution_log.attempts + 1,
-              last_executed_at: Time.current
-            )
-
             workflow.completed_at = Time.current
             workflow.completed!

-            # Mark execution log as completed
+            # Mark execution log as completed. Attempt tracking and the terminal
+            # state are written together: completion is not retried on an
+            # attempt-count basis, so there is no need for a separate pre-write.
             execution_log.update!(
+              attempts: execution_log.attempts + 1,
+              last_executed_at: Time.current,
               state: :completed,
               completed_at: Time.current
             )
@@ -142,10 +138,7 @@ module ChronoForge
         #
         def fail_workflow!(error_log)
           # Create an execution log for workflow failure
-          execution_log =
-            workflow: workflow,
-            step_name: "$workflow_failure$#{error_log.id}"
-          ) do |log|
+          execution_log = find_or_create_execution_log!("$workflow_failure$#{error_log.id}") do |log|
             log.started_at = Time.current
             log.metadata = {
               error_log_id: error_log.id
@@ -153,15 +146,14 @@ module ChronoForge
           end

           begin
-            execution_log.update!(
-              attempts: execution_log.attempts + 1,
-              last_executed_at: Time.current
-            )
-
             workflow.failed!

-            # Mark execution log as completed
+            # Mark execution log as completed. Attempt tracking and the terminal
+            # state are written together (failure handling is not retried on an
+            # attempt-count basis).
             execution_log.update!(
+              attempts: execution_log.attempts + 1,
+              last_executed_at: Time.current,
               state: :completed,
               completed_at: Time.current
             )
@@ -242,10 +234,9 @@ module ChronoForge
         # - Original exception is re-raised after logging
         #
         def retry_workflow!
-          #
-
-
-          end
+          # Authoritative check at execution time (the record-level retry methods
+          # also check up front, but state may have changed since enqueue).
+          workflow.ensure_retryable!

           # Create an execution log for workflow retry
           execution_log = ExecutionLog.create!(
@@ -12,6 +12,12 @@ module ChronoForge

     class WorkflowNotRetryableError < NotExecutableError; end

+    class InvalidStepName < NotExecutableError; end
+
+    # "$" separates the segments of a step name (e.g. "durably_repeat$name$ts").
+    # User-supplied names/methods must not contain it.
+    STEP_NAME_DELIMITER = "$"
+
     include Methods

     # Add class methods
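To make the reserved-delimiter rule concrete, here is a small standalone sketch. It mirrors the guard this release adds, but the `step_name_for` helper and the plain `StandardError` superclass are assumptions made for illustration; in the gem, `InvalidStepName` inherits from `NotExecutableError`.

```ruby
# Stand-ins for the error class and constant introduced in the hunk above.
class InvalidStepName < StandardError; end
STEP_NAME_DELIMITER = "$"

# Reject any user-supplied segment that contains the reserved separator.
def validate_step_name_segment!(segment)
  return unless segment.to_s.include?(STEP_NAME_DELIMITER)

  raise InvalidStepName,
    "step name may not contain '#{STEP_NAME_DELIMITER}' (reserved separator): #{segment.inspect}"
end

# Hypothetical helper: validate first, then join prefix and segment.
def step_name_for(prefix, segment)
  validate_step_name_segment!(segment)
  "#{prefix}#{STEP_NAME_DELIMITER}#{segment}"
end

puts step_name_for("durably_execute", :send_invoice)

begin
  step_name_for("wait", "cool$down")
rescue InvalidStepName => e
  puts "rejected: #{e.message}"
end
```

Because the check calls `to_s` first, symbols and method names are validated the same way as custom string names.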
@@ -34,13 +40,13 @@ module ChronoForge
         end

         # Add retry_now class method that calls perform_now with retry_workflow: true
-        def retry_now(key, **
-          perform_now(key, retry_workflow: true, **
+        def retry_now(key, **)
+          perform_now(key, retry_workflow: true, **)
         end

         # Add retry_later class method that calls perform_later with retry_workflow: true
-        def retry_later(key, **
-          perform_later(key, retry_workflow: true, **
+        def retry_later(key, **)
+          perform_later(key, retry_workflow: true, **)
         end
       end
     end
@@ -74,9 +80,11 @@ module ChronoForge

         # Mark as complete
         complete_workflow!
-      rescue ExecutionFailedError
+      rescue ExecutionFailedError
+        # The step that raised this already logged the underlying cause (with its
+        # step/attempt context); ExecutionFailedError is control flow, not a new
+        # error, so re-logging it here would just duplicate the row.
         Rails.logger.error { "ChronoForge:#{self.class}(#{key}) step execution failed" }
-        self.class::ExecutionTracker.track_error(workflow, e)
         workflow.stalled!
         nil
       rescue HaltExecutionFlow
@@ -91,7 +99,7 @@ module ChronoForge
         raise
       rescue => e
         Rails.logger.error { "ChronoForge:#{self.class}(#{key}) workflow execution failed" }
-        error_log = self.class::ExecutionTracker.track_error(workflow, e)
+        error_log = self.class::ExecutionTracker.track_error(workflow, e, attempt: attempt)

         # Retry if applicable
         if should_retry?(e, attempt)
@@ -110,17 +118,52 @@ module ChronoForge
     private

     def setup_workflow!(key, options, kwargs)
-
-
-
-
-
+      # SELECT-first: on every resume (the common case) the workflow already
+      # exists, so a plain lookup avoids an INSERT that would fail on the unique
+      # [job_class, key] index. create_or_find_by! is only reached on first-ever
+      # creation, where it also handles a concurrent insert race safely.
+      @workflow = Workflow.find_by(job_class: self.class.to_s, key: key) ||
+        Workflow.create_or_find_by!(job_class: self.class.to_s, key: key) do |workflow|
+          workflow.options = options
+          workflow.kwargs = kwargs
+          workflow.started_at = Time.current
+        end
     end

     def setup_context!
       @context = Context.new(workflow)
     end

+    # Idempotent, SELECT-first execution-log lookup.
+    #
+    # The engine replays the whole workflow body on every resume, so each durable
+    # step is looked up again every pass. A plain create_or_find_by! would INSERT
+    # first and fail on the unique index for the (overwhelmingly common) case
+    # where the step already exists — turning every replayed step into a wasted
+    # INSERT plus a burned sequence value. Looking up first means replays cost a
+    # single indexed SELECT.
+    #
+    # All lookups are by exact step_name (no method ever scans a workflow's logs),
+    # so a per-step lookup is also the right shape for durably_repeat workflows,
+    # which accumulate unbounded repetition logs: we touch only the rows we need,
+    # never the whole set. create_or_find_by! is used only on a miss, keeping
+    # creation safe if a lock takeover ever lets two executors race.
+    def find_or_create_execution_log!(step_name, &)
+      ExecutionLog.find_by(workflow: @workflow, step_name: step_name) ||
+        ExecutionLog.create_or_find_by!(workflow: @workflow, step_name: step_name, &)
+    end
+
+    # Guards the user-supplied portion of a step name (a custom name, method, or
+    # condition). The "$" separator is reserved for the framework's own segment
+    # structure, so a user value containing it would make step names ambiguous
+    # and corrupt the cleanup logic that parses them.
+    def validate_step_name_segment!(segment)
+      return unless segment.to_s.include?(STEP_NAME_DELIMITER)
+
+      raise InvalidStepName,
+        "ChronoForge step name may not contain '#{STEP_NAME_DELIMITER}' (reserved separator): #{segment.inspect}"
+    end
+
     def should_retry?(error, attempt_count)
       attempt_count < 3
     end
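The SELECT-first lookup described in the `find_or_create_execution_log!` comments can be modelled without ActiveRecord. The sketch below is an illustration only: `FakeLogStore` is a hypothetical in-memory table whose `create_or_find_by!` always counts an insert attempt, which approximates the INSERT-first behaviour the comments describe.

```ruby
# Hypothetical in-memory stand-in for the execution log table. It models only
# the lookup order and counts simulated INSERT attempts.
class FakeLogStore
  attr_reader :insert_attempts

  def initialize
    @rows = {}
    @insert_attempts = 0
  end

  def find_by(step_name)
    @rows[step_name]
  end

  # INSERT-first: always attempts the insert, then falls back to the existing
  # row when the unique key is already taken.
  def create_or_find_by!(step_name)
    @insert_attempts += 1
    return @rows[step_name] if @rows.key?(step_name)

    row = {step_name: step_name}
    yield row if block_given?
    @rows[step_name] = row
  end
end

# SELECT-first wrapper: only a miss reaches the insert path.
def find_or_create_execution_log!(store, step_name, &block)
  store.find_by(step_name) || store.create_or_find_by!(step_name, &block)
end

store = FakeLogStore.new
3.times do
  find_or_create_execution_log!(store, "wait$cooldown") { |log| log[:started_at] = Time.now }
end
puts store.insert_attempts
```

Three replays of the same step cost one insert attempt here; calling `create_or_find_by!` directly would have attempted three.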
data/lib/chrono_forge/version.rb
CHANGED
@@ -36,13 +36,36 @@ module ChronoForge
       stalled
     ]

-    # Serialization for metadata
-    serialize :metadata, coder: JSON
-
     def executable?
       idle? || running?
     end

+    # Only stalled or failed workflows can be re-executed.
+    def retryable?
+      stalled? || failed?
+    end
+
+    def ensure_retryable!
+      return if retryable?
+
+      raise Executor::WorkflowNotRetryableError,
+        "Cannot retry workflow(#{key}) in #{state} state. Only stalled or failed workflows can be retried."
+    end
+
+    # Re-execute this workflow from its record, without constantizing the job
+    # class or re-passing the key. Retryability is validated up front so a
+    # non-retryable workflow raises immediately rather than enqueuing a job that
+    # would fail in the worker.
+    def retry_now(**)
+      ensure_retryable!
+      job_klass.retry_now(key, **)
+    end
+
+    def retry_later(**)
+      ensure_retryable!
+      job_klass.retry_later(key, **)
+    end
+
     def job_klass
       job_class.constantize
     end
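The fail-fast retry guard can be shown with a plain Ruby object. This is a sketch under assumptions: `FakeWorkflow` and the array used as a queue are hypothetical stand-ins for the ActiveRecord model and the job backend.

```ruby
class WorkflowNotRetryableError < StandardError; end

# Hypothetical plain-Ruby stand-in for the workflow record.
FakeWorkflow = Struct.new(:key, :state) do
  def retryable?
    %i[stalled failed].include?(state)
  end

  def ensure_retryable!
    return if retryable?

    raise WorkflowNotRetryableError,
      "Cannot retry workflow(#{key}) in #{state} state. Only stalled or failed workflows can be retried."
  end

  # Validate before "enqueuing" so the caller fails immediately instead of
  # the job failing later in a worker.
  def retry_later(queue)
    ensure_retryable!
    queue << key
  end
end

queue = []
FakeWorkflow.new("order-1", :stalled).retry_later(queue)

begin
  FakeWorkflow.new("order-2", :completed).retry_later(queue)
rescue WorkflowNotRetryableError => e
  puts e.message
end

puts queue.inspect
```

Only the stalled workflow reaches the queue; the completed one raises before anything is enqueued.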
@@ -1,8 +1,10 @@
 Description:
-    Installs ChronoForge
+    Installs ChronoForge by copying all of its migrations into your app.
+    Idempotent: migrations that already exist are skipped.

 Example:
-    bin/rails g
+    bin/rails g chrono_forge:install

-    This will create
+    This will create migrations e.g:
         20241221181505_install_chrono_forge.rb
+        20241221181506_add_chrono_forge_workflow_state_index.rb
@@ -1,24 +1,22 @@
 # frozen_string_literal: true

 require "rails/generators/active_record/migration"
+require_relative "../migration_actions"

 module ChronoForge
+  # Installs all ChronoForge migrations into a new application. Idempotent:
+  # migrations that already exist are skipped, so re-running is safe.
   class InstallGenerator < Rails::Generators::Base
     include ::ActiveRecord::Generators::Migration
+    include ChronoForge::Generators::MigrationActions

-    source_root File.expand_path("templates", __dir__)
+    source_root File.expand_path("../templates", __dir__)

     def start
-
+      copy_chrono_forge_migrations
     rescue => err
       say "#{err.class}: #{err}\n#{err.backtrace.join("\n")}", :red
       exit 1
     end
-
-    private
-
-    def install_migrations
-      migration_template "install_chrono_forge.rb", "db/migrate/install_chrono_forge.rb"
-    end
   end
 end
@@ -0,0 +1,39 @@
+# frozen_string_literal: true
+
+module ChronoForge
+  module Generators
+    # Shared migration-copy logic for the install and upgrade generators.
+    #
+    # Copying is idempotent: a migration whose name already exists in the host
+    # application's db/migrate is skipped, so it is safe to re-run either
+    # generator. `install` copies the full set (a fresh app has none yet);
+    # `upgrade` copies only the migrations a previously-installed app is missing.
+    # Both share this method — the difference is purely which migrations already
+    # exist in the target app.
+    #
+    # MIGRATIONS is listed in application order; copying preserves that order
+    # because each migration_template assigns the next sequential version number.
+    module MigrationActions
+      MIGRATIONS = %w[
+        install_chrono_forge
+        add_chrono_forge_workflow_state_index
+        add_chrono_forge_error_log_step_context
+      ].freeze
+
+      def copy_chrono_forge_migrations
+        MIGRATIONS.each do |name|
+          if chrono_forge_migration_exists?(name)
+            say_status :skip, "#{name} (migration already exists)", :yellow
+          else
+            migration_template "#{name}.rb", "db/migrate/#{name}.rb"
+          end
+        end
+      end
+
+      def chrono_forge_migration_exists?(name)
+        migrate_dir = File.join(destination_root, "db", "migrate")
+        Dir.glob(File.join(migrate_dir, "*_#{name}.rb")).any?
+      end
+    end
+  end
+end
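The idempotency check above relies on a filename glob, which can be exercised against a temporary directory. In this sketch the `missing_migrations` helper is hypothetical and stands in for the generator's copy loop; only standard-library calls are used.

```ruby
require "tmpdir"
require "fileutils"

MIGRATIONS = %w[
  install_chrono_forge
  add_chrono_forge_workflow_state_index
  add_chrono_forge_error_log_step_context
].freeze

# A migration counts as present when any timestamped file ends in its name.
def migration_exists?(destination_root, name)
  migrate_dir = File.join(destination_root, "db", "migrate")
  Dir.glob(File.join(migrate_dir, "*_#{name}.rb")).any?
end

# Hypothetical helper: the names a generator run would still need to copy.
def missing_migrations(destination_root)
  MIGRATIONS.reject { |name| migration_exists?(destination_root, name) }
end

Dir.mktmpdir do |root|
  migrate_dir = File.join(root, "db", "migrate")
  FileUtils.mkdir_p(migrate_dir)
  # Simulate an app that ran the install generator on an earlier version.
  FileUtils.touch(File.join(migrate_dir, "20241221181505_install_chrono_forge.rb"))
  puts missing_migrations(root).inspect
end
```

Matching on the name suffix rather than the timestamp is what lets the same loop serve both a fresh install and an upgrade.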
@@ -0,0 +1,13 @@
+# frozen_string_literal: true
+
+# Adds step context to error logs so each error can be attributed to the step
+# and attempt it came from — making error logs orderable and correlatable when
+# tailing a workflow, instead of an undifferentiated stream. Both columns are
+# nullable (a workflow-level error has no step), so this is a safe additive
+# change with no table rewrite.
+class AddChronoForgeErrorLogStepContext < ActiveRecord::Migration[7.1]
+  def change
+    add_column :chrono_forge_error_logs, :step_name, :string, if_not_exists: true
+    add_column :chrono_forge_error_logs, :attempt, :integer, if_not_exists: true
+  end
+end
@@ -0,0 +1,33 @@
+# frozen_string_literal: true
+
+# Adds a composite [state, completed_at] index to chrono_forge_workflows. This
+# supports state-based monitoring (stalled/failed dashboards) and the retention
+# scan ChronoForge::Cleanup runs over completed workflows (the high-volume
+# terminal state), which filters by completed_at. The state prefix also serves
+# the smaller failed-workflow scan.
+#
+# Shipped as a standalone migration (rather than folded into the install
+# migration) so applications created with an earlier version of ChronoForge can
+# pick it up via `rails generate chrono_forge:upgrade`.
+class AddChronoForgeWorkflowStateIndex < ActiveRecord::Migration[7.1]
+  # On PostgreSQL the index is built CONCURRENTLY so it does not lock the table
+  # against writes, which also keeps strong_migrations satisfied. Concurrent
+  # index builds cannot run inside a transaction.
+  disable_ddl_transaction!
+
+  def change
+    add_index :chrono_forge_workflows, %i[state completed_at],
+      if_not_exists: true,
+      **chrono_forge_index_algorithm
+  end
+
+  private
+
+  def chrono_forge_index_algorithm
+    if connection.adapter_name.to_s.downcase.include?("postgresql")
+      {algorithm: :concurrently}
+    else
+      {}
+    end
+  end
+end
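The adapter-dependent option splat in this migration can be checked in isolation. The `index_options_for` helper below is hypothetical and takes the adapter name as a plain string instead of reading it from a live connection.

```ruby
# Hypothetical helper mirroring the migration's option selection: request a
# concurrent build only when the adapter name mentions PostgreSQL.
def index_options_for(adapter_name)
  options = {if_not_exists: true}

  if adapter_name.to_s.downcase.include?("postgresql")
    options.merge(algorithm: :concurrently)
  else
    options
  end
end

%w[PostgreSQL SQLite Mysql2].each do |adapter|
  puts "#{adapter}: #{index_options_for(adapter).inspect}"
end
```

One consequence of the substring match is that an adapter reporting a different name (for example `PostGIS`) would take the non-concurrent branch.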
@@ -0,0 +1,13 @@
+Description:
+    Upgrades an existing ChronoForge installation to the current schema.
+
+    New applications use `chrono_forge:install`. Applications that installed an
+    earlier version run this to add additive schema changes (currently the
+    chrono_forge_workflows [state, completed_at] index). The generated migration
+    is idempotent and safe to run even if the index already exists.
+
+Example:
+    bin/rails g chrono_forge:upgrade
+
+    This will create a new migration e.g:
+        20250603181505_add_chrono_forge_workflow_state_index.rb
@@ -0,0 +1,28 @@
+# frozen_string_literal: true
+
+require "rails/generators/active_record/migration"
+require_relative "../migration_actions"
+
+module ChronoForge
+  # Brings an existing ChronoForge installation up to the current schema by
+  # copying any migrations the application does not already have. Applications
+  # created with `chrono_forge:install` on the current version already have
+  # everything; older installs pick up the additive migrations (currently the
+  # chrono_forge_workflows [state, completed_at] index).
+  #
+  #   rails generate chrono_forge:upgrade
+  #   rails db:migrate
+  class UpgradeGenerator < Rails::Generators::Base
+    include ::ActiveRecord::Generators::Migration
+    include ChronoForge::Generators::MigrationActions
+
+    source_root File.expand_path("../templates", __dir__)
+
+    def start
+      copy_chrono_forge_migrations
+    rescue => err
+      say "#{err.class}: #{err}\n#{err.backtrace.join("\n")}", :red
+      exit 1
+    end
+  end
+end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: chrono_forge
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.9.0
 platform: ruby
 authors:
 - Stefan Froelich
 autorequire:
 bindir: exe
 cert_chain: []
-date:
+date: 2026-06-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activerecord
@@ -185,6 +185,8 @@ files:
 - gemfiles/rails_7.1.gemfile
 - gemfiles/rails_7.1.gemfile.lock
 - lib/chrono_forge.rb
+- lib/chrono_forge/cleanup.rb
+- lib/chrono_forge/cleanup_job.rb
 - lib/chrono_forge/error_log.rb
 - lib/chrono_forge/execution_log.rb
 - lib/chrono_forge/executor.rb
@@ -203,7 +205,12 @@ files:
 - lib/chrono_forge/workflow.rb
 - lib/generators/chrono_forge/install/USAGE
 - lib/generators/chrono_forge/install/install_generator.rb
-- lib/generators/chrono_forge/
+- lib/generators/chrono_forge/migration_actions.rb
+- lib/generators/chrono_forge/templates/add_chrono_forge_error_log_step_context.rb
+- lib/generators/chrono_forge/templates/add_chrono_forge_workflow_state_index.rb
+- lib/generators/chrono_forge/templates/install_chrono_forge.rb
+- lib/generators/chrono_forge/upgrade/USAGE
+- lib/generators/chrono_forge/upgrade/upgrade_generator.rb
 - sig/chrono_forge.rbs
 homepage: https://github.com/radioactive-labs/chrono_forge
 licenses:
File without changes