hekenga 1.1.0 → 2.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +20 -0
- data/CLAUDE.md +60 -0
- data/README.md +127 -8
- data/docker-compose.yml +2 -2
- data/lib/hekenga/base_iterator.rb +24 -0
- data/lib/hekenga/document_task.rb +6 -1
- data/lib/hekenga/document_task_executor.rb +5 -1
- data/lib/hekenga/dsl/document_task.rb +4 -0
- data/lib/hekenga/id_iterator.rb +34 -0
- data/lib/hekenga/migration.rb +4 -3
- data/lib/hekenga/mongoid_iterator.rb +8 -0
- data/lib/hekenga/parallel_task.rb +35 -8
- data/lib/hekenga/scaffold.rb +1 -0
- data/lib/hekenga/version.rb +1 -1
- metadata +6 -3
- data/lib/hekenga/iterator.rb +0 -26
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d5a372f23cc2fe1751eb08e07e40705b0361fc609b072e14226b72819cfe9647
+  data.tar.gz: 99228c0f076660abe23c92f37c55509b156810fc0a741764b7872d4a0c597263
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e2dfea01ce0c1c17fc51ab550e1805653a255d7c65a8b8424c7a1d6e646d59b9bc70fc45cfcf558d1fd7a6df81e5a2a8230a240e8c75d578736cdf19f4e2e290
+  data.tar.gz: 0af8e27a9f1b4c7ae490788f64457a67fae9a3fb9459821b8066bca61671ae30463971e9ec277d01ba358817992260281e5dceaaab676f9093dc05a941a520ce
data/CHANGELOG.md
CHANGED

@@ -1,5 +1,25 @@
 # Changelog
 
+## v2.1.0
+
+- `per_document` task `scope` will no longer let you specify `.only` or
+  `.without` as it could potentially cause data loss.
+- `per_document` task `scope` now correctly works with `.includes` even
+  in parallel execution mode.
+
+## v2.0.0
+
+- (breaking) `Hekenga::Iterator` has been replaced by `Hekenga::IdIterator`. If any
+  selector or sort is set on a document task migration scope, it no longer forces an
+  ascending ID sort. This should help to prevent index misses, though there is a
+  tradeoff that documents being concurrently updated may be skipped or
+  processed multiple times. Hekenga tries to guard against processing multiple
+  times. Manually specifying an `asc(:_id)` on your scope will continue to
+  process documents in ID order.
+- Document tasks now support a new option, `cursor_timeout`. This is the maximum
+  time a document task's `scope` can be iterated and queue jobs within. The
+  default is one day.
+
 ## v1.1.0
 
 - `setup` is now passed the current batch of documents so it can be used to
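The v2.0.0 tradeoff above (a cursor with a non-`_id` sort may yield a concurrently-updated document more than once) can be illustrated with a small pure-Ruby sketch. This is not Hekenga code; it only mimics the dedup idea, with `yielded_ids` as a hypothetical cursor output:

```ruby
require "set"

# IDs yielded by a hypothetical cursor; 42 appears twice because the
# document moved under a non-_id sort while being updated concurrently.
yielded_ids = [10, 42, 17, 42, 99]

claimed = Set.new
# Set#add? returns nil when the ID was already claimed, so duplicates drop out.
to_process = yielded_ids.select { |id| claimed.add?(id) }

to_process # => [10, 42, 17, 99]
```

Hekenga's guard against reprocessing works along these lines, except the set of already-claimed IDs is read back from its task records rather than held in memory.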
data/CLAUDE.md
ADDED

@@ -0,0 +1,60 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+Hekenga is a Ruby gem providing a migration framework for MongoDB (via Mongoid). It supports sequential and parallel document processing via ActiveJob, with error recovery, validation tracking, and a Thor-based CLI.
+
+## Common Commands
+
+```bash
+# Run full test suite (requires MongoDB - see docker-compose.yml)
+rake spec
+
+# Run a single spec file
+rake spec SPEC=spec/hekenga/document_task_spec.rb
+
+# Install gem locally
+bundle exec rake install
+
+# Interactive console with gem loaded
+bin/console
+```
+
+## Architecture
+
+### Migration Flow
+
+```
+Migration.perform! → MasterProcess.run! → launches tasks in threads
+SimpleTask: executes up/down blocks directly
+DocumentTask: iterates documents → batch → execute → write (sequential)
+ParallelTask: splits into ID batches → enqueues ParallelJob per batch (via ActiveJob)
+```
+
+### Key Components
+
+- **`Hekenga::Migration`** — main migration class, orchestrates tasks
+- **`Hekenga::MasterProcess`** — launches tasks, manages execution/recovery/progress
+- **`Hekenga::DSL::*`** — fluent DSL for defining migrations (`DSL::Migration`, `DSL::SimpleTask`, `DSL::DocumentTask`)
+- **`Hekenga::DocumentTaskExecutor`** — core document processing: filter → up block → validate → write
+- **`Hekenga::ParallelTask`** / **`Hekenga::ParallelJob`** — parallel execution via ActiveJob
+- **`Hekenga::DocumentTaskRecord`** — Mongoid doc tracking parallel task progress
+- **`Hekenga::Log`** — Mongoid doc tracking migration/task status (`:naught`, `:running`, `:complete`, `:failed`, `:skipped`)
+- **`Hekenga::Failure::*`** — error/validation/write/cancelled failure tracking subclasses
+- **`Hekenga::IdIterator`** / **`Hekenga::MongoidIterator`** — efficient document iteration for parallel vs sequential paths
+
+### Task Types
+
+- **SimpleTask** — one-off up/down blocks, no document iteration
+- **DocumentTask** — per-document processing with scope, filter, setup, up, down, after blocks; supports `parallel!`, `timeless!`, `always_write!`, `use_transaction!`, configurable write strategies (`:update` vs `:delete_then_insert`)
+
+### Configuration
+
+Via `Hekenga.configure` block — sets migration directory and report frequency. Thread-safe registry tracks all migrations.
+
+## Dependencies
+
+- **mongoid** (>= 6), **activejob** (>= 5), **thor** (1.2.1)
+- Test: **rspec** (~> 3.0), **database_cleaner-mongoid** (~> 2.0), **pry**
data/README.md
CHANGED

@@ -1,10 +1,7 @@
 # Hekenga
 
-
-processing via ActiveJob, chained jobs and error recovery.
-
-**Note that this gem is currently in pre-alpha - assume most things have a high
-chance of being broken.**
+A migration framework for MongoDB (via Mongoid) that supports parallel document
+processing via ActiveJob, chained jobs, and error recovery.
 
 ## Installation
 
@@ -22,13 +19,135 @@ Or install it yourself as:
 
     $ gem install hekenga
 
+## Configuration
+
+```ruby
+Hekenga.configure do |config|
+  config.dir = ["db", "hekenga"] # where migration files live (relative to root)
+  config.root = Dir.pwd          # application root
+end
+```
+
+Migrations are stored as Ruby files in the configured directory (default: `db/hekenga/`).
+
 ## Usage
 
-CLI
+### CLI
+
+```
+$ hekenga help                         # Show all available commands
+$ hekenga generate <description>       # Generate a new migration scaffold
+$ hekenga status                       # Show status of all migrations
+$ hekenga run_all!                     # Run all pending migrations in date order
+$ hekenga run! <path_or_pkey>          # Run a specific migration
+$ hekenga run! <path_or_pkey> --test   # Dry run (no writes persisted)
+$ hekenga run! <path_or_pkey> --clear  # Clear logs before running
+$ hekenga recover! <path_or_pkey>      # Re-process failed/invalid records
+$ hekenga cancel                       # Cancel all active migrations
+$ hekenga skip <path_or_pkey>          # Mark a migration as skipped
+$ hekenga clear! <path_or_pkey>        # Remove all logs/failures for a migration
+$ hekenga cleanup                      # Remove all failure logs
+```
+
+### Writing Migrations
+
+Generate a migration scaffold:
+
+    $ hekenga generate "Add default role to users"
+
+#### Simple Tasks
+
+Simple tasks run arbitrary code once. Use `actual?` and `test?` to check execution mode.
+
+```ruby
+Hekenga.migration do
+  description "Backfill analytics collection"
+  created "2024-01-15 10:00"
+
+  task "Create indexes" do
+    up do
+      Analytics.create_indexes if actual?
+    end
+  end
+end
+```
+
+#### Document Tasks
+
+Document tasks iterate over a Mongoid scope and process each document in batches.
+
+```ruby
+Hekenga.migration do
+  description "Normalize user emails"
+  created "2024-01-15 10:00"
+  batch_size 100 # default batch size for all tasks in this migration
+
+  per_document "Downcase emails" do
+    scope User.all
+
+    # Called once per batch; instance variables are shared with filter/up/after
+    setup do |docs|
+      @domain_map = ExternalService.load_domains
+    end
+
+    # Return false to skip a document
+    filter do |doc|
+      doc.email.present?
+    end
+
+    # Mutate the document in place — Hekenga handles persistence
+    up do |doc|
+      doc.email = doc.email.downcase
+    end
+
+    # Called once per batch with the successfully written documents
+    after do |docs|
+      AuditLog.record(docs.map(&:id))
+    end
+  end
+end
+```
+
+#### Document Task Options
+
+```ruby
+per_document "Process records" do
+  scope MyModel.where(active: true)
+
+  parallel!              # Process batches in parallel via ActiveJob
+  timeless!              # Don't update Mongoid timestamps
+  always_write!          # Write even if the document didn't change
+  skip_prepare!          # Skip Mongoid callbacks on load
+  use_transaction!       # Wrap each batch in a MongoDB transaction
+  batch_size 50          # Override migration-level batch size
+  write_strategy :update # :update (default) or :delete_then_insert
+  cursor_timeout 86_400  # Max cursor lifetime in seconds (default: 1 day)
+
+  up do |doc|
+    doc.status = "migrated"
+  end
+end
+```
+
+### Test Mode
+
+Run a migration without persisting changes:
+
+```ruby
+migration = Hekenga.find_migration("2024-01-15-add-default-role-to-users")
+migration.test_mode!
+migration.perform!
+```
+
+Or via the CLI:
+
+    $ hekenga run! <path_or_pkey> --test
+
+### Recovery
 
-
+When a migration fails (due to errors, invalid records, or write failures), Hekenga logs the failures and marks the migration as failed. You can re-process only the failed records:
 
-
+    $ hekenga recover! <path_or_pkey>
 
 ## Development
 
data/docker-compose.yml
CHANGED

@@ -8,7 +8,7 @@ networks:
 
 services:
   mongo:
-    image: mongo:
+    image: mongo:6
     command: ["--replSet", "rs0", "--bind_ip", "localhost,mongo"]
    volumes:
      - mongo:/data/db
@@ -18,7 +18,7 @@ services:
      - hekenga-net
 
  mongosetup:
-    image: mongo:
+    image: mongo:6
    depends_on:
      - mongo
    restart: "no"
data/lib/hekenga/base_iterator.rb
ADDED

@@ -0,0 +1,24 @@
+module Hekenga
+  class BaseIterator
+    include Enumerable
+    DEFAULT_TIMEOUT = 86_400 # 1 day in seconds
+
+    attr_reader :cursor_timeout
+
+    def initialize(scope:, cursor_timeout: DEFAULT_TIMEOUT)
+      @scope = scope
+      @cursor_timeout = cursor_timeout
+    end
+
+    private
+
+    def iteration_scope
+      if @scope.selector.blank? && @scope.options.blank?
+        # Apply a default _id sort, it works the best
+        @scope.asc(:_id)
+      else
+        @scope
+      end.max_time_ms(cursor_timeout * 1000) # convert to ms
+    end
+  end
+end
data/lib/hekenga/document_task.rb
CHANGED

@@ -1,8 +1,9 @@
 require 'hekenga/irreversible'
+require 'hekenga/base_iterator'
 module Hekenga
   class DocumentTask
     attr_reader :ups, :downs, :setups, :filters, :after_callbacks
-    attr_accessor :parallel, :scope, :timeless, :batch_size
+    attr_accessor :parallel, :scope, :timeless, :batch_size, :cursor_timeout
     attr_accessor :description, :invalid_strategy, :skip_prepare, :write_strategy
     attr_accessor :always_write, :use_transaction
 
@@ -18,10 +19,14 @@ module Hekenga
       @batch_size = nil
       @always_write = false
      @use_transaction = false
+      @cursor_timeout = Hekenga::BaseIterator::DEFAULT_TIMEOUT
    end
 
    def validate!
      raise Hekenga::Invalid.new(self, :ups, "missing") unless ups.any?
+      if scope&.options&.key?(:fields)
+        raise Hekenga::Invalid.new(self, :scope, "uses .only() or .without() which would cause data loss with replace_one")
+      end
    end
 
    def up!(context, document)
data/lib/hekenga/document_task_executor.rb
CHANGED

@@ -59,7 +59,11 @@ module Hekenga
    end
 
    def record_scope
-      task.scope.klass.unscoped.in(_id: task_record.ids)
+      scope = task.scope.klass.unscoped.in(_id: task_record.ids)
+      if task.scope.inclusions.any?
+        scope = scope.includes(*task.scope.inclusions.map(&:name))
+      end
+      scope
    end
 
    def records
data/lib/hekenga/id_iterator.rb
ADDED

@@ -0,0 +1,34 @@
+require "hekenga/base_iterator"
+module Hekenga
+  class IdIterator < BaseIterator
+    DEFAULT_ID = "_id".freeze
+
+    attr_reader :id_property
+
+    def initialize(id_property: DEFAULT_ID, **kwargs)
+      super(**kwargs)
+      @id_property = id_property
+    end
+
+    def each
+      with_view do |view|
+        view.each do |doc|
+          yield doc[id_property]
+        end
+      end
+    end
+
+    private
+
+    def with_view
+      view = iteration_scope.view
+      yield view
+    ensure
+      view.close_query
+    end
+
+    def iteration_scope
+      super.only(id_property)
+    end
+  end
+end
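`IdIterator` above follows the standard Ruby pattern of including `Enumerable` and defining only `#each`; the rest of the API (`each_slice`, `to_a`, lazy chaining) then comes for free. A stripped-down, Mongoid-free sketch of that pattern, with `IdSource` as a hypothetical stand-in for a database view:

```ruby
class IdSource
  include Enumerable

  def initialize(ids)
    @ids = ids
  end

  # Defining #each is the only requirement; Enumerable supplies the rest.
  def each
    @ids.each { |id| yield id }
  end
end

source = IdSource.new([1, 2, 3, 4, 5])
source.each_slice(2).to_a # => [[1, 2], [3, 4], [5]]
```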
data/lib/hekenga/migration.rb
CHANGED

@@ -2,6 +2,7 @@ require 'hekenga/invalid'
 require 'hekenga/context'
 require 'hekenga/parallel_job'
 require 'hekenga/parallel_task'
+require 'hekenga/mongoid_iterator'
 require 'hekenga/master_process'
 require 'hekenga/document_task_record'
 require 'hekenga/document_task_executor'
@@ -132,18 +133,18 @@ module Hekenga
      records = []
      task_records(task_idx).delete_all unless recover
      executor_key = BSON::ObjectId.new
-      task.scope.
+      Hekenga::MongoidIterator.new(scope: task.scope, cursor_timeout: task.cursor_timeout).each do |record|
        records.push(record)
        next unless records.length == (task.batch_size || batch_size)
 
-        records = filter_out_processed(task, task_idx, records)
+        records = filter_out_processed(task, task_idx, records)
        next unless records.length == (task.batch_size || batch_size)
 
        execute_document_task(task_idx, executor_key, records)
        records = []
        return if log.cancel
      end
-      records = filter_out_processed(task, task_idx, records)
+      records = filter_out_processed(task, task_idx, records)
      execute_document_task(task_idx, executor_key, records) if records.any?
      return if log.cancel
      log_done!
data/lib/hekenga/parallel_task.rb
CHANGED

@@ -1,4 +1,4 @@
-require 'hekenga/
+require 'hekenga/id_iterator'
 require 'hekenga/document_task_executor'
 require 'hekenga/task_splitter'
 
@@ -15,13 +15,13 @@ module Hekenga
 
    def start!
      clear_task_records!
-
+      regenerate_executor_key
      generate_for_scope(task.scope)
      check_for_completion!
    end
 
    def resume!
-
+      regenerate_executor_key
      task_records.set(executor_key: @executor_key)
      queue_jobs!(task_records.incomplete)
      generate_new_records!
@@ -41,16 +41,43 @@ module Hekenga
 
    private
 
+    def regenerate_executor_key
+      @executor_key = BSON::ObjectId.new
+    end
+
    def generate_for_scope(scope)
-      Hekenga::
-
-
-
+      Hekenga::IdIterator.new(
+        scope: scope,
+        cursor_timeout: task.cursor_timeout
+      # Batch Batches of IDs
+      ).each_slice(batch_size).each_slice(enqueue_size) do |id_block|
+        sanitize_id_block!(id_block)
+        task_records = id_block.reject(&:empty?).map(&method(:generate_task_record!))
        write_task_records!(task_records)
        queue_jobs!(task_records)
      end
    end
 
+    def enqueue_size
+      500 # task records written + enqueued at a time
+    end
+
+    def sanitize_id_block!(id_block)
+      return if task.scope.options.blank? && task.scope.selector.blank?
+
+      # Custom ordering on cursor with parallel updates may result in the same
+      # ID getting yielded into the migration multiple times. Detect this +
+      # remove
+      doubleups = task_records.in(ids: id_block.flatten).pluck(:ids).flatten.to_set
+      return if doubleups.empty?
+
+      id_block.each do |id_slice|
+        id_slice.reject! do |id|
+          doubleups.include?(id)
+        end
+      end
+    end
+
    def generate_new_records!
      last_record = task_records.desc(:_id).first
      last_id = last_record&.ids&.last
@@ -83,7 +110,7 @@ module Hekenga
      migration.task_records(task_idx)
    end
 
-    def
+    def generate_task_record!(id_slice)
      Hekenga::DocumentTaskRecord.new(
        migration_key: migration.to_key,
        task_idx: task_idx,
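The nested `each_slice` in `generate_for_scope` above first groups IDs into job-sized batches, then groups those batches into chunks that are written and enqueued together. A minimal pure-Ruby sketch of the shape, with a hypothetical batch size of 3 and chunk size of 2 (the real code uses the task's `batch_size` and an `enqueue_size` of 500):

```ruby
ids          = (1..10).to_a
batch_size   = 3 # IDs per job (hypothetical value)
enqueue_size = 2 # batches written/enqueued per round (hypothetical value)

# each_slice(batch_size) yields ID batches; the second each_slice
# groups those batches for bulk insert + bulk enqueue.
blocks = ids.each_slice(batch_size).each_slice(enqueue_size).to_a
# => [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10]]]
```

Each inner array becomes one `DocumentTaskRecord` / `ParallelJob`; each outer array is one round of `write_task_records!` + `queue_jobs!`.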
data/lib/hekenga/scaffold.rb
CHANGED

@@ -48,6 +48,7 @@ module Hekenga
 # #skip_prepare!
 # #batch_size 25
 # #write_strategy :update # :delete_then_insert
+# #cursor_timeout 86_400 # max allowed time for the cursor to survive, in seconds
 #
 # # Called once per batch, instance variables will be accessible
 # # in the filter, up and after blocks
data/lib/hekenga/version.rb
CHANGED
metadata
CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: hekenga
 version: !ruby/object:Gem::Version
-  version:
+  version: 2.1.0
 platform: ruby
 authors:
 - Tapio Saarinen
 autorequire:
 bindir: exe
 cert_chain: []
-date:
+date: 2026-04-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -148,6 +148,7 @@ files:
 - ".rspec"
 - ".travis.yml"
 - CHANGELOG.md
+- CLAUDE.md
 - Gemfile
 - README.md
 - Rakefile
@@ -159,6 +160,7 @@ files:
 - hekenga.gemspec
 - lib/hekenga.rb
 - lib/hekenga/base_error.rb
+- lib/hekenga/base_iterator.rb
 - lib/hekenga/config.rb
 - lib/hekenga/context.rb
 - lib/hekenga/document_task.rb
@@ -173,12 +175,13 @@ files:
 - lib/hekenga/failure/error.rb
 - lib/hekenga/failure/validation.rb
 - lib/hekenga/failure/write.rb
+- lib/hekenga/id_iterator.rb
 - lib/hekenga/invalid.rb
 - lib/hekenga/irreversible.rb
-- lib/hekenga/iterator.rb
 - lib/hekenga/log.rb
 - lib/hekenga/master_process.rb
 - lib/hekenga/migration.rb
+- lib/hekenga/mongoid_iterator.rb
 - lib/hekenga/parallel_job.rb
 - lib/hekenga/parallel_task.rb
 - lib/hekenga/scaffold.rb
data/lib/hekenga/iterator.rb
DELETED

@@ -1,26 +0,0 @@
-module Hekenga
-  class Iterator
-    include Enumerable
-
-    SMALLEST_ID = BSON::ObjectId.from_string('0'*24)
-
-    attr_reader :scope, :size
-
-    def initialize(scope, size:)
-      @scope = scope
-      @size = size
-    end
-
-    def each(&block)
-      current_id = SMALLEST_ID
-      base_scope = scope.asc(:_id).limit(size)
-
-      loop do
-        ids = base_scope.and(_id: {'$gt': current_id}).pluck(:_id)
-        break if ids.empty?
-        yield ids
-        current_id = ids.sort.last
-      end
-    end
-  end
-end
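For reference, the deleted `Hekenga::Iterator` implemented classic keyset pagination: repeatedly fetch up to `size` IDs greater than the last one seen, in ascending order. The same loop can be sketched over an in-memory array standing in for the Mongoid query (no Mongoid or BSON involved; `each_id_batch` is a hypothetical name for illustration):

```ruby
# In-memory stand-in for
#   scope.asc(:_id).limit(size).and(_id: { '$gt': current_id }).pluck(:_id)
def each_id_batch(all_ids, size)
  sorted     = all_ids.sort
  current_id = -Float::INFINITY # plays the role of SMALLEST_ID

  loop do
    ids = sorted.select { |id| id > current_id }.first(size)
    break if ids.empty?
    yield ids
    current_id = ids.last # resume after the last ID seen
  end
end

batches = []
each_id_batch([5, 3, 9, 1, 7], 2) { |ids| batches << ids }
batches # => [[1, 3], [5, 7], [9]]
```

This approach always issues index-friendly `_id`-range queries but forces an ascending ID sort, which is exactly the behavior the v2.0.0 `IdIterator` rewrite relaxes.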