schema-tools 1.0.5 → 1.0.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: d7eb15996fbabd0923cf28e8ecd2142a28c3af1fb7ce0373220d64470f9f2aee
- data.tar.gz: 6806c8094271fb77e1e61c4bb98f3efc7d670fb40652f75c642f30445ee19adc
+ metadata.gz: 3a977e1d5ae35640094087dff8eb1b1e5481db2b6bc0d35315f219fea02210d6
+ data.tar.gz: d8a40fbc727ede8614d607a22ca9aa830b32254598f8900d8f0e0b57fc315589
  SHA512:
- metadata.gz: 661f85da42d258a77599d4e4fe4165d8db11d9b235cedc7af77d1cbc48f41a6297f5a7c8449720b3f51ae02ac67ddefbef68ab0a777bb6425adaddefc567203c
- data.tar.gz: a8bf7e23038a9059ad52faa347e58f6a5a4e50d228699a98a90a3d51c1f88c2be108d29e22370745e3a43979b6528af73dbf8dcdce580d5b042cd5d181a3d980
+ metadata.gz: 07d90a197672dff508c6db4e6e40d90da18ce06dad381b0f46c6687d3e1d38eb2f36a305d06717863da6f5f60fcb4cdae25d9df4a5862a54338887a819255924
+ data.tar.gz: cd758d32d70d00af78c8f9580ff66a02f7c4bc86de7dde7306f90eaa7f4ae88dd431987adbfd984bd7e8f12b5d9f648c66c89729443b47a87204f61b4fc65dd2
data/README.md CHANGED
@@ -47,6 +47,17 @@ export OPENSEARCH_PASSWORD=your_password
  rake -T | grep " schema:"
  ```
 
+ Available schema tasks:
+ - `schema:migrate[alias_name]` - Migrate to a specific alias schema or migrate all schemas
+ - `schema:new` - Create a new alias with sample schema
+ - `schema:close[name]` - Close an index or alias
+ - `schema:delete[name]` - Hard delete an index (only works on closed indexes) or delete an alias
+ - `schema:drop[alias_name]` - Delete an alias (does not delete the index)
+ - `schema:download` - Download schema from an existing alias or index
+ - `schema:alias` - Create an alias for an existing index
+ - `schema:seed` - Seed data to a live index
+ - `schema:diff` - Compare all schemas to their corresponding downloaded alias settings and mappings
+
  ### Download an existing schema
 
  Run `rake schema:download` to download a schema from an existing alias or index:
@@ -136,114 +147,67 @@ Use `INTERACTIVE` to prompt to proceed before applying any POST/PUT/DELETE operations
  INTERACTIVE=true rake schema:migrate
  ```
 
+ Use `REINDEX_BATCH_SIZE` to control the batch size for reindexing operations (default: 1000):
 
- ## How migrations work
+ ```
+ REINDEX_BATCH_SIZE=500 rake schema:migrate
+ ```
 
- When possible, `rake schema:migrate` will update settings and mappings in-place on an aliased index, without reindexing. Only breaking changes require a reindex.
+ Use `REINDEX_REQUESTS_PER_SECOND` to throttle reindexing operations (default: -1, no throttling):
 
- Migrating breaking changes requires careful orchestration of reads and writes to ensure documents that are created/updated/deleted during the migration are not lost.
+ ```
+ REINDEX_REQUESTS_PER_SECOND=100 rake schema:migrate
+ ```
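
Per the client changes in this release, these two settings feed the `size` field of the reindex `source` and the `requests_per_second` query parameter. Roughly, using the example index names from later in this README:

```
POST _reindex?wait_for_completion=false&refresh=false&requests_per_second=100
{
  "source": { "index": "products-20250301000000", "size": 500 },
  "dest": { "index": "products-20250601000000" },
  "conflicts": "proceed"
}
```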
 
- Use case:
- - I have an alias `products` pointing at index `products-20250301000000`.
- - I have heavy reads and writes with 100M+ documents in the index.
- - I want to reindex `products-20250301000000` into a new index and update the `products` alias to reference it, without losing any creates/updates/deletes during the process.
 
- Rake `schema:migrate` solves this use case through the following procedure.
+ ## Client responsibilities during breaking migrations
 
- First, some terms:
- - `alias_name`: Alias containing the index to migrate
-   - `products`
- - `current_index`: First and only index in the alias
-   - `products-20250301000000`
- - `new_index`: Final canonical index into which to migrate `current_index`
-   - `products-20250601000000`
- - `catchup1_index`: Temp index to preserve writes during reindex
-   - `products-20250601000000-catchup-1`
- - `catchup2_index`: Temp index to preserve writes while flushing `catchup1_index`
-   - `products-20250601000000-catchup-2`
- - `log_index`: Index to log the migration state, not stored with `alias_name`
-   - `products-20250601000000-migration-log`
+ #### Clients MUST retry failed creates/updates/deletes for up to ~1 minute.
 
- SETUP
+ Writes will be temporarily disabled for a few seconds during the procedure to prevent data loss.
 
- Create `log_index` to log the migration state.
- - The migration logs when it starts and completes a step along with a description.
+ #### Clients MUST read and write to an **alias**. Clients MUST NOT write directly to an **index**.
 
- STEP 1
+ To prevent downtime, the migration procedure only operates on aliased indexes.
 
- Create `catchup1_index` using the new schema.
- - This index will preserve writes during the reindex.
+ Run `rake schema:alias` to create a new alias pointed at an index.
 
- STEP 2
+ #### Hard-deletes during reindexing will NOT affect the migrated index.
 
- Configure `alias_name` to only write to `catchup1_index` and read from `current_index` and `catchup1_index`.
+ Clients can mitigate the lack of hard-delete support in two ways:
 
- STEP 3
+ 1. (Recommended) Implement soft-deletes (e.g. set `deleted_at`) with a recurring hard-delete job. Run the hard-delete job after reindexing (see the sketch after this list).
 
- Create `new_index` using the new schema.
+ 2. Use RBAC to deny all `DELETE` operations during reindexing and implement continuous retries on failed `DELETE` operations to ensure eventual consistency.
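
A minimal sketch of the recurring hard-delete job from option 1, assuming soft-deleted documents carry a `deleted_at` field (the field name is illustrative); refresh first so `_delete_by_query` sees all segments, as the client code in this release also does:

```
POST products/_refresh

POST products/_delete_by_query
{
  "query": { "exists": { "field": "deleted_at" } }
}
```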
 
- Reindex `current_index` into `new_index`.
+ #### During reindexing, searches will return **duplicate results** for updated documents.
 
- ```
- POST _reindex
- {
-   "source": { "index": "#{current_index}" },
-   "dest": { "index": "#{new_index}" },
-   "conflicts": "proceed",
-   "refresh": false
- }
- ```
+ After reindexing, only the latest update will appear in search results.
 
- STEP 4
+ Clients can mitigate seeing duplicate documents in two ways:
 
- Create `catchup2_index` using the new schema.
- - This index ensures a place for ongoing writes while flushing `catchup1_index`.
+ 1. (Recommended) Clients may hide duplicate documents by implementing `collapse` on all searches (see the sketch after this list). `collapse` incurs a small performance cost on each query. Clients may choose to `collapse` only when the alias is configured to read from multiple indices. For a reference implementation of conditionally de-duping using a `collapse` query while reindexing, see: https://github.com/richkuz/schema-tools-sample-app/blob/fc60718f5784e52d55b0c009e863f8b1c8303662/demo_script.rb#L255
 
- STEP 5
+ 2. Use RBAC to deny all `UPDATE` operations during reindexing and implement continuous retries on failed `UPDATE` operations to ensure eventual consistency. This approach is suitable only for clients that can tolerate not seeing documents updated during reindexing.
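
A sketch of option 1, assuming each document carries a unique `keyword` field (here `product_id`) suitable for `collapse`; the query and field name are illustrative:

```
GET products/_search
{
  "query": { "match_all": {} },
  "collapse": { "field": "product_id" }
}
```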
 
- Configure `alias_name` to only write to `catchup2_index` and continue reading from `current_index` and `catchup1_index`.
+ Why there are duplicate updated documents during reindexing:
+ - The migration task configures an alias to read from both the original index and a catchup index, and write to the catchup index.
+ - `UPDATE` operations produce an additional document in the catchup index.
+ - When clients `_search` the alias for an updated document, they will see two results: one result from the original index, and one result from the catchup index.
 
- STEP 6
 
- Reindex `catchup1_index` into `new_index`.
- - Merge the first catchup index into the new canonical index.
+ #### Theoretical Alternatives for UPDATE and DELETE
 
- STEP 7
-
- Configure `alias_name` so there are NO write indexes
- - This guarantees that no writes can sneak into an obsolete catchup index during the second (quick) merge.
- - Any write operations will fail during this time with: `"reason": "Alias [FOO] has more than one index associated with it [...], can't execute a single index op"`
- - Clients must retry any failed writes.
+ In theory, the migrate task could support alternative reindexing modes when constrained by native Elasticsearch/OpenSearch capabilities.
 
- STEP 8
+ 1. Preserve Hard-Deletes and Show All Duplicates
 
- Reindex `catchup2_index` into `new_index`
- - Final sync to merge the second catchup index into the new canonical index.
+ The migrate task could support clients that require hard-deletes during reindexing by adding the new index into the alias during migration. Clients would have to use `_refresh` and `delete_by_query` when deleting documents to ensure documents are deleted from all indexes in the alias during reindexing. If using `DELETE` to delete a single document from an alias, clients might delete from the wrong index and receive a successful response containing "result: not_found". The new index would _not_ reflect such a deletion. With this approach, clients would see duplicate documents in search results for all documents during reindexing, not just updated documents. Clients could hide duplicate documents by implementing `collapse` on all searches.
 
- STEP 9
+ 2. Ignore Hard-Deletes and Hide All Duplicates
 
- Configure `alias_name` to write to and read from `new_index` only.
- - Writes resume to the single new index. All data and deletes are consistent.
+ Some clients might not be able to filter out duplicate documents during reindexing. The migrate task could support such clients by not returning any INSERTED or UPDATED documents until after the reindexing completes. This approach would not support hard-deletes. To support re-updating the same document during reindexing, clients would have to find documents to upsert based on a consistent ID, not based on a changing field.
 
- STEP 10
-
- Close unused indexes to avoid accidental writes.
- - Close `catchup1_index`
- - Close `catchup2_index`
- - Close `current_index`
- Operation complete.
-
- Users can safely delete closed indexes anytime after they are closed.
-
- Caveats for clients that perform writes during the migration:
- - Clients MUST retry failed creates/updates/deletes for up to a minute.
- - Writes will be temporarily disabled for up to a few seconds during the procedure to ensure no data loss.
- - Clients MUST use `delete_by_query` when deleting documents to ensure documents are deleted from all indexes in the alias during reindexing.
- - If using `DELETE` to delete a single document from an alias, clients might delete from the wrong index and receive a successful response containing "result: not_found". The new index will _not_ reflect such a deletion.
- - Clients MUST read and write to an alias, not directly to an index.
- - To prevent downtime, the migration procedure only operates on aliased indexes.
- - Run `rake schema:alias` to create a new alias pointed at an index.
- - Client applications must read and write to alias_name instead of index_name.
 
  ### Diagnosing a failed or aborted migration
 
@@ -310,8 +274,115 @@ Run `rake 'schema:close[alias_name]'` to close all indexes in an alias.
 
  Run `rake 'schema:delete[alias_name]'` to delete an alias and leave its indexes untouched.
 
+ Run `rake 'schema:drop[alias_name]'` to delete an alias (does not delete the underlying index).
+
  GitHub Actions:
  - OpenSearch Staging Close Index
  - OpenSearch Production Close Index
  - OpenSearch Staging Delete Index
  - OpenSearch Production Delete Index
+
+
+ ## How migrations work
+
+ When possible, `rake schema:migrate` will update settings and mappings in-place on an aliased index, without reindexing. Only breaking changes require a reindex.
+
+ Migrating breaking changes requires careful orchestration of reads and writes to ensure documents that are created/updated during the migration are not lost.
+
+ Hard-delete operations are not preserved during a breaking migration. See "Client responsibilities" above for how to mitigate this.
+
+ Use case:
+ - I have an alias `products` pointing at index `products-20250301000000`.
+ - I have heavy reads and writes with 100M+ documents in the index.
+ - I want to reindex `products-20250301000000` into a new index and update the `products` alias to reference it, without losing any creates/updates during the process.
+
+ Rake `schema:migrate` solves this use case through the following procedure.
+
+ First, some terms:
+ - `alias_name`: Alias containing the index to migrate
+   - `products`
+ - `current_index`: First and only index in the alias
+   - `products-20250301000000`
+ - `new_index`: Final canonical index into which to migrate `current_index`
+   - `products-20250601000000`
+ - `catchup1_index`: Temp index to preserve writes during reindex
+   - `products-20250601000000-catchup-1`
+ - `catchup2_index`: Temp index to preserve writes while flushing `catchup1_index`
+   - `products-20250601000000-catchup-2`
+ - `log_index`: Index to log the migration state, not stored with `alias_name`
+   - `products-20250601000000-migration-log`
+
+ SETUP
+
+ Create `log_index` to log the migration state.
+ - The migration logs when it starts and completes a step along with a description.
+
+ STEP 1
+
+ Attempt to reindex 1 document to a throwaway index to catch obvious configuration errors and abort early if possible.
+
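This maps to a bounded `_reindex` call like the sketch below, mirroring the `reindex_one_doc` client method added in this release; the throwaway index uses the new schema and is deleted afterwards (the `-throwaway-test` suffix follows the convention in the code):

```
POST _reindex?wait_for_completion=true&refresh=true
{
  "source": { "index": "#{current_index}", "query": { "match_all": {} } },
  "max_docs": 1,
  "dest": { "index": "#{new_index}-throwaway-test" },
  "conflicts": "proceed"
}
```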
+ STEP 2
+
+ Create `catchup1_index` using the new schema.
+ - This index will preserve writes during the reindex.
+
+ STEP 3
+
+ Configure `alias_name` to only write to `catchup1_index` and read from `current_index` and `catchup1_index`, as sketched below.
+
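A sketch of the alias update this step performs, using the example names above (the exact request the task issues may differ):

```
POST _aliases
{
  "actions": [
    { "add": { "index": "#{current_index}", "alias": "#{alias_name}", "is_write_index": false } },
    { "add": { "index": "#{catchup1_index}", "alias": "#{alias_name}", "is_write_index": true } }
  ]
}
```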
+ STEP 4
+
+ Create `new_index` using the new schema.
+
+ Reindex `current_index` into `new_index`.
+
+ ```
+ POST _reindex
+ {
+   "source": { "index": "#{current_index}" },
+   "dest": { "index": "#{new_index}" },
+   "conflicts": "proceed",
+   "refresh": false
+ }
+ ```
+
+ STEP 5
+
+ Create `catchup2_index` using the new schema.
+ - This index ensures a place for ongoing writes while flushing `catchup1_index`.
+
+ STEP 6
+
+ Configure `alias_name` to only write to `catchup2_index` and continue reading from `current_index` and `catchup1_index`.
+
+ STEP 7
+
+ Reindex `catchup1_index` into `new_index`.
+ - Merge the first catchup index into the new canonical index.
+
+ STEP 8
+
+ Configure `alias_name` so there are NO write indexes.
+ - This guarantees that no writes can sneak into an obsolete catchup index during the second (quick) merge.
+ - Any write operations will fail during this time with: `"reason": "Alias [FOO] has more than one index associated with it [...], can't execute a single index op"`
+ - Clients must retry any failed writes.
+
+ STEP 9
+
+ Reindex `catchup2_index` into `new_index`.
+ - Final sync to merge the second catchup index into the new canonical index.
+
+ STEP 10
+
+ Configure `alias_name` to write to and read from `new_index` only.
+ - Writes resume to the single new index. All data and deletes are consistent.
+
+ STEP 11
+
+ Close unused indexes to avoid accidental writes.
+ - Close `catchup1_index`
+ - Close `catchup2_index`
+ - Close `current_index`
+
+ Operation complete.
+
+ Users can safely delete closed indexes anytime after they are closed.
lib/schema_tools/client.rb CHANGED
@@ -25,9 +25,10 @@ module SchemaTools
    class Client
      attr_reader :url
 
-     def initialize(url, dryrun: false, logger: SimpleLogger.new, username: nil, password: nil)
+     def initialize(url, dryrun: false, interactive: false, logger: SimpleLogger.new, username: nil, password: nil)
        @url = url
        @dryrun = dryrun
+       @interactive = interactive
        @logger = logger
        @username = username
        @password = password
@@ -162,19 +163,59 @@ module SchemaTools
      end
 
 
-     def reindex(source_index, dest_index, script = nil)
+     def reindex(source_index:, dest_index:, script: nil, size: 1000, requests_per_second: -1)
        body = {
-         source: { index: source_index },
+         source: {
+           index: source_index,
+           size: size
+         },
          dest: { index: dest_index },
          conflicts: "proceed"
        }
-       body[:script] = { source: script } if script
+       body[:script] = { lang: 'painless', source: script } if script
 
        url = "/_reindex?wait_for_completion=false&refresh=false"
+       url += "&requests_per_second=#{requests_per_second}" if requests_per_second != -1
 
        post(url, body)
      end
 
+     def reindex_one_doc(source_index:, dest_index:, script: nil)
+       body = {
+         source: {
+           index: source_index,
+           query: { match_all: {} }
+         },
+         max_docs: 1,
+         dest: { index: dest_index },
+         conflicts: "proceed"
+       }
+       body[:script] = { lang: 'painless', source: script } if script
+
+       url = "/_reindex?wait_for_completion=true&refresh=true"
+
+       response = post(url, body)
+
+       if response['failures'] && !response['failures'].empty?
+         failure_reason = response['failures'].map { |f| f['cause']['reason'] }.join("; ")
+         raise "Reindex failed with internal errors. Failures: #{failure_reason}"
+       end
+
+       total = response['total'].to_i
+       created = response['created'].to_i
+       updated = response['updated'].to_i
+
+       if total != 1
+         raise "Reindex query found #{total} documents. Expected to find 1."
+       elsif created + updated != 1
+         raise "Reindex failed to index the document (created: #{created}, updated: #{updated}). Noops: #{response.fetch('noops', 0)}."
+       elsif response['timed_out'] == true
+         raise "Reindex operation timed out."
+       end
+
+       response
+     end
+
      def get_task_status(task_id)
        get("/_tasks/#{task_id}")
      end
@@ -372,6 +413,10 @@ module SchemaTools
        task_status = get_task_status(task_id)
 
        if task_status['completed']
+         if task_status['error']
+           log "ERROR: Task #{task_id} failed with a top-level error: #{task_status['error']}"
+           raise task_status['error']['reason']
+         end
          return task_status
        end
 
@@ -435,6 +480,17 @@ module SchemaTools
        settings.dig('index', 'verified_before_close') == 'true'
      end
 
+     def refresh(index_name, suppress_logging: false)
+       post("/#{index_name}/_refresh", {}, suppress_logging: suppress_logging)
+     end
+
+     # For this to work reliably, segments MUST be flushed.
+     # Call refresh(index_name) first!
+     def delete_by_query(index_name, query, suppress_logging: false)
+       body = { query: query }
+       post("/#{index_name}/_delete_by_query", body, suppress_logging: suppress_logging)
+     end
+
      private
 
      def make_http_request(uri)
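
The new `refresh` and `delete_by_query` helpers pair with the soft-delete strategy described in the README. A minimal usage sketch, with the connection URL and field name illustrative:

```
require 'schema_tools'

client = SchemaTools::Client.new(ENV['OPENSEARCH_URL'])

# Flush segments first so _delete_by_query sees every document,
# per the comment on delete_by_query above.
client.refresh('products')

# Hard-delete everything previously soft-deleted (illustrative field name).
client.delete_by_query('products', { exists: { field: 'deleted_at' } })
```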
@@ -461,7 +517,7 @@ module SchemaTools
      end
 
      def interactive_mode?
-       ENV['INTERACTIVE'] == 'true'
+       @interactive
      end
 
      def await_user_input
@@ -8,7 +8,7 @@ require_relative '../api_aware_mappings_diff'
  require 'json'
 
  module SchemaTools
-   def self.migrate_all(client:)
+   def self.migrate_all(client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
      puts "Discovering all schemas and migrating each to their latest revisions..."
 
      schemas = SchemaFiles.discover_all_schemas
@@ -26,7 +26,7 @@ module SchemaTools
 
      schemas.each do |alias_name|
        begin
-         migrate_one_schema(alias_name: alias_name, client: client)
+         migrate_one_schema(alias_name: alias_name, client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
        rescue => e
          puts "✗ Migration failed for #{alias_name}: #{e.message}"
          raise e
@@ -35,7 +35,7 @@ module SchemaTools
      end
    end
 
-   def self.migrate_one_schema(alias_name:, client:)
+   def self.migrate_one_schema(alias_name:, client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
      puts "=" * 60
      puts "Migrating alias #{alias_name}"
      puts "=" * 60
@@ -84,7 +84,7 @@ module SchemaTools
        puts "✗ Failed to update index '#{index_name}': #{e.message}"
        puts "This appears to be a breaking change. Starting breaking change migration..."
 
-       MigrateBreakingChange.migrate(alias_name:, client:)
+       MigrateBreakingChange.migrate(alias_name:, client:, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
      end
    end
@@ -43,13 +43,15 @@ module SchemaTools
    end
 
    class MigrateBreakingChange
-     def self.migrate(alias_name:, client:)
-       new(alias_name: alias_name, client: client).migrate
+     def self.migrate(alias_name:, client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
+       new(alias_name: alias_name, client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second).migrate
      end
 
-     def initialize(alias_name:, client:)
+     def initialize(alias_name:, client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
        @alias_name = alias_name
        @client = client
+       @reindex_batch_size = reindex_batch_size
+       @reindex_requests_per_second = reindex_requests_per_second
        @migration_log_index = nil
        @current_step = nil
        @rollback_attempted = false
@@ -107,6 +109,9 @@ module SchemaTools
      @catchup2_index = "#{@new_index}-catchup-2"
      log "catchup2_index: #{@catchup2_index}"
 
+     @throwaway_test_index = "#{@new_index}-throwaway-test"
+     log "throwaway_test_index: #{@throwaway_test_index}"
+
      # Use current index settings and mappings when creating catchup indexes
      # so that any reindex painless script logic will apply correctly to them.
      @current_settings = @client.get_index_settings(@current_index)
@@ -136,6 +141,10 @@ module SchemaTools
 
    def migration_steps
      [
+       MigrationStep.new(
+         name: "STEP 0: Pre-test reindex with 1 document",
+         run: ->(logger) { step0_test_reindex_one_doc }
+       ),
        MigrationStep.new(
          name: "STEP 1: Create catchup-1 index",
          run: ->(logger) { step1_create_catchup1 }
@@ -179,6 +188,19 @@ module SchemaTools
      ]
    end
 
+   def step0_test_reindex_one_doc
+     @client.create_index(@throwaway_test_index, @new_settings, @new_mappings)
+     begin
+       @client.reindex_one_doc(source_index: @current_index, dest_index: @throwaway_test_index, script: @reindex_script)
+     rescue => e
+       log "Failed reindexing a test document"
+       raise e
+     ensure
+       log "Deleting throwaway test index #{@throwaway_test_index}"
+       @client.delete_index(@throwaway_test_index)
+     end
+   end
+
    def step1_create_catchup1
      @client.create_index(@catchup1_index, @current_settings, @current_mappings)
      log "Created catchup-1 index: #{@catchup1_index}"
@@ -226,24 +248,38 @@ module SchemaTools
    end
 
    def reindex(current_index, new_index, reindex_script)
-     response = @client.reindex(current_index, new_index, reindex_script)
-     log response
-
-     if response['took']
-       log "Reindex task complete. Took: #{response['took']}"
+     task_response = @client.reindex(source_index: current_index, dest_index: new_index, script: reindex_script, size: @reindex_batch_size, requests_per_second: @reindex_requests_per_second)
+     log task_response
+     if task_response['took']
+       log "Reindex task complete. Took: #{task_response['took']}"
+       if task_response['failures'] && !task_response['failures'].empty?
+         failure_reason = task_response['failures'].map { |f| f['cause']['reason'] }.join("; ")
+         raise "Reindex failed synchronously with internal errors. Failures: #{failure_reason}"
+       end
        return true
      end
-
-     task_id = response['task']
-     if !task_id
-       raise "No task ID from reindex. Reindex incomplete."
+     task_id = task_response['task']
+     unless task_id
+       raise "Reindex response did not contain 'task' ID or 'took' time. Reindex incomplete."
      end
-
      log "Reindex task started at #{Time.now}. task_id is #{task_id}. Fetch task status with GET #{@client.url}/_tasks/#{task_id}"
 
      timeout = 604800 # 1 week
-     @client.wait_for_task(response['task'], timeout)
-     log "Reindex complete"
+     completed_task_status = @client.wait_for_task(task_response['task'], timeout)
+     final_result = completed_task_status.fetch('response', {})
+     if final_result['failures'] && !final_result['failures'].empty?
+       failure_reason = final_result['failures'].map { |f| f['cause']['reason'] }.join("; ")
+       raise "Reindex FAILED during async processing. Failures: #{failure_reason}"
+     end
+     created = final_result.fetch('created', 0)
+     updated = final_result.fetch('updated', 0)
+     deleted = final_result.fetch('deleted', 0)
+     log "Reindex complete." + \
+       "\nTook: #{final_result['took']}ms." + \
+       "\nCreated: #{created}" + \
+       "\nUpdated: #{updated}" + \
+       "\nDeleted: #{deleted}"
+     return true
    end
 
    def step4_create_catchup2
@@ -105,7 +105,7 @@ module SchemaTools
      @logger.log "📊 Found #{doc_count} documents in catchup-1 index - reindexing to original..."
 
      # Reindex from catchup-1 to original index
-     response = @client.reindex(@catchup1_index, @current_index, nil)
+     response = @client.reindex(source_index: @catchup1_index, dest_index: @current_index, script: nil)
      @logger.log "Reindex task started - task_id: #{response['task']}"
 
      # Wait for reindex to complete
@@ -26,6 +26,9 @@ module SchemaTools
      sample_settings = {
        "number_of_shards" => 1,
        "number_of_replicas" => 0,
+       "replication" => {
+         "type" => "DOCUMENT"
+       },
        "analysis" => {
          "analyzer" => {
            "default" => {
@@ -59,13 +62,33 @@ module SchemaTools
 
      settings_file = File.join(schema_path, 'settings.json')
      mappings_file = File.join(schema_path, 'mappings.json')
+     reindex_file = File.join(schema_path, 'reindex.painless')
 
      File.write(settings_file, JSON.pretty_generate(sample_settings))
      File.write(mappings_file, JSON.pretty_generate(sample_mappings))
 
+     # Create example reindex.painless file
+     reindex_content = <<~PAINLESS
+       // Example reindex script for transforming data during migration
+       // Modify this script to transform your data as needed
+       //
+       // Example: Rename a field
+       // if (ctx._source.containsKey('old_field_name')) {
+       //   ctx._source.new_field_name = ctx._source.old_field_name;
+       //   ctx._source.remove('old_field_name');
+       // }
+       //
+       // Example: Add a new field
+       // ctx._source.new_field = 'default_value';
+       long timestamp = System.currentTimeMillis();
+     PAINLESS
+
+     File.write(reindex_file, reindex_content)
+
      puts "✓ Sample schema created at #{schema_path}"
      puts "  - settings.json"
      puts "  - mappings.json"
+     puts "  - reindex.painless"
    end
 
    def self.create_alias_for_index(client:)
@@ -142,13 +165,32 @@ module SchemaTools
 
      settings_file = File.join(schema_path, 'settings.json')
      mappings_file = File.join(schema_path, 'mappings.json')
+     reindex_file = File.join(schema_path, 'reindex.painless')
 
      File.write(settings_file, JSON.pretty_generate(filtered_settings))
      File.write(mappings_file, JSON.pretty_generate(mappings))
 
+     # Create example reindex.painless file (Painless uses //-style comments)
+     reindex_content = <<~PAINLESS
+       // Example reindex script for transforming data during migration
+       // Modify this script to transform your data as needed
+       //
+       // Example: Rename a field
+       // if (ctx._source.containsKey('old_field_name')) {
+       //   ctx._source.new_field_name = ctx._source.old_field_name;
+       //   ctx._source.remove('old_field_name');
+       // }
+       //
+       // Example: Add a new field
+       // ctx._source.new_field = 'default_value';
+     PAINLESS
+
+     File.write(reindex_file, reindex_content)
+
      puts "✓ Schema downloaded to #{schema_path}"
      puts "  - settings.json"
      puts "  - mappings.json"
+     puts "  - reindex.painless"
    end
 
  end
@@ -32,6 +32,7 @@ def create_client!
    client = SchemaTools::Client.new(
      SchemaTools::Config.connection_url,
      dryrun: ENV['DRYRUN'] == 'true',
+     interactive: ENV['INTERACTIVE'] == 'true',
      username: SchemaTools::Config.connection_username,
      password: SchemaTools::Config.connection_password
    )
@@ -51,11 +52,14 @@ namespace :schema do
    desc "Migrate to a specific alias schema or migrate all schemas to their latest revisions"
    task :migrate, [:alias_name] do |t, args|
      client = create_client!
-
+
+     reindex_batch_size = ENV['REINDEX_BATCH_SIZE'] ? ENV['REINDEX_BATCH_SIZE'].to_i : 1000
+     reindex_requests_per_second = ENV['REINDEX_REQUESTS_PER_SECOND'] ? ENV['REINDEX_REQUESTS_PER_SECOND'].to_i : -1
+
      if args[:alias_name]
-       SchemaTools.migrate_one_schema(alias_name: args[:alias_name], client: client)
+       SchemaTools.migrate_one_schema(alias_name: args[:alias_name], client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
      else
-       SchemaTools.migrate_all(client: client)
+       SchemaTools.migrate_all(client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
      end
    end
 
@@ -91,6 +95,30 @@ namespace :schema do
      )
    end
 
+   desc "Delete an alias (does not delete the index)"
+   task :drop, [:alias_name] do |t, args|
+     client = create_client!
+
+     unless args[:alias_name]
+       puts "Error: alias_name is required"
+       puts "Usage: rake 'schema:drop[alias_name]'"
+       exit 1
+     end
+
+     alias_name = args[:alias_name]
+
+     unless client.alias_exists?(alias_name)
+       puts "Error: Alias '#{alias_name}' does not exist"
+       exit 1
+     end
+
+     indices = client.get_alias_indices(alias_name)
+     puts "Deleting alias '#{alias_name}' from indices: #{indices.join(', ')}"
+
+     client.delete_alias(alias_name)
+     puts "✓ Alias '#{alias_name}' deleted successfully"
+   end
+
    desc "Download schema from an existing alias or index"
    task :download do |t, args|
      client = create_client!
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: schema-tools
  version: !ruby/object:Gem::Version
-   version: 1.0.5
+   version: 1.0.6
  platform: ruby
  authors:
  - Rich Kuzsma
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2025-10-11 00:00:00.000000000 Z
+ date: 2025-10-13 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rake
@@ -138,7 +138,6 @@ files:
  - bin/setup
  - lib/schema_tools.rb
  - lib/schema_tools/api_aware_mappings_diff.rb
- - lib/schema_tools/catchup.rb
  - lib/schema_tools/client.rb
  - lib/schema_tools/close.rb
  - lib/schema_tools/config.rb
lib/schema_tools/catchup.rb DELETED
@@ -1,23 +0,0 @@
- module SchemaTools
-   def self.catchup(index_name:, client:)
-     raise "index_name parameter is required" unless index_name
-
-     index_config = SchemaFiles.get_index_config(index_name)
-     raise "Index configuration not found for #{index_name}" unless index_config
-
-     from_index = index_config['from_index_name']
-     raise "from_index_name not specified in index configuration" unless from_index
-
-     unless client.index_exists?(from_index)
-       raise "Source index #{from_index} does not exist. Cannot perform catchup reindex to #{index_name}."
-     end
-
-     reindex_script = SchemaFiles.get_reindex_script(index_name)
-
-     puts "Starting catchup reindex from #{from_index} to #{index_name}"
-     # TODO NOT IMPLEMENTED YET
-     # Do a reindex by query
-     puts "TODO IMPLEMENT ME"
-     response = client.reindex(from_index, index_name, reindex_script)
-   end
- end