schema-tools 1.0.5 → 1.0.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +151 -80
- data/lib/schema_tools/client.rb +61 -5
- data/lib/schema_tools/migrate/migrate.rb +4 -4
- data/lib/schema_tools/migrate/migrate_breaking_change.rb +51 -15
- data/lib/schema_tools/migrate/rollback.rb +1 -1
- data/lib/schema_tools/new_alias.rb +42 -0
- data/lib/tasks/schema.rake +31 -3
- metadata +2 -3
- data/lib/schema_tools/catchup.rb +0 -23
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3a977e1d5ae35640094087dff8eb1b1e5481db2b6bc0d35315f219fea02210d6
+  data.tar.gz: d8a40fbc727ede8614d607a22ca9aa830b32254598f8900d8f0e0b57fc315589
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 07d90a197672dff508c6db4e6e40d90da18ce06dad381b0f46c6687d3e1d38eb2f36a305d06717863da6f5f60fcb4cdae25d9df4a5862a54338887a819255924
+  data.tar.gz: cd758d32d70d00af78c8f9580ff66a02f7c4bc86de7dde7306f90eaa7f4ae88dd431987adbfd984bd7e8f12b5d9f648c66c89729443b47a87204f61b4fc65dd2
data/README.md
CHANGED
@@ -47,6 +47,17 @@ export OPENSEARCH_PASSWORD=your_password
 rake -T | grep " schema:"
 ```

+Available schema tasks:
+- `schema:migrate[alias_name]` - Migrate to a specific alias schema or migrate all schemas
+- `schema:new` - Create a new alias with sample schema
+- `schema:close[name]` - Close an index or alias
+- `schema:delete[name]` - Hard delete an index (only works on closed indexes) or delete an alias
+- `schema:drop[alias_name]` - Delete an alias (does not delete the index)
+- `schema:download` - Download schema from an existing alias or index
+- `schema:alias` - Create an alias for an existing index
+- `schema:seed` - Seed data to a live index
+- `schema:diff` - Compare all schemas to their corresponding downloaded alias settings and mappings
+
 ### Download an existing schema

 Run `rake schema:download` to download a schema from an existing alias or index:
@@ -136,114 +147,67 @@ Use `INTERACTIVE` to prompt to proceed before applying any POST/PUT/DELETE operations:
 INTERACTIVE=true rake schema:migrate
 ```

+Use `REINDEX_BATCH_SIZE` to control the batch size for reindexing operations (default: 1000):
+
+```
+REINDEX_BATCH_SIZE=500 rake schema:migrate
+```
+
+Use `REINDEX_REQUESTS_PER_SECOND` to throttle reindexing operations (default: -1, no throttling):
+
+```
+REINDEX_REQUESTS_PER_SECOND=100 rake schema:migrate
+```

-Use case:
-- I have an alias `products` pointing at index `products-20250301000000`.
-- I have heavy reads and writes with 100M+ documents in the index
-- I want to reindex `products-20250301000000` into a new index and update the `products` alias to reference it, without losing any creates/updates/deletes during the process.

+## Client responsibilities during breaking migrations

-- `alias_name`: Alias containing the index to migrate
-  - `products`
-- `current_index`: First and only index in the alias
-  - `products-20250301000000`
-- `new_index`: Final canonical index into which to migrate `current_index`
-  - `products-20250601000000`
-- `catchup1_index`: Temp index to preserve writes during reindex
-  - `products-20250601000000-catchup-1`
-- `catchup2_index`: Temp index to preserve writes while flushing `catchup1_index`
-  - `products-20250601000000-catchup-2`
-- `log_index`: Index to log the migration state, not stored with `alias_name`
-  - `products-20250601000000-migration-log`
+#### Clients MUST retry failed creates/updates/deletes for up to ~1 minute.

+Writes will be temporarily disabled for a few seconds during the procedure to prevent data loss.

-- The migration logs when it starts and completes a step along with a description.
+#### Clients MUST read and write to an **alias**. Clients must NOT write directly to an **index**.

+To prevent downtime, the migration procedure only operates on aliased indexes.

-- This index will preserve writes during the reindex.
+Run `rake schema:alias` to create a new alias pointed at an index.

+#### Hard-deletes during reindexing will NOT affect the migrated index.

+Clients can mitigate the lack of hard-delete support in two ways:

+1. (Recommended) Implement soft-deletes (e.g. set `deleted_at`) with a recurring hard-delete job. Run the hard-delete job after reindexing.

+2. Use RBAC to deny all `DELETE` operations during reindexing and implement continuous retries on failed `DELETE` operations to ensure eventual consistency.

+#### During reindexing, searches will return **duplicate results** for updated documents.

-POST _reindex
-{
-  "source": { "index": "#{current_index}" },
-  "dest": { "index": "#{new_index}" },
-  "conflicts": "proceed",
-  "refresh": false
-}
-```
+After reindexing, only the latest update will appear in search results.

+Clients can mitigate seeing duplicate documents in two ways:

-- This index ensures a place for ongoing writes while flushing `catchup1_index`.
+1. (Recommended) Clients may hide duplicate documents by implementing `collapse` on all searches. `collapse` incurs a small performance cost to each query. Clients may choose to `collapse` only when the alias is configured to read from multiple indices. For a reference implementation of conditionally de-duping using a `collapse` query while reindexing, see: https://github.com/richkuz/schema-tools-sample-app/blob/fc60718f5784e52d55b0c009e863f8b1c8303662/demo_script.rb#L255

+2. Use RBAC to deny all `UPDATE` operations during reindexing and implement continuous retries on failed `UPDATE` operations to ensure eventual consistency. This approach is suitable only for clients that can tolerate not seeing documents updated during reindexing.

+Why there are duplicate updated documents during reindexing:
+- The migration task configures an alias to read from both the original index and a catchup index, and write to the catchup index.
+- `UPDATE` operations produce an additional document in the catchup index.
+- When clients `_search` the alias for an updated document, they will see two results: one result from the original index, and one result from the catchup index.

-STEP 6

-- Merge the first catchup index into the new canonical index.
+#### Theoretical Alternatives for UPDATE and DELETE

-Configure `alias_name` so there are NO write indexes
-- This guarantees that no writes can sneak into an obsolete catchup index during the second (quick) merge.
-- Any write operations will fail during this time with: `"reason": "Alias [FOO] has more than one index associated with it [...], can't execute a single index op"`
-- Clients must retry any failed writes.
+In theory, the migrate task could support alternative reindexing modes when constrained by native Elasticsearch/OpenSearch capabilities.

+1. Preserve Hard-Deletes and Show All Duplicates

-- Final sync to merge the second catchup index into the new canonical index.
+The migrate task could support clients that require hard-deletes during reindexing by adding the new index into the alias during migration. Clients would have to use `_refresh` and `delete_by_query` when deleting documents to ensure documents are deleted from all indexes in the alias during reindexing. If using `DELETE` to delete a single document from an alias, clients might delete from the wrong index and receive a successful response containing "result: not_found". The new index would _not_ reflect such a deletion. With this approach, clients would see duplicate documents in search results for all documents during reindexing, not just updated documents. Clients could hide duplicate documents by implementing `collapse` on all searches.

+2. Ignore Hard-Deletes and Hide All Duplicates

-- Writes resume to the single new index. All data and deletes are consistent.
+Some clients might not be able to filter out duplicate documents during reindexing. The migrate task could support such clients by not returning any INSERTED or UPDATED documents until after the reindexing completes. This approach would not support hard-deletes. To support re-updating the same document during reindexing, clients would have to find documents to upsert based on a consistent ID, not based on a changing field.

-STEP 10
-
-Close unused indexes to avoid accidental writes.
-- Close `catchup1_index`
-- Close `catchup2_index`
-- Close `current_index`
-Operation complete.
-
-Users can safely delete closed indexes anytime after they are closed.
-
-Caveats for clients that perform writes during the migration:
-- Clients MUST retry failed creates/updates/deletes for up to a minute.
-- Writes will be temporarily disabled for up to a few seconds during the procedure to ensure no data loss.
-- Clients MUST use `delete_by_query` when deleting documents to ensure documents are deleted from all indexes in the alias during reindexing.
-- If using `DELETE` to delete a single document from an alias, clients might delete from the wrong index and receive a successful response containing "result: not_found". The new index will _not_ reflect such a deletion.
-- Clients MUST read and write to an alias, not directly to an index.
-- To prevent downtime, the migration procedure only operates on aliased indexes.
-- Run `rake schema:alias` to create a new alias pointed at an index.
-- Client applications must read and write to alias_name instead of index_name.

 ### Diagnosing a failed or aborted migration
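Two short sketches of the client-side behavior the section above asks for. Both are illustrations, not part of this gem: the `products` alias, the `id` and `updated_at` fields, and the endpoint URL are placeholders.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Retry a failed create/update for up to ~1 minute, per the requirement above.
# While the migration briefly leaves the alias with no write index,
# single-document writes fail; treating every non-2xx response as retryable
# until the deadline rides out that window.
def write_with_retry(doc, url: 'http://localhost:9200/products/_doc', deadline: 60)
  started = Time.now
  begin
    response = Net::HTTP.post(URI(url), JSON.generate(doc), 'Content-Type' => 'application/json')
    raise response.body unless response.is_a?(Net::HTTPSuccess)
    response
  rescue => e
    raise e if Time.now - started > deadline
    sleep 1
    retry
  end
end
```

To hide duplicates while the alias reads from two indices, a `collapse` search body might look like the following; collapsing on a unique keyword field and sorting on a recency field keeps only the newest copy of each document:

```ruby
# POST this body to /products/_search (the alias, not an index).
search_body = {
  query: { match_all: {} },
  collapse: { field: 'id' },                    # unique keyword field (placeholder)
  sort: [{ updated_at: { order: 'desc' } }]     # recency field (placeholder)
}
```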
@@ -310,8 +274,115 @@ Run `rake 'schema:close[alias_name]'` to close all indexes in an alias.

 Run `rake 'schema:delete[alias_name]'` to delete an alias and leave its indexes untouched.

+Run `rake 'schema:drop[alias_name]'` to delete an alias (does not delete the underlying index).
+
 GitHub Actions:
 - OpenSearch Staging Close Index
 - OpenSearch Production Close Index
 - OpenSearch Staging Delete Index
 - OpenSearch Production Delete Index
+
+## How migrations work
+
+When possible, `rake schema:migrate` will update settings and mappings in-place on an aliased index, without reindexing. Only breaking changes require a reindex.
+
+Migrating breaking changes requires careful orchestration of reads and writes to ensure documents that are created/updated during the migration are not lost.
+
+Hard-delete operations are not preserved during a breaking migration. See "Client responsibilities" above for how to mitigate this.
+
+Use case:
+- I have an alias `products` pointing at index `products-20250301000000`.
+- I have heavy reads and writes with 100M+ documents in the index
+- I want to reindex `products-20250301000000` into a new index and update the `products` alias to reference it, without losing any creates/updates during the process.
+
+Rake `schema:migrate` solves this use case through the following procedure.
+
+First, some terms:
+- `alias_name`: Alias containing the index to migrate
+  - `products`
+- `current_index`: First and only index in the alias
+  - `products-20250301000000`
+- `new_index`: Final canonical index into which to migrate `current_index`
+  - `products-20250601000000`
+- `catchup1_index`: Temp index to preserve writes during reindex
+  - `products-20250601000000-catchup-1`
+- `catchup2_index`: Temp index to preserve writes while flushing `catchup1_index`
+  - `products-20250601000000-catchup-2`
+- `log_index`: Index to log the migration state, not stored with `alias_name`
+  - `products-20250601000000-migration-log`
+
+SETUP
+
+Create `log_index` to log the migration state.
+- The migration logs when it starts and completes a step along with a description.
+
+STEP 1
+
+Attempt to reindex 1 document to a throwaway index to catch obvious configuration errors and abort early if possible.
+
+STEP 2
+
+Create `catchup1_index` using the new schema.
+- This index will preserve writes during the reindex.
+
+STEP 3
+
+Configure `alias_name` to only write to `catchup1_index` and read from `current_index` and `catchup1_index`.
+
+STEP 4
+
+Create `new_index` using the new schema.
+
+Reindex `current_index` into `new_index`.
+
+```
+POST _reindex
+{
+  "source": { "index": "#{current_index}" },
+  "dest": { "index": "#{new_index}" },
+  "conflicts": "proceed",
+  "refresh": false
+}
+```
+
+STEP 5
+
+Create `catchup2_index` using the new schema.
+- This index ensures a place for ongoing writes while flushing `catchup1_index`.
+
+STEP 6
+
+Configure `alias_name` to only write to `catchup2_index` and continue reading from `current_index` and `catchup1_index`.
+
+STEP 7
+
+Reindex `catchup1_index` into `new_index`.
+- Merge the first catchup index into the new canonical index.
+
+STEP 8
+
+Configure `alias_name` so there are NO write indexes
+- This guarantees that no writes can sneak into an obsolete catchup index during the second (quick) merge.
+- Any write operations will fail during this time with: `"reason": "Alias [FOO] has more than one index associated with it [...], can't execute a single index op"`
+- Clients must retry any failed writes.
+
+STEP 9
+
+Reindex `catchup2_index` into `new_index`
+- Final sync to merge the second catchup index into the new canonical index.
+
+STEP 10
+
+Configure `alias_name` to write to and read from `new_index` only.
+- Writes resume to the single new index. All data and deletes are consistent.
+
+STEP 11
+
+Close unused indexes to avoid accidental writes.
+- Close `catchup1_index`
+- Close `catchup2_index`
+- Close `current_index`
+Operation complete.
+
+Users can safely delete closed indexes anytime after they are closed.
data/lib/schema_tools/client.rb
CHANGED
@@ -25,9 +25,10 @@ module SchemaTools
   class Client
     attr_reader :url

-    def initialize(url, dryrun: false, logger: SimpleLogger.new, username: nil, password: nil)
+    def initialize(url, dryrun: false, interactive: false, logger: SimpleLogger.new, username: nil, password: nil)
       @url = url
       @dryrun = dryrun
+      @interactive = interactive
       @logger = logger
       @username = username
       @password = password
@@ -162,19 +163,59 @@ module SchemaTools
     end


-    def reindex(source_index, dest_index, script)
       body = {
-        source: { index: source_index },
+        source: {
+          index: source_index,
+          size: size
+        },
         dest: { index: dest_index },
         conflicts: "proceed"
       }
-      body[:script] = { source: script } if script
+      body[:script] = { lang: 'painless', source: script } if script

       url = "/_reindex?wait_for_completion=false&refresh=false"
+      url += "&requests_per_second=#{requests_per_second}" if requests_per_second != -1

       post(url, body)
     end

+    def reindex_one_doc(source_index:, dest_index:, script: nil)
+      body = {
+        source: {
+          index: source_index,
+          query: { match_all: {} }
+        },
+        max_docs: 1,
+        dest: { index: dest_index },
+        conflicts: "proceed"
+      }
+      body[:script] = { lang: 'painless', source: script } if script
+
+      url = "/_reindex?wait_for_completion=true&refresh=true"
+
+      response = post(url, body)
+
+      if response['failures'] && !response['failures'].empty?
+        failure_reason = response['failures'].map { |f| f['cause']['reason'] }.join("; ")
+        raise "Reindex failed with internal errors. Failures: #{failure_reason}"
+      end
+
+      total = response['total'].to_i
+      created = response['created'].to_i
+      updated = response['updated'].to_i
+
+      if total != 1
+        raise "Reindex query found #{total} documents. Expected to find 1."
+      elsif created + updated != 1
+        raise "Reindex failed to index the document (created: #{created}, updated: #{updated}). Noops: #{response.fetch('noops', 0)}."
+      elsif response['timed_out'] == true
+        raise "Reindex operation timed out."
+      end
+
+      response
+    end
+
     def get_task_status(task_id)
       get("/_tasks/#{task_id}")
     end
@@ -372,6 +413,10 @@ module SchemaTools
       task_status = get_task_status(task_id)

       if task_status['completed']
+        if task_status['error']
+          log "ERROR: Task #{task_id} failed with a top-level error: #{task_status['error']}"
+          raise task_status['error']['reason']
+        end
         return task_status
       end

@@ -435,6 +480,17 @@ module SchemaTools
       settings.dig('index', 'verified_before_close') == 'true'
     end

+    def refresh(index_name, suppress_logging: false)
+      post("/#{index_name}/_refresh", {}, suppress_logging: suppress_logging)
+    end
+
+    # For this to work reliably, segments MUST be flushed.
+    # Call refresh(index_name) first!
+    def delete_by_query(index_name, query, suppress_logging: false)
+      body = { query: query }
+      post("/#{index_name}/_delete_by_query", body, suppress_logging: suppress_logging)
+    end
+
     private

     def make_http_request(uri)
@@ -461,7 +517,7 @@ module SchemaTools
     end

     def interactive_mode?
-
+      @interactive
     end

     def await_user_input
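Taken together, the additions above can be exercised like this. A sketch only: it assumes the constructor and method signatures shown in the hunks, plus the pre-existing `wait_for_task`; the index names, script, and query are placeholders.

```ruby
require 'schema_tools/client'

client = SchemaTools::Client.new(
  'http://localhost:9200',
  dryrun: false,
  interactive: false
)

# Async, throttled reindex (wait_for_completion=false), then poll the task.
task_response = client.reindex(
  source_index: 'products-20250301000000',
  dest_index: 'products-20250601000000',
  script: "ctx._source.new_field = 'default_value'",
  size: 500,                  # maps to REINDEX_BATCH_SIZE
  requests_per_second: 100    # maps to REINDEX_REQUESTS_PER_SECOND
)
client.wait_for_task(task_response['task'], 3600) if task_response['task']

# Hard-delete soft-deleted documents after migrating: refresh first so
# delete_by_query operates on flushed segments, per the comment in the diff.
client.refresh('products')
client.delete_by_query('products', { exists: { field: 'deleted_at' } })
```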
data/lib/schema_tools/migrate/migrate.rb
CHANGED
@@ -8,7 +8,7 @@ require_relative '../api_aware_mappings_diff'
 require 'json'

 module SchemaTools
-  def self.migrate_all(client:)
+  def self.migrate_all(client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
     puts "Discovering all schemas and migrating each to their latest revisions..."

     schemas = SchemaFiles.discover_all_schemas
@@ -26,7 +26,7 @@ module SchemaTools

     schemas.each do |alias_name|
       begin
-        migrate_one_schema(alias_name: alias_name, client: client)
+        migrate_one_schema(alias_name: alias_name, client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
       rescue => e
         puts "✗ Migration failed for #{alias_name}: #{e.message}"
         raise e
@@ -35,7 +35,7 @@ module SchemaTools
     end
   end

-  def self.migrate_one_schema(alias_name:, client:)
+  def self.migrate_one_schema(alias_name:, client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
     puts "=" * 60
     puts "Migrating alias #{alias_name}"
     puts "=" * 60
@@ -84,7 +84,7 @@ module SchemaTools
       puts "✗ Failed to update index '#{index_name}': #{e.message}"
       puts "This appears to be a breaking change. Starting breaking change migration..."

-      MigrateBreakingChange.migrate(alias_name:, client:)
+      MigrateBreakingChange.migrate(alias_name:, client:, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
     end
   end
 end
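The new keyword arguments mirror the README's `REINDEX_BATCH_SIZE` and `REINDEX_REQUESTS_PER_SECOND` env vars; calling the module directly looks roughly like this (`client` built as in the rake task):

```ruby
# Equivalent of:
#   REINDEX_BATCH_SIZE=500 REINDEX_REQUESTS_PER_SECOND=100 rake 'schema:migrate[products]'
SchemaTools.migrate_one_schema(
  alias_name: 'products',
  client: client,
  reindex_batch_size: 500,
  reindex_requests_per_second: 100
)

# Or migrate every discovered schema with the defaults (batch 1000, unthrottled):
SchemaTools.migrate_all(client: client)
```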
data/lib/schema_tools/migrate/migrate_breaking_change.rb
CHANGED
@@ -43,13 +43,15 @@ module SchemaTools
   end

   class MigrateBreakingChange
-    def self.migrate(alias_name:, client:)
-      new(alias_name: alias_name, client: client).migrate
+    def self.migrate(alias_name:, client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
+      new(alias_name: alias_name, client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second).migrate
     end

-    def initialize(alias_name:, client:)
+    def initialize(alias_name:, client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
       @alias_name = alias_name
       @client = client
+      @reindex_batch_size = reindex_batch_size
+      @reindex_requests_per_second = reindex_requests_per_second
       @migration_log_index = nil
       @current_step = nil
       @rollback_attempted = false
@@ -107,6 +109,9 @@ module SchemaTools
       @catchup2_index = "#{@new_index}-catchup-2"
       log "catchup2_index: #{@catchup2_index}"

+      @throwaway_test_index = "#{@new_index}-throwaway-test"
+      log "throwaway_test_index: #{@throwaway_test_index}"
+
       # Use current index settings and mappings when creating catchup indexes
       # so that any reindex painless script logic will apply correctly to them.
       @current_settings = @client.get_index_settings(@current_index)
@@ -136,6 +141,10 @@ module SchemaTools

     def migration_steps
       [
+        MigrationStep.new(
+          name: "STEP 0: Pre-test reindex with 1 document",
+          run: ->(logger) { step0_test_reindex_one_doc }
+        ),
         MigrationStep.new(
           name: "STEP 1: Create catchup-1 index",
           run: ->(logger) { step1_create_catchup1 }
@@ -179,6 +188,19 @@ module SchemaTools
       ]
     end

+    def step0_test_reindex_one_doc
+      @client.create_index(@throwaway_test_index, @new_settings, @new_mappings)
+      begin
+        @client.reindex_one_doc(source_index: @current_index, dest_index: @throwaway_test_index, script: @reindex_script)
+      rescue => e
+        log "Failed reindexing a test document"
+        raise e
+      ensure
+        log "Deleting throwaway test index #{@throwaway_test_index}"
+        @client.delete_index(@throwaway_test_index)
+      end
+    end
+
     def step1_create_catchup1
       @client.create_index(@catchup1_index, @current_settings, @current_mappings)
       log "Created catchup-1 index: #{@catchup1_index}"
@@ -226,24 +248,38 @@ module SchemaTools
     end

     def reindex(current_index, new_index, reindex_script)
-      log
+      task_response = @client.reindex(source_index: current_index, dest_index: new_index, script: reindex_script, size: @reindex_batch_size, requests_per_second: @reindex_requests_per_second)
+      log task_response
+      if task_response['took']
+        log "Reindex task complete. Took: #{task_response['took']}"
+        if task_response['failures'] && !task_response['failures'].empty?
+          failure_reason = task_response['failures'].map { |f| f['cause']['reason'] }.join("; ")
+          raise "Reindex failed synchronously with internal errors. Failures: #{failure_reason}"
+        end
         return true
       end
-      task_id
-      raise "No task ID from reindex. Reindex incomplete."
+      task_id = task_response['task']
+      unless task_id
+        raise "Reindex response did not contain 'task' ID or 'took' time. Reindex incomplete."
       end

       log "Reindex task started at #{Time.now}. task_id is #{task_id}. Fetch task status with GET #{@client.url}/_tasks/#{task_id}"

       timeout = 604800 # 1 week
-      @client.wait_for_task(
+      completed_task_status = @client.wait_for_task(task_response['task'], timeout)
+      final_result = completed_task_status.fetch('response', {})
+      if final_result['failures'] && !final_result['failures'].empty?
+        failure_reason = final_result['failures'].map { |f| f['cause']['reason'] }.join("; ")
+        raise "Reindex FAILED during async processing. Failures: #{failure_reason}"
+      end
+      created = final_result.fetch('created', 0)
+      updated = final_result.fetch('updated', 0)
+      deleted = final_result.fetch('deleted', 0)
+      log "Reindex complete." + \
+          "\nTook: #{final_result['took']}ms." + \
+          "\nCreated: #{created}" + \
+          "\nUpdated: #{updated}" + \
+          "\nDeleted: #{deleted}"
+      return true
     end

     def step4_create_catchup2
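STEP 0 boils down to one synchronous, single-document `_reindex` against a disposable index, so configuration and script errors surface before any real work starts. Roughly the request body `reindex_one_doc` builds (index names follow the README's terms; the script line is a placeholder):

```ruby
# POSTed to /_reindex?wait_for_completion=true&refresh=true
body = {
  source: {
    index: 'products-20250301000000',
    query: { match_all: {} }
  },
  max_docs: 1,                      # copy exactly one document
  dest: { index: 'products-20250601000000-throwaway-test' },
  conflicts: 'proceed',
  script: { lang: 'painless', source: "ctx._source.new_field = 'default_value'" }
}
# Any mapping or script failure raises here, aborting before STEP 1;
# the throwaway index is deleted in the ensure block either way.
```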
data/lib/schema_tools/migrate/rollback.rb
CHANGED
@@ -105,7 +105,7 @@ module SchemaTools
       @logger.log "📊 Found #{doc_count} documents in catchup-1 index - reindexing to original..."

       # Reindex from catchup-1 to original index
-      response = @client.reindex(@catchup1_index, @current_index, nil)
+      response = @client.reindex(source_index: @catchup1_index, dest_index: @current_index, script: nil)
       @logger.log "Reindex task started - task_id: #{response['task']}"

       # Wait for reindex to complete
data/lib/schema_tools/new_alias.rb
CHANGED
@@ -26,6 +26,9 @@ module SchemaTools
     sample_settings = {
       "number_of_shards" => 1,
       "number_of_replicas" => 0,
+      "replication" => {
+        "type" => "DOCUMENT"
+      },
       "analysis" => {
         "analyzer" => {
           "default" => {
@@ -59,13 +62,33 @@ module SchemaTools

     settings_file = File.join(schema_path, 'settings.json')
     mappings_file = File.join(schema_path, 'mappings.json')
+    reindex_file = File.join(schema_path, 'reindex.painless')

     File.write(settings_file, JSON.pretty_generate(sample_settings))
     File.write(mappings_file, JSON.pretty_generate(sample_mappings))

+    # Create example reindex.painless file
+    reindex_content = <<~PAINLESS
+      // Example reindex script for transforming data during migration
+      // Modify this script to transform your data as needed
+      //
+      // Example: Rename a field
+      // if (ctx._source.containsKey('old_field_name')) {
+      //   ctx._source.new_field_name = ctx._source.old_field_name;
+      //   ctx._source.remove('old_field_name');
+      // }
+      //
+      // Example: Add a new field
+      // ctx._source.new_field = 'default_value';
+      long timestamp = System.currentTimeMillis();
+    PAINLESS
+
+    File.write(reindex_file, reindex_content)
+
     puts "✓ Sample schema created at #{schema_path}"
     puts "  - settings.json"
     puts "  - mappings.json"
+    puts "  - reindex.painless"
   end

   def self.create_alias_for_index(client:)
@@ -142,13 +165,32 @@ module SchemaTools

     settings_file = File.join(schema_path, 'settings.json')
     mappings_file = File.join(schema_path, 'mappings.json')
+    reindex_file = File.join(schema_path, 'reindex.painless')

     File.write(settings_file, JSON.pretty_generate(filtered_settings))
     File.write(mappings_file, JSON.pretty_generate(mappings))

+    # Create example reindex.painless file
+    reindex_content = <<~PAINLESS
+      // Example reindex script for transforming data during migration
+      // Modify this script to transform your data as needed
+      //
+      // Example: Rename a field
+      // if (ctx._source.containsKey('old_field_name')) {
+      //   ctx._source.new_field_name = ctx._source.old_field_name;
+      //   ctx._source.remove('old_field_name');
+      // }
+      //
+      // Example: Add a new field
+      // ctx._source.new_field = 'default_value';
+    PAINLESS
+
+    File.write(reindex_file, reindex_content)
+
     puts "✓ Schema downloaded to #{schema_path}"
     puts "  - settings.json"
     puts "  - mappings.json"
+    puts "  - reindex.painless"
   end

 end
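The generated `reindex.painless` lives alongside `settings.json` and `mappings.json` in the schema directory; during a breaking migration its contents end up as the `script:` argument to `Client#reindex`. A sketch, with the `schemas/products` layout assumed for illustration:

```ruby
# Load the schema's optional painless script and hand it to the client.
schema_path    = File.join('schemas', 'products')           # assumed layout
reindex_file   = File.join(schema_path, 'reindex.painless')
reindex_script = File.exist?(reindex_file) ? File.read(reindex_file) : nil

client.reindex(
  source_index: 'products-20250301000000',
  dest_index: 'products-20250601000000',
  script: reindex_script    # wrapped as { lang: 'painless', source: ... }
)
```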
data/lib/tasks/schema.rake
CHANGED
@@ -32,6 +32,7 @@ def create_client!
   client = SchemaTools::Client.new(
     SchemaTools::Config.connection_url,
     dryrun: ENV['DRYRUN'] == 'true',
+    interactive: ENV['INTERACTIVE'] == 'true',
     username: SchemaTools::Config.connection_username,
     password: SchemaTools::Config.connection_password
   )
@@ -51,11 +52,14 @@ namespace :schema do
   desc "Migrate to a specific alias schema or migrate all schemas to their latest revisions"
   task :migrate, [:alias_name] do |t, args|
     client = create_client!
+
+    reindex_batch_size = ENV['REINDEX_BATCH_SIZE'] ? ENV['REINDEX_BATCH_SIZE'].to_i : 1000
+    reindex_requests_per_second = ENV['REINDEX_REQUESTS_PER_SECOND'] ? ENV['REINDEX_REQUESTS_PER_SECOND'].to_i : -1
+
     if args[:alias_name]
-      SchemaTools.migrate_one_schema(alias_name: args[:alias_name], client: client)
+      SchemaTools.migrate_one_schema(alias_name: args[:alias_name], client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
     else
-      SchemaTools.migrate_all(client: client)
+      SchemaTools.migrate_all(client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
     end
   end

@@ -91,6 +95,30 @@ namespace :schema do
     )
   end

+  desc "Delete an alias (does not delete the index)"
+  task :drop, [:alias_name] do |t, args|
+    client = create_client!
+
+    unless args[:alias_name]
+      puts "Error: alias_name is required"
+      puts "Usage: rake 'schema:drop[alias_name]'"
+      exit 1
+    end
+
+    alias_name = args[:alias_name]
+
+    unless client.alias_exists?(alias_name)
+      puts "Error: Alias '#{alias_name}' does not exist"
+      exit 1
+    end
+
+    indices = client.get_alias_indices(alias_name)
+    puts "Deleting alias '#{alias_name}' from indices: #{indices.join(', ')}"
+
+    client.delete_alias(alias_name)
+    puts "✓ Alias '#{alias_name}' deleted successfully"
+  end
+
   desc "Download schema from an existing alias or index"
   task :download do |t, args|
     client = create_client!
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: schema-tools
 version: !ruby/object:Gem::Version
-  version: 1.0.5
+  version: 1.0.6
 platform: ruby
 authors:
 - Rich Kuzsma
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2025-10-
+date: 2025-10-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -138,7 +138,6 @@ files:
 - bin/setup
 - lib/schema_tools.rb
 - lib/schema_tools/api_aware_mappings_diff.rb
-- lib/schema_tools/catchup.rb
 - lib/schema_tools/client.rb
 - lib/schema_tools/close.rb
 - lib/schema_tools/config.rb
data/lib/schema_tools/catchup.rb
DELETED
@@ -1,23 +0,0 @@
-module SchemaTools
-  def self.catchup(index_name:, client:)
-    raise "index_name parameter is required" unless index_name
-
-    index_config = SchemaFiles.get_index_config(index_name)
-    raise "Index configuration not found for #{index_name}" unless index_config
-
-    from_index = index_config['from_index_name']
-    raise "from_index_name not specified in index configuration" unless from_index
-
-    unless client.index_exists?(from_index)
-      raise "Source index #{from_index} does not exist. Cannot perform catchup reindex to #{index_name}."
-    end
-
-    reindex_script = SchemaFiles.get_reindex_script(index_name)
-
-    puts "Starting catchup reindex from #{from_index} to #{index_name}"
-    # TODO NOT IMPLEMENTED YET
-    # Do a reindex by query
-    puts "TODO IMPLEMENT ME"
-    response = client.reindex(from_index, index_name, reindex_script)
-  end
-end