schema-tools 1.0.5 → 1.0.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +151 -80
- data/lib/schema_tools/client.rb +61 -5
- data/lib/schema_tools/migrate/migrate.rb +4 -4
- data/lib/schema_tools/migrate/migrate_breaking_change.rb +51 -15
- data/lib/schema_tools/migrate/rollback.rb +1 -1
- data/lib/schema_tools/new_alias.rb +42 -0
- data/lib/tasks/schema.rake +31 -3
- metadata +2 -3
- data/lib/schema_tools/catchup.rb +0 -23
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3a977e1d5ae35640094087dff8eb1b1e5481db2b6bc0d35315f219fea02210d6
+  data.tar.gz: d8a40fbc727ede8614d607a22ca9aa830b32254598f8900d8f0e0b57fc315589
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 07d90a197672dff508c6db4e6e40d90da18ce06dad381b0f46c6687d3e1d38eb2f36a305d06717863da6f5f60fcb4cdae25d9df4a5862a54338887a819255924
+  data.tar.gz: cd758d32d70d00af78c8f9580ff66a02f7c4bc86de7dde7306f90eaa7f4ae88dd431987adbfd984bd7e8f12b5d9f648c66c89729443b47a87204f61b4fc65dd2
data/README.md
CHANGED
@@ -47,6 +47,17 @@ export OPENSEARCH_PASSWORD=your_password
 rake -T | grep " schema:"
 ```

+Available schema tasks:
+- `schema:migrate[alias_name]` - Migrate to a specific alias schema or migrate all schemas
+- `schema:new` - Create a new alias with sample schema
+- `schema:close[name]` - Close an index or alias
+- `schema:delete[name]` - Hard delete an index (only works on closed indexes) or delete an alias
+- `schema:drop[alias_name]` - Delete an alias (does not delete the index)
+- `schema:download` - Download schema from an existing alias or index
+- `schema:alias` - Create an alias for an existing index
+- `schema:seed` - Seed data to a live index
+- `schema:diff` - Compare all schemas to their corresponding downloaded alias settings and mappings
+
 ### Download an existing schema

 Run `rake schema:download` to download a schema from an existing alias or index:
@@ -136,114 +147,67 @@ Use `INTERACTIVE` to prompt to proceed before applying any POST/PUT/DELETE operations:
 INTERACTIVE=true rake schema:migrate
 ```

+Use `REINDEX_BATCH_SIZE` to control the batch size for reindexing operations (default: 1000):
+
+```
+REINDEX_BATCH_SIZE=500 rake schema:migrate
+```
+
+Use `REINDEX_REQUESTS_PER_SECOND` to throttle reindexing operations (default: -1, no throttling):
+
+```
+REINDEX_REQUESTS_PER_SECOND=100 rake schema:migrate
+```

-Use case:
-- I have an alias `products` pointing at index `products-20250301000000`.
-- I have heavy reads and writes with 100M+ documents in the index
-- I want to reindex `products-20250301000000` into a new index and update the `products` alias to reference it, without losing any creates/updates/deletes during the process.

+## Client responsibilities during breaking migrations

-- `alias_name`: Alias containing the index to migrate
-  - `products`
-- `current_index`: First and only index in the alias
-  - `products-20250301000000`
-- `new_index`: Final canonical index into which to migrate `current_index`
-  - `products-20250601000000`
-- `catchup1_index`: Temp index to preserve writes during reindex
-  - `products-20250601000000-catchup-1`
-- `catchup2_index`: Temp index to preserve writes while flushing `catchup1_index`
-  - `products-20250601000000-catchup-2`
-- `log_index`: Index to log the migration state, not stored with `alias_name`
-  - `products-20250601000000-migration-log`
+#### Clients MUST retry failed creates/updates/deletes for up to ~1 minute.

+Writes will be temporarily disabled for a few seconds during the procedure to prevent data loss.

-- The migration logs when it starts and completes a step along with a description.
+#### Clients MUST read and write to an **alias**. Clients must NOT write directly to an **index**.

+To prevent downtime, the migration procedure only operates on aliased indexes.

-- This index will preserve writes during the reindex.
+Run `rake schema:alias` to create a new alias pointed at an index.

+#### Hard-deletes during reindexing will NOT affect the migrated index.

+Clients can mitigate the lack of hard-delete support in two ways:

+1. (Recommended) Implement soft-deletes (e.g. set `deleted_at`) with a recurring hard-delete job. Run the hard-delete job after reindexing.

+2. Use RBAC to deny all `DELETE` operations during reindexing and implement continuous retries on failed `DELETE` operations to ensure eventual consistency.

+#### During reindexing, searches will return **duplicate results** for updated documents.

-POST _reindex
-{
-  "source": { "index": "#{current_index}" },
-  "dest": { "index": "#{new_index}" },
-  "conflicts": "proceed",
-  "refresh": false
-}
-```
+After reindexing, only the latest update will appear in search results.

+Clients can mitigate seeing duplicate documents in two ways:

-- This index ensures a place for ongoing writes while flushing `catchup1_index`.
+1. (Recommended) Clients may hide duplicate documents by implementing `collapse` on all searches. `collapse` incurs a small performance cost to each query. Clients may choose to `collapse` only when the alias is configured to read from multiple indices. For a reference implementation of conditionally de-duping using a `collapse` query while reindexing, see: https://github.com/richkuz/schema-tools-sample-app/blob/fc60718f5784e52d55b0c009e863f8b1c8303662/demo_script.rb#L255

+2. Use RBAC to deny all `UPDATE` operations during reindexing and implement continuous retries on failed `UPDATE` operations to ensure eventual consistency. This approach is suitable only for clients that can tolerate not seeing documents updated during reindexing.

+Why there are duplicate updated documents during reindexing:
+- The migration task configures an alias to read from both the original index and a catchup index, and write to the catchup index.
+- `UPDATE` operations produce an additional document in the catchup index.
+- When clients `_search` the alias for an updated document, they will see two results: one result from the original index, and one result from the catchup index.

-STEP 6

-- Merge the first catchup index into the new canonical index.
+#### Theoretical Alternatives for UPDATE and DELETE

-Configure `alias_name` so there are NO write indexes
-- This guarantees that no writes can sneak into an obsolete catchup index during the second (quick) merge.
-- Any write operations will fail during this time with: `"reason": "Alias [FOO] has more than one index associated with it [...], can't execute a single index op"`
-- Clients must retry any failed writes.
+In theory, the migrate task could support alternative reindexing modes when constrained by native Elasticsearch/OpenSearch capabilities.

+1. Preserve Hard-Deletes and Show All Duplicates

-- Final sync to merge the second catchup index into the new canonical index.
+The migrate task could support clients that require hard-deletes during reindexing by adding the new index into the alias during migration. Clients would have to use `_refresh` and `delete_by_query` when deleting documents to ensure documents are deleted from all indexes in the alias during reindexing. If using `DELETE` to delete a single document from an alias, clients might delete from the wrong index and receive a successful response containing "result: not_found". The new index would _not_ reflect such a deletion. With this approach, clients would see duplicate documents in search results for all documents during reindexing, not just updated documents. Clients could hide duplicate documents by implementing `collapse` on all searches.

+2. Ignore Hard-Deletes and Hide All Duplicates

-- Writes resume to the single new index. All data and deletes are consistent.
+Some clients might not be able to filter out duplicate documents during reindexing. The migrate task could support such clients by not returning any INSERTED or UPDATED documents until after the reindexing completes. This approach would not support hard-deletes. To support re-updating the same document during reindexing, clients would have to find documents to upsert based on a consistent ID, not based on a changing field.

-STEP 10
-
-Close unused indexes to avoid accidental writes.
-- Close `catchup1_index`
-- Close `catchup2_index`
-- Close `current_index`
-Operation complete.
-
-Users can safely delete closed indexes anytime after they are closed.
-
-Caveats for clients that perform writes during the migration:
-- Clients MUST retry failed creates/updates/deletes for up to a minute.
-- Writes will be temporarily disabled for up to a few seconds during the procedure to ensure no data loss.
-- Clients MUST use `delete_by_query` when deleting documents to ensure documents are deleted from all indexes in the alias during reindexing.
-- If using `DELETE` to delete a single document from an alias, clients might delete from the wrong index and receive a successful response containing "result: not_found". The new index will _not_ reflect such a deletion.
-- Clients MUST read and write to an alias, not directly to an index.
-- To prevent downtime, the migration procedure only operates on aliased indexes.
-- Run `rake schema:alias` to create a new alias pointed at an index.
-- Client applications must read and write to alias_name instead of index_name.

 ### Diagnosing a failed or aborted migration
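Two short sketches of the client-side behavior the section above asks for. Both are illustrations, not part of this gem: the `products` alias, the `id` and `updated_at` fields, and the endpoint URL are placeholders.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Retry a failed create/update for up to ~1 minute, per the requirement above.
# While the migration briefly leaves the alias with no write index,
# single-document writes fail; treating every non-2xx response as retryable
# until the deadline rides out that window.
def write_with_retry(doc, url: 'http://localhost:9200/products/_doc', deadline: 60)
  started = Time.now
  begin
    response = Net::HTTP.post(URI(url), JSON.generate(doc), 'Content-Type' => 'application/json')
    raise response.body unless response.is_a?(Net::HTTPSuccess)
    response
  rescue => e
    raise e if Time.now - started > deadline
    sleep 1
    retry
  end
end
```

To hide duplicates while the alias reads from two indices, a `collapse` search body might look like the following; collapsing on a unique keyword field and sorting on a recency field keeps only the newest copy of each document:

```ruby
# POST this body to /products/_search (the alias, not an index).
search_body = {
  query: { match_all: {} },
  collapse: { field: 'id' },                    # unique keyword field (placeholder)
  sort: [{ updated_at: { order: 'desc' } }]     # recency field (placeholder)
}
```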
@@ -310,8 +274,115 @@ Run `rake 'schema:close[alias_name]'` to close all indexes in an alias.

 Run `rake 'schema:delete[alias_name]'` to delete an alias and leave its indexes untouched.

+Run `rake 'schema:drop[alias_name]'` to delete an alias (does not delete the underlying index).
+
 GitHub Actions:
 - OpenSearch Staging Close Index
 - OpenSearch Production Close Index
 - OpenSearch Staging Delete Index
 - OpenSearch Production Delete Index
+
+## How migrations work
+
+When possible, `rake schema:migrate` will update settings and mappings in-place on an aliased index, without reindexing. Only breaking changes require a reindex.
+
+Migrating breaking changes requires careful orchestration of reads and writes to ensure documents that are created/updated during the migration are not lost.
+
+Hard-delete operations are not preserved during a breaking migration. See "Client responsibilities" above for how to mitigate this.
+
+Use case:
+- I have an alias `products` pointing at index `products-20250301000000`.
+- I have heavy reads and writes with 100M+ documents in the index
+- I want to reindex `products-20250301000000` into a new index and update the `products` alias to reference it, without losing any creates/updates during the process.
+
+Rake `schema:migrate` solves this use case through the following procedure.
+
+First, some terms:
+- `alias_name`: Alias containing the index to migrate
+  - `products`
+- `current_index`: First and only index in the alias
+  - `products-20250301000000`
+- `new_index`: Final canonical index into which to migrate `current_index`
+  - `products-20250601000000`
+- `catchup1_index`: Temp index to preserve writes during reindex
+  - `products-20250601000000-catchup-1`
+- `catchup2_index`: Temp index to preserve writes while flushing `catchup1_index`
+  - `products-20250601000000-catchup-2`
+- `log_index`: Index to log the migration state, not stored with `alias_name`
+  - `products-20250601000000-migration-log`
+
+SETUP
+
+Create `log_index` to log the migration state.
+- The migration logs when it starts and completes a step along with a description.
+
+STEP 1
+
+Attempt to reindex 1 document to a throwaway index to catch obvious configuration errors and abort early if possible.
+
+STEP 2
+
+Create `catchup1_index` using the new schema.
+- This index will preserve writes during the reindex.
+
+STEP 3
+
+Configure `alias_name` to only write to `catchup1_index` and read from `current_index` and `catchup1_index`.
+
+STEP 4
+
+Create `new_index` using the new schema.
+
+Reindex `current_index` into `new_index`.
+
+```
+POST _reindex
+{
+  "source": { "index": "#{current_index}" },
+  "dest": { "index": "#{new_index}" },
+  "conflicts": "proceed",
+  "refresh": false
+}
+```
+
+STEP 5
+
+Create `catchup2_index` using the new schema.
+- This index ensures a place for ongoing writes while flushing `catchup1_index`.
+
+STEP 6
+
+Configure `alias_name` to only write to `catchup2_index` and continue reading from `current_index` and `catchup1_index`.
+
+STEP 7
+
+Reindex `catchup1_index` into `new_index`.
+- Merge the first catchup index into the new canonical index.
+
+STEP 8
+
+Configure `alias_name` so there are NO write indexes
+- This guarantees that no writes can sneak into an obsolete catchup index during the second (quick) merge.
+- Any write operations will fail during this time with: `"reason": "Alias [FOO] has more than one index associated with it [...], can't execute a single index op"`
+- Clients must retry any failed writes.
+
+STEP 9
+
+Reindex `catchup2_index` into `new_index`
+- Final sync to merge the second catchup index into the new canonical index.
+
+STEP 10
+
+Configure `alias_name` to write to and read from `new_index` only.
+- Writes resume to the single new index. All data and deletes are consistent.
+
+STEP 11
+
+Close unused indexes to avoid accidental writes.
+- Close `catchup1_index`
+- Close `catchup2_index`
+- Close `current_index`
+Operation complete.
+
+Users can safely delete closed indexes anytime after they are closed.
data/lib/schema_tools/client.rb
CHANGED
@@ -25,9 +25,10 @@ module SchemaTools
   class Client
     attr_reader :url

-    def initialize(url, dryrun: false, logger: SimpleLogger.new, username: nil, password: nil)
+    def initialize(url, dryrun: false, interactive: false, logger: SimpleLogger.new, username: nil, password: nil)
       @url = url
       @dryrun = dryrun
+      @interactive = interactive
       @logger = logger
       @username = username
       @password = password
@@ -162,19 +163,59 @@ module SchemaTools
     end


-    def reindex(source_index, dest_index, script)
       body = {
-        source: { index: source_index },
+        source: {
+          index: source_index,
+          size: size
+        },
         dest: { index: dest_index },
         conflicts: "proceed"
       }
-      body[:script] = { source: script } if script
+      body[:script] = { lang: 'painless', source: script } if script

       url = "/_reindex?wait_for_completion=false&refresh=false"
+      url += "&requests_per_second=#{requests_per_second}" if requests_per_second != -1

       post(url, body)
     end

+    def reindex_one_doc(source_index:, dest_index:, script: nil)
+      body = {
+        source: {
+          index: source_index,
+          query: { match_all: {} }
+        },
+        max_docs: 1,
+        dest: { index: dest_index },
+        conflicts: "proceed"
+      }
+      body[:script] = { lang: 'painless', source: script } if script
+
+      url = "/_reindex?wait_for_completion=true&refresh=true"
+
+      response = post(url, body)
+
+      if response['failures'] && !response['failures'].empty?
+        failure_reason = response['failures'].map { |f| f['cause']['reason'] }.join("; ")
+        raise "Reindex failed with internal errors. Failures: #{failure_reason}"
+      end
+
+      total = response['total'].to_i
+      created = response['created'].to_i
+      updated = response['updated'].to_i
+
+      if total != 1
+        raise "Reindex query found #{total} documents. Expected to find 1."
+      elsif created + updated != 1
+        raise "Reindex failed to index the document (created: #{created}, updated: #{updated}). Noops: #{response.fetch('noops', 0)}."
+      elsif response['timed_out'] == true
+        raise "Reindex operation timed out."
+      end
+
+      response
+    end
+
     def get_task_status(task_id)
       get("/_tasks/#{task_id}")
     end
@@ -372,6 +413,10 @@ module SchemaTools
       task_status = get_task_status(task_id)

       if task_status['completed']
+        if task_status['error']
+          log "ERROR: Task #{task_id} failed with a top-level error: #{task_status['error']}"
+          raise task_status['error']['reason']
+        end
         return task_status
       end

@@ -435,6 +480,17 @@ module SchemaTools
       settings.dig('index', 'verified_before_close') == 'true'
     end

+    def refresh(index_name, suppress_logging: false)
+      post("/#{index_name}/_refresh", {}, suppress_logging: suppress_logging)
+    end
+
+    # For this to work reliably, segments MUST be flushed.
+    # Call refresh(index_name) first!
+    def delete_by_query(index_name, query, suppress_logging: false)
+      body = { query: query }
+      post("/#{index_name}/_delete_by_query", body, suppress_logging: suppress_logging)
+    end
+
     private

     def make_http_request(uri)
@@ -461,7 +517,7 @@ module SchemaTools
     end

     def interactive_mode?
-
+      @interactive
     end

     def await_user_input
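Taken together, the additions above can be exercised like this. A sketch only: it assumes the constructor and method signatures shown in the hunks, plus the pre-existing `wait_for_task`; the index names, script, and query are placeholders.

```ruby
require 'schema_tools/client'

client = SchemaTools::Client.new(
  'http://localhost:9200',
  dryrun: false,
  interactive: false
)

# Async, throttled reindex (wait_for_completion=false), then poll the task.
task_response = client.reindex(
  source_index: 'products-20250301000000',
  dest_index: 'products-20250601000000',
  script: "ctx._source.new_field = 'default_value'",
  size: 500,                  # maps to REINDEX_BATCH_SIZE
  requests_per_second: 100    # maps to REINDEX_REQUESTS_PER_SECOND
)
client.wait_for_task(task_response['task'], 3600) if task_response['task']

# Hard-delete soft-deleted documents after migrating: refresh first so
# delete_by_query operates on flushed segments, per the comment in the diff.
client.refresh('products')
client.delete_by_query('products', { exists: { field: 'deleted_at' } })
```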
data/lib/schema_tools/migrate/migrate.rb
CHANGED
@@ -8,7 +8,7 @@ require_relative '../api_aware_mappings_diff'
 require 'json'

 module SchemaTools
-  def self.migrate_all(client:)
+  def self.migrate_all(client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
     puts "Discovering all schemas and migrating each to their latest revisions..."

     schemas = SchemaFiles.discover_all_schemas
@@ -26,7 +26,7 @@ module SchemaTools

     schemas.each do |alias_name|
       begin
-        migrate_one_schema(alias_name: alias_name, client: client)
+        migrate_one_schema(alias_name: alias_name, client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
       rescue => e
         puts "✗ Migration failed for #{alias_name}: #{e.message}"
         raise e
@@ -35,7 +35,7 @@ module SchemaTools
     end
   end

-  def self.migrate_one_schema(alias_name:, client:)
+  def self.migrate_one_schema(alias_name:, client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
     puts "=" * 60
     puts "Migrating alias #{alias_name}"
     puts "=" * 60
@@ -84,7 +84,7 @@ module SchemaTools
       puts "✗ Failed to update index '#{index_name}': #{e.message}"
       puts "This appears to be a breaking change. Starting breaking change migration..."

-      MigrateBreakingChange.migrate(alias_name:, client:)
+      MigrateBreakingChange.migrate(alias_name:, client:, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
     end
   end
 end
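The new keyword arguments mirror the README's `REINDEX_BATCH_SIZE` and `REINDEX_REQUESTS_PER_SECOND` env vars; calling the module directly looks roughly like this (`client` built as in the rake task):

```ruby
# Equivalent of:
#   REINDEX_BATCH_SIZE=500 REINDEX_REQUESTS_PER_SECOND=100 rake 'schema:migrate[products]'
SchemaTools.migrate_one_schema(
  alias_name: 'products',
  client: client,
  reindex_batch_size: 500,
  reindex_requests_per_second: 100
)

# Or migrate every discovered schema with the defaults (batch 1000, unthrottled):
SchemaTools.migrate_all(client: client)
```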
data/lib/schema_tools/migrate/migrate_breaking_change.rb
CHANGED
@@ -43,13 +43,15 @@ module SchemaTools
   end

   class MigrateBreakingChange
-    def self.migrate(alias_name:, client:)
-      new(alias_name: alias_name, client: client).migrate
+    def self.migrate(alias_name:, client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
+      new(alias_name: alias_name, client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second).migrate
     end

-    def initialize(alias_name:, client:)
+    def initialize(alias_name:, client:, reindex_batch_size: 1000, reindex_requests_per_second: -1)
       @alias_name = alias_name
       @client = client
+      @reindex_batch_size = reindex_batch_size
+      @reindex_requests_per_second = reindex_requests_per_second
       @migration_log_index = nil
       @current_step = nil
       @rollback_attempted = false
@@ -107,6 +109,9 @@ module SchemaTools
       @catchup2_index = "#{@new_index}-catchup-2"
       log "catchup2_index: #{@catchup2_index}"

+      @throwaway_test_index = "#{@new_index}-throwaway-test"
+      log "throwaway_test_index: #{@throwaway_test_index}"
+
       # Use current index settings and mappings when creating catchup indexes
       # so that any reindex painless script logic will apply correctly to them.
       @current_settings = @client.get_index_settings(@current_index)
@@ -136,6 +141,10 @@ module SchemaTools

     def migration_steps
       [
+        MigrationStep.new(
+          name: "STEP 0: Pre-test reindex with 1 document",
+          run: ->(logger) { step0_test_reindex_one_doc }
+        ),
         MigrationStep.new(
           name: "STEP 1: Create catchup-1 index",
           run: ->(logger) { step1_create_catchup1 }
@@ -179,6 +188,19 @@ module SchemaTools
       ]
     end

+    def step0_test_reindex_one_doc
+      @client.create_index(@throwaway_test_index, @new_settings, @new_mappings)
+      begin
+        @client.reindex_one_doc(source_index: @current_index, dest_index: @throwaway_test_index, script: @reindex_script)
+      rescue => e
+        log "Failed reindexing a test document"
+        raise e
+      ensure
+        log "Deleting throwaway test index #{@throwaway_test_index}"
+        @client.delete_index(@throwaway_test_index)
+      end
+    end
+
     def step1_create_catchup1
       @client.create_index(@catchup1_index, @current_settings, @current_mappings)
       log "Created catchup-1 index: #{@catchup1_index}"
@@ -226,24 +248,38 @@ module SchemaTools
     end

     def reindex(current_index, new_index, reindex_script)
-      log
+      task_response = @client.reindex(source_index: current_index, dest_index: new_index, script: reindex_script, size: @reindex_batch_size, requests_per_second: @reindex_requests_per_second)
+      log task_response
+      if task_response['took']
+        log "Reindex task complete. Took: #{task_response['took']}"
+        if task_response['failures'] && !task_response['failures'].empty?
+          failure_reason = task_response['failures'].map { |f| f['cause']['reason'] }.join("; ")
+          raise "Reindex failed synchronously with internal errors. Failures: #{failure_reason}"
+        end
         return true
       end
-      task_id
-      raise "No task ID from reindex. Reindex incomplete."
+      task_id = task_response['task']
+      unless task_id
+        raise "Reindex response did not contain 'task' ID or 'took' time. Reindex incomplete."
       end

       log "Reindex task started at #{Time.now}. task_id is #{task_id}. Fetch task status with GET #{@client.url}/_tasks/#{task_id}"

       timeout = 604800 # 1 week
-      @client.wait_for_task(
+      completed_task_status = @client.wait_for_task(task_response['task'], timeout)
+      final_result = completed_task_status.fetch('response', {})
+      if final_result['failures'] && !final_result['failures'].empty?
+        failure_reason = final_result['failures'].map { |f| f['cause']['reason'] }.join("; ")
+        raise "Reindex FAILED during async processing. Failures: #{failure_reason}"
+      end
+      created = final_result.fetch('created', 0)
+      updated = final_result.fetch('updated', 0)
+      deleted = final_result.fetch('deleted', 0)
+      log "Reindex complete." + \
+          "\nTook: #{final_result['took']}ms." + \
+          "\nCreated: #{created}" + \
+          "\nUpdated: #{updated}" + \
+          "\nDeleted: #{deleted}"
+      return true
     end

     def step4_create_catchup2
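STEP 0 boils down to one synchronous, single-document `_reindex` against a disposable index, so configuration and script errors surface before any real work starts. Roughly the request body `reindex_one_doc` builds (index names follow the README's terms; the script line is a placeholder):

```ruby
# POSTed to /_reindex?wait_for_completion=true&refresh=true
body = {
  source: {
    index: 'products-20250301000000',
    query: { match_all: {} }
  },
  max_docs: 1,                      # copy exactly one document
  dest: { index: 'products-20250601000000-throwaway-test' },
  conflicts: 'proceed',
  script: { lang: 'painless', source: "ctx._source.new_field = 'default_value'" }
}
# Any mapping or script failure raises here, aborting before STEP 1;
# the throwaway index is deleted in the ensure block either way.
```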
data/lib/schema_tools/migrate/rollback.rb
CHANGED
@@ -105,7 +105,7 @@ module SchemaTools
       @logger.log "📊 Found #{doc_count} documents in catchup-1 index - reindexing to original..."

       # Reindex from catchup-1 to original index
-      response = @client.reindex(@catchup1_index, @current_index, nil)
+      response = @client.reindex(source_index: @catchup1_index, dest_index: @current_index, script: nil)
       @logger.log "Reindex task started - task_id: #{response['task']}"

       # Wait for reindex to complete
data/lib/schema_tools/new_alias.rb
CHANGED
@@ -26,6 +26,9 @@ module SchemaTools
     sample_settings = {
       "number_of_shards" => 1,
       "number_of_replicas" => 0,
+      "replication" => {
+        "type" => "DOCUMENT"
+      },
       "analysis" => {
         "analyzer" => {
           "default" => {
@@ -59,13 +62,33 @@ module SchemaTools

     settings_file = File.join(schema_path, 'settings.json')
     mappings_file = File.join(schema_path, 'mappings.json')
+    reindex_file = File.join(schema_path, 'reindex.painless')

     File.write(settings_file, JSON.pretty_generate(sample_settings))
     File.write(mappings_file, JSON.pretty_generate(sample_mappings))

+    # Create example reindex.painless file
+    reindex_content = <<~PAINLESS
+      // Example reindex script for transforming data during migration
+      // Modify this script to transform your data as needed
+      //
+      // Example: Rename a field
+      // if (ctx._source.containsKey('old_field_name')) {
+      //   ctx._source.new_field_name = ctx._source.old_field_name;
+      //   ctx._source.remove('old_field_name');
+      // }
+      //
+      // Example: Add a new field
+      // ctx._source.new_field = 'default_value';
+      long timestamp = System.currentTimeMillis();
+    PAINLESS
+
+    File.write(reindex_file, reindex_content)
+
     puts "✓ Sample schema created at #{schema_path}"
     puts "  - settings.json"
     puts "  - mappings.json"
+    puts "  - reindex.painless"
   end

   def self.create_alias_for_index(client:)
@@ -142,13 +165,32 @@ module SchemaTools

     settings_file = File.join(schema_path, 'settings.json')
     mappings_file = File.join(schema_path, 'mappings.json')
+    reindex_file = File.join(schema_path, 'reindex.painless')

     File.write(settings_file, JSON.pretty_generate(filtered_settings))
     File.write(mappings_file, JSON.pretty_generate(mappings))

+    # Create example reindex.painless file
+    reindex_content = <<~PAINLESS
+      // Example reindex script for transforming data during migration
+      // Modify this script to transform your data as needed
+      //
+      // Example: Rename a field
+      // if (ctx._source.containsKey('old_field_name')) {
+      //   ctx._source.new_field_name = ctx._source.old_field_name;
+      //   ctx._source.remove('old_field_name');
+      // }
+      //
+      // Example: Add a new field
+      // ctx._source.new_field = 'default_value';
+    PAINLESS
+
+    File.write(reindex_file, reindex_content)
+
     puts "✓ Schema downloaded to #{schema_path}"
     puts "  - settings.json"
     puts "  - mappings.json"
+    puts "  - reindex.painless"
   end

 end
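The generated `reindex.painless` lives alongside `settings.json` and `mappings.json` in the schema directory; during a breaking migration its contents end up as the `script:` argument to `Client#reindex`. A sketch, with the `schemas/products` layout assumed for illustration:

```ruby
# Load the schema's optional painless script and hand it to the client.
schema_path    = File.join('schemas', 'products')           # assumed layout
reindex_file   = File.join(schema_path, 'reindex.painless')
reindex_script = File.exist?(reindex_file) ? File.read(reindex_file) : nil

client.reindex(
  source_index: 'products-20250301000000',
  dest_index: 'products-20250601000000',
  script: reindex_script    # wrapped as { lang: 'painless', source: ... }
)
```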
data/lib/tasks/schema.rake
CHANGED
@@ -32,6 +32,7 @@ def create_client!
   client = SchemaTools::Client.new(
     SchemaTools::Config.connection_url,
     dryrun: ENV['DRYRUN'] == 'true',
+    interactive: ENV['INTERACTIVE'] == 'true',
     username: SchemaTools::Config.connection_username,
     password: SchemaTools::Config.connection_password
   )
@@ -51,11 +52,14 @@ namespace :schema do
   desc "Migrate to a specific alias schema or migrate all schemas to their latest revisions"
   task :migrate, [:alias_name] do |t, args|
     client = create_client!
+
+    reindex_batch_size = ENV['REINDEX_BATCH_SIZE'] ? ENV['REINDEX_BATCH_SIZE'].to_i : 1000
+    reindex_requests_per_second = ENV['REINDEX_REQUESTS_PER_SECOND'] ? ENV['REINDEX_REQUESTS_PER_SECOND'].to_i : -1
+
     if args[:alias_name]
-      SchemaTools.migrate_one_schema(alias_name: args[:alias_name], client: client)
+      SchemaTools.migrate_one_schema(alias_name: args[:alias_name], client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
     else
-      SchemaTools.migrate_all(client: client)
+      SchemaTools.migrate_all(client: client, reindex_batch_size: reindex_batch_size, reindex_requests_per_second: reindex_requests_per_second)
     end
   end

@@ -91,6 +95,30 @@ namespace :schema do
     )
   end

+  desc "Delete an alias (does not delete the index)"
+  task :drop, [:alias_name] do |t, args|
+    client = create_client!
+
+    unless args[:alias_name]
+      puts "Error: alias_name is required"
+      puts "Usage: rake 'schema:drop[alias_name]'"
+      exit 1
+    end
+
+    alias_name = args[:alias_name]
+
+    unless client.alias_exists?(alias_name)
+      puts "Error: Alias '#{alias_name}' does not exist"
+      exit 1
+    end
+
+    indices = client.get_alias_indices(alias_name)
+    puts "Deleting alias '#{alias_name}' from indices: #{indices.join(', ')}"
+
+    client.delete_alias(alias_name)
+    puts "✓ Alias '#{alias_name}' deleted successfully"
+  end
+
   desc "Download schema from an existing alias or index"
   task :download do |t, args|
     client = create_client!
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: schema-tools
 version: !ruby/object:Gem::Version
-  version: 1.0.5
+  version: 1.0.6
 platform: ruby
 authors:
 - Rich Kuzsma
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2025-10-
+date: 2025-10-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -138,7 +138,6 @@ files:
 - bin/setup
 - lib/schema_tools.rb
 - lib/schema_tools/api_aware_mappings_diff.rb
-- lib/schema_tools/catchup.rb
 - lib/schema_tools/client.rb
 - lib/schema_tools/close.rb
 - lib/schema_tools/config.rb
data/lib/schema_tools/catchup.rb
DELETED
@@ -1,23 +0,0 @@
-module SchemaTools
-  def self.catchup(index_name:, client:)
-    raise "index_name parameter is required" unless index_name
-
-    index_config = SchemaFiles.get_index_config(index_name)
-    raise "Index configuration not found for #{index_name}" unless index_config
-
-    from_index = index_config['from_index_name']
-    raise "from_index_name not specified in index configuration" unless from_index
-
-    unless client.index_exists?(from_index)
-      raise "Source index #{from_index} does not exist. Cannot perform catchup reindex to #{index_name}."
-    end
-
-    reindex_script = SchemaFiles.get_reindex_script(index_name)
-
-    puts "Starting catchup reindex from #{from_index} to #{index_name}"
-    # TODO NOT IMPLEMENTED YET
-    # Do a reindex by query
-    puts "TODO IMPLEMENT ME"
-    response = client.reindex(from_index, index_name, reindex_script)
-  end
-end