schema-tools 1.0.6 → 1.0.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 3a977e1d5ae35640094087dff8eb1b1e5481db2b6bc0d35315f219fea02210d6
- data.tar.gz: d8a40fbc727ede8614d607a22ca9aa830b32254598f8900d8f0e0b57fc315589
+ metadata.gz: 844c9578ff5adeafdd8d34da1df7ec2df9114a9db572680b32fb856a92c58b9d
+ data.tar.gz: cf21d35fc65c62922911bc18e50be33515e19dbd599ca9714d6c47d10b5d7199
  SHA512:
- metadata.gz: 07d90a197672dff508c6db4e6e40d90da18ce06dad381b0f46c6687d3e1d38eb2f36a305d06717863da6f5f60fcb4cdae25d9df4a5862a54338887a819255924
- data.tar.gz: cd758d32d70d00af78c8f9580ff66a02f7c4bc86de7dde7306f90eaa7f4ae88dd431987adbfd984bd7e8f12b5d9f648c66c89729443b47a87204f61b4fc65dd2
+ metadata.gz: 1743b2f92814bea0e6f368cc0b807b6dc4734a615ce2c752aca37dc50e9523db206b51882e596a4489df33bfb5178bc5a39729f7b688fa22a62dabe3548058cc
+ data.tar.gz: 0e5a6babe517b01f61773a4be3a679aabcd74f3acebf2200cfa08dbc4c9b4d9ed23f30f3e38ce29d5c5cd02847eea06dfd8d231989e96b220154c5b3722f9763
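
The entries in checksums.yaml are hex digests of the two archives packed inside the gem. A minimal sketch of recomputing them with Ruby's standard `digest` library follows. The helper name `digests_for` is hypothetical, and the sketch assumes the `.gem` file has already been unpacked so that `metadata.gz` and `data.tar.gz` are available as raw bytes.

```ruby
require "digest"

# Hypothetical helper: returns hex digests keyed the same way as checksums.yaml.
def digests_for(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Illustrative input only; a real check would hash the archive contents.
sums = digests_for("example bytes")
puts sums["SHA256"].length
puts sums["SHA512"].length
```

To compare against the values listed here, one would pass `File.binread("data.tar.gz")` (or `metadata.gz`) instead of the sample string.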
data/README.md CHANGED
@@ -7,6 +7,8 @@
  - Create new aliases with sample schemas.
  - Manage painless scripts independently from schema migrations.
 
+ A sample app that uses schema-tools is available at: https://github.com/richkuz/schema-tools-sample-app
+
  ## Quick start
 
  Install this Ruby gem.
@@ -41,23 +43,6 @@ export OPENSEARCH_USERNAME=your_username
  export OPENSEARCH_PASSWORD=your_password
  ```
 
- ### View available rake tasks
-
- ```sh
- rake -T | grep " schema:"
- ```
-
- Available schema tasks:
- - `schema:migrate[alias_name]` - Migrate to a specific alias schema or migrate all schemas
- - `schema:new` - Create a new alias with sample schema
- - `schema:close[name]` - Close an index or alias
- - `schema:delete[name]` - Hard delete an index (only works on closed indexes) or delete an alias
- - `schema:drop[alias_name]` - Delete an alias (does not delete the index)
- - `schema:download` - Download schema from an existing alias or index
- - `schema:alias` - Create an alias for an existing index
- - `schema:seed` - Seed data to a live index
- - `schema:diff` - Compare all schemas to their corresponding downloaded alias settings and mappings
-
  ### Download an existing schema
 
  Run `rake schema:download` to download a schema from an existing alias or index:
@@ -115,6 +100,24 @@ $ rake schema:new
  # - mappings.json
  ```
 
+ ### View available rake tasks
+
+ ```sh
+ rake -T | grep " schema:"
+ ```
+
+ Available schema tasks:
+ - `schema:download` - Download schema from an existing alias or index
+ - `schema:migrate` - Migrate all schemas to match local schema files
+ - `schema:migrate[alias_name]` - Migrate a specific alias to match its local schema files
+ - `schema:new` - Create a new alias with sample schema
+ - `schema:alias` - Create an alias for an existing index
+ - `schema:diff` - Compare all schemas to their corresponding downloaded alias settings and mappings
+ - `schema:seed` - Seed data to a live index
+ - `schema:close[name]` - Close an index or alias
+ - `schema:delete[name]` - Hard delete an index (only works on closed indexes) or delete an alias
+ - `schema:drop[alias_name]` - Delete an alias (does not delete the index)
+
  ## Directory structure reference
 
  Example directory structure with multiple aliases:
@@ -298,6 +301,8 @@ Use case:
 
  Rake `schema:migrate` solves this use case through the following procedure.
 
+ See: [Migration Procedure Diagram](https://github.com/richkuz/schema-tools/blob/main/docs/schema-tools-migration.svg)
+
  First, some terms:
  - `alias_name`: Alias containing the index to migrate
  - `products`
@@ -317,20 +322,20 @@ SETUP
  Create `log_index` to log the migration state.
  - The migration logs when it starts and completes a step along with a description.
 
- STEP 1
+ STEP 0
 
  Attempt to reindex 1 document to a throwaway index to catch obvious configuration errors and abort early if possible.
 
- STEP 2
+ STEP 1
 
  Create `catchup1_index` using the new schema.
  - This index will preserve writes during the reindex.
 
- STEP 3
+ STEP 2
 
  Configure `alias_name` to only write to `catchup1_index` and read from `current_index` and `catchup1_index`.
 
- STEP 4
+ STEP 3
 
  Create `new_index` using the new schema.
 
@@ -346,38 +351,38 @@ POST _reindex
  }
  ```
 
- STEP 5
+ STEP 4
 
  Create `catchup2_index` using the new schema.
  - This index ensures a place for ongoing writes while flushing `catchup1_index`.
 
- STEP 6
+ STEP 5
 
  Configure `alias_name` to only write to `catchup2_index` and continue reading from `current_index` and `catchup1_index`.
 
- STEP 7
+ STEP 6
 
  Reindex `catchup1_index` into `new_index`.
  - Merge the first catchup index into the new canonical index.
 
- STEP 8
+ STEP 7
 
  Configure `alias_name` so there are NO write indexes
  - This guarantees that no writes can sneak into an obsolete catchup index during the second (quick) merge.
  - Any write operations will fail during this time with: `"reason": "Alias [FOO] has more than one index associated with it [...], can't execute a single index op"`
  - Clients must retry any failed writes.
 
- STEP 9
+ STEP 8
 
  Reindex `catchup2_index` into `new_index`
  - Final sync to merge the second catchup index into the new canonical index.
 
- STEP 10
+ STEP 9
 
  Configure `alias_name` to write to and read from `new_index` only.
  - Writes resume to the single new index. All data and deletes are consistent.
 
- STEP 11
+ STEP 10
 
  Close unused indexes to avoid accidental writes.
  - Close `catchup1_index`
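
The alias reconfiguration steps in the README text above (new STEP 2 and STEP 7) can be pictured as request bodies for the `_aliases` endpoint, using its `is_write_index` flag. The following is only a sketch: the helper `alias_actions` and the concrete index names are hypothetical, the set of indexes attached in STEP 7 is an assumption, and none of this is taken from the gem's own implementation.

```ruby
require "json"

# Hypothetical helper: builds a POST _aliases body that attaches the alias to
# every index in `indexes` and marks at most one of them as the write index.
def alias_actions(alias_name, indexes, write_index: nil)
  actions = indexes.map do |index|
    {
      "add" => {
        "index" => index,
        "alias" => alias_name,
        "is_write_index" => (index == write_index)
      }
    }
  end
  { "actions" => actions }
end

# STEP 2: write to the first catchup index, read from it and the current index.
step2 = alias_actions("products", %w[products-1 products-catchup1],
                      write_index: "products-catchup1")

# STEP 7: no index is the write index, so single-index writes are rejected.
# Which indexes stay attached at this point is an assumption of this sketch.
step7 = alias_actions("products", %w[products-1 products-catchup1 products-catchup2])

puts JSON.pretty_generate(step2)
```

With several indexes attached and none flagged as the write index, a write through the alias has no single target, which matches the "more than one index associated" error quoted under STEP 7.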
@@ -196,6 +196,11 @@ module SchemaTools
 
  response = post(url, body)
 
+ # In dry run mode, post returns a task response, not a completed response
+ if @dryrun
+ return response
+ end
+
  if response['failures'] && !response['failures'].empty?
  failure_reason = response['failures'].map { |f| f['cause']['reason'] }.join("; ")
  raise "Reindex failed with internal errors. Failures: #{failure_reason}"
@@ -205,14 +210,25 @@ module SchemaTools
  created = response['created'].to_i
  updated = response['updated'].to_i
 
- if total != 1
+ if total == 0
+ # Verify that the source index actually has 0 documents
+ source_doc_count = get_index_doc_count(source_index)
+ if source_doc_count == 0
+ # Handle the case where the source index has no documents
+ # This is a valid scenario - just log and continue
+ @logger.info("Source index has no documents to reindex")
+ else
+ # This shouldn't happen - if source has docs but reindex found 0, something went wrong
+ raise "Reindex query found 0 documents but source index has #{source_doc_count} documents. This indicates a reindex configuration issue."
+ end
+ elsif total != 1
  raise "Reindex query found #{total} documents. Expected to find 1."
- elsif created + updated != 1
+ elsif total == 1 && created + updated != 1
  raise "Reindex failed to index the document (created: #{created}, updated: #{updated}). Noops: #{response.fetch('noops', 0)}."
  elsif response['timed_out'] == true
  raise "Reindex operation timed out."
  end
-
+
  response
  end
 
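
The branch order introduced in the hunk above is easier to follow as a standalone function. This is a sketch, not the gem's method: `check_test_reindex` is a hypothetical name, the source document count is passed in rather than fetched, and `total` is assumed to come from the response's `total` field, since its assignment lies outside the hunk.

```ruby
# Hypothetical standalone version of the one-document reindex check.
# `response` is a parsed _reindex response; `source_doc_count` is supplied by the caller.
def check_test_reindex(response, source_doc_count)
  total   = response["total"].to_i   # assumption: total is read from the response
  created = response["created"].to_i
  updated = response["updated"].to_i

  if total == 0
    # An empty source index is valid; a non-empty one with 0 matches is not.
    return :empty_source if source_doc_count == 0
    raise "Reindex query found 0 documents but source index has #{source_doc_count} documents."
  elsif total != 1
    raise "Reindex query found #{total} documents. Expected to find 1."
  elsif created + updated != 1
    raise "Reindex failed to index the document (created: #{created}, updated: #{updated})."
  elsif response["timed_out"] == true
    raise "Reindex operation timed out."
  end

  :ok
end

puts check_test_reindex({ "total" => 1, "created" => 1, "updated" => 0 }, 5)
puts check_test_reindex({ "total" => 0 }, 0)
```

Two things stand out when the chain is written this way. The `total == 1 &&` guard added in the hunk appears redundant, because the earlier branches already exclude every other value of `total`. And because this is a single `if`/`elsif` chain, the `timed_out` check is only reached when exactly one document was found and indexed.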
@@ -94,7 +94,11 @@ module SchemaTools
  normalized_local_settings = self.normalize_local_settings(local_settings)
  normalized_remote_settings = self.normalize_remote_settings(filtered_remote_settings)
 
- result[:settings_diff] = json_diff.generate_diff(normalized_remote_settings, normalized_local_settings)
+ # Check for replica count differences first
+ replica_warning = check_replica_count_difference(normalized_remote_settings, normalized_local_settings)
+
+ # Generate diff while ignoring number_of_replicas
+ result[:settings_diff] = json_diff.generate_diff(normalized_remote_settings, normalized_local_settings, ignored_keys: ["index.number_of_replicas"])
  result[:mappings_diff] = json_diff.generate_diff(remote_mappings, local_mappings)
 
  result[:comparison_context] = {
@@ -108,8 +112,10 @@ module SchemaTools
  }
  }
 
+ # Set status based on diff results
  if result[:settings_diff] == "No changes detected" && result[:mappings_diff] == "No changes detected"
  result[:status] = :no_changes
+ result[:replica_warning] = replica_warning if replica_warning
  else
  result[:status] = :changes_detected
  end
@@ -163,10 +169,38 @@ module SchemaTools
 
  puts "Mappings Comparison:"
  puts schema_diff[:mappings_diff]
+
+ # Display replica warning if present
+ if schema_diff[:replica_warning]
+ puts
+ puts "⚠️ #{schema_diff[:replica_warning]}"
+ end
  end
 
  private
 
+ def self.check_replica_count_difference(remote_settings, local_settings)
+ remote_replicas = get_replica_count(remote_settings)
+ local_replicas = get_replica_count(local_settings)
+
+ if remote_replicas && local_replicas && remote_replicas != local_replicas
+ "WARNING: The specified number of replicas #{local_replicas} in the schema could not be applied to the cluster, likely because the cluster isn't running enough nodes."
+ else
+ nil
+ end
+ end
+
+ def self.get_replica_count(settings)
+ return nil unless settings.is_a?(Hash)
+
+ if settings.key?("index") && settings["index"].is_a?(Hash)
+ settings["index"]["number_of_replicas"]
+ else
+ settings["number_of_replicas"]
+ end
+ end
+
+
  def self.normalize_local_settings(local_settings)
  return local_settings unless local_settings.is_a?(Hash)
 
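
The replica check added in this file accepts settings in either nested (`index.number_of_replicas`) or flat form. A minimal sketch of that lookup and comparison follows, written as free functions so it can run on its own. The names `replica_count` and `replica_mismatch?` are hypothetical stand-ins for the methods in the hunk, and the sample values are illustrative.

```ruby
# Mirrors the lookup in the hunk above: nested "index" hash first, flat key otherwise.
def replica_count(settings)
  return nil unless settings.is_a?(Hash)

  if settings.key?("index") && settings["index"].is_a?(Hash)
    settings["index"]["number_of_replicas"]
  else
    settings["number_of_replicas"]
  end
end

# True only when both sides declare a replica count and the two values differ.
def replica_mismatch?(remote_settings, local_settings)
  remote = replica_count(remote_settings)
  local  = replica_count(local_settings)
  !!(remote && local && remote != local)
end

remote = { "index" => { "number_of_replicas" => 0 } }
local  = { "number_of_replicas" => 1 }
puts replica_mismatch?(remote, local)
```

The comparison is a plain `!=`, so a remote value of `"1"` and a local value of `1` would count as different unless the `normalize_*` methods already coerce them to one type, which this diff does not show. Also note that, per the earlier hunk, the warning is attached to the result only when the status is `:no_changes`.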
@@ -10,13 +10,13 @@ module SchemaTools
 
  # Generate a detailed diff between two JSON objects
  # Returns a formatted string showing additions, removals, and modifications
- def generate_diff(old_json, new_json, context: {})
+ def generate_diff(old_json, new_json, context: {}, ignored_keys: [])
  old_normalized = normalize_json(old_json)
  new_normalized = normalize_json(new_json)
 
  # Filter out ignored keys
- old_filtered = filter_ignored_keys(old_normalized)
- new_filtered = filter_ignored_keys(new_normalized)
+ old_filtered = filter_ignored_keys(old_normalized, "", ignored_keys)
+ new_filtered = filter_ignored_keys(new_normalized, "", ignored_keys)
 
  if old_filtered == new_filtered
  return "No changes detected"
@@ -27,7 +27,7 @@ module SchemaTools
  diff_lines << ""
 
  # Generate detailed diff
- changes = compare_objects(old_filtered, new_filtered, "")
+ changes = compare_objects(old_filtered, new_filtered, "", ignored_keys)
 
  if changes.empty?
  diff_lines << "No changes detected"
@@ -82,19 +82,22 @@ module SchemaTools
  normalized
  end
 
- def filter_ignored_keys(obj, path_prefix = "")
+ def filter_ignored_keys(obj, path_prefix = "", ignored_keys = [])
  return obj unless obj.is_a?(Hash)
 
+ # Combine class-level ignored keys with parameter ignored keys
+ all_ignored_keys = IGNORED_KEYS + ignored_keys
+
  filtered = {}
  obj.each do |key, value|
  current_path = path_prefix.empty? ? key : "#{path_prefix}.#{key}"
 
  # Skip ignored keys
- next if IGNORED_KEYS.any? { |ignored_key| current_path == ignored_key }
+ next if all_ignored_keys.any? { |ignored_key| current_path == ignored_key }
 
  # Recursively filter nested objects
  if value.is_a?(Hash)
- filtered[key] = filter_ignored_keys(value, current_path)
+ filtered[key] = filter_ignored_keys(value, current_path, ignored_keys)
  else
  filtered[key] = value
  end
@@ -103,14 +106,14 @@ module SchemaTools
  filtered
  end
 
- def compare_objects(old_obj, new_obj, path_prefix)
+ def compare_objects(old_obj, new_obj, path_prefix, ignored_keys = [])
  changes = []
 
  # Handle different object types
  if old_obj.is_a?(Hash) && new_obj.is_a?(Hash)
- changes.concat(compare_hashes(old_obj, new_obj, path_prefix))
+ changes.concat(compare_hashes(old_obj, new_obj, path_prefix, ignored_keys))
  elsif old_obj.is_a?(Array) && new_obj.is_a?(Array)
- changes.concat(compare_arrays(old_obj, new_obj, path_prefix))
+ changes.concat(compare_arrays(old_obj, new_obj, path_prefix, ignored_keys))
  elsif old_obj != new_obj
  changes << format_change(path_prefix, old_obj, new_obj)
  end
@@ -118,15 +121,18 @@ module SchemaTools
  changes
  end
 
- def compare_hashes(old_hash, new_hash, path_prefix)
+ def compare_hashes(old_hash, new_hash, path_prefix, ignored_keys = [])
  changes = []
  all_keys = (old_hash.keys + new_hash.keys).uniq.sort
 
+ # Combine class-level ignored keys with parameter ignored keys
+ all_ignored_keys = IGNORED_KEYS + ignored_keys
+
  all_keys.each do |key|
  current_path = path_prefix.empty? ? key : "#{path_prefix}.#{key}"
 
  # Skip ignored keys
- next if IGNORED_KEYS.any? { |ignored_key| current_path == ignored_key }
+ next if all_ignored_keys.any? { |ignored_key| current_path == ignored_key }
 
  old_value = old_hash[key]
  new_value = new_hash[key]
@@ -140,7 +146,7 @@ module SchemaTools
  elsif old_value != new_value
  if old_value.is_a?(Hash) && new_value.is_a?(Hash) ||
  old_value.is_a?(Array) && new_value.is_a?(Array)
- changes.concat(compare_objects(old_value, new_value, current_path))
+ changes.concat(compare_objects(old_value, new_value, current_path, ignored_keys))
  else
  changes << "🔄 MODIFIED: #{current_path}"
  changes.concat(format_value_details("Old value", old_value, " "))
@@ -152,7 +158,7 @@ module SchemaTools
  changes
  end
 
- def compare_arrays(old_array, new_array, path_prefix)
+ def compare_arrays(old_array, new_array, path_prefix, ignored_keys = [])
  changes = []
 
  if old_array.length != new_array.length
@@ -169,7 +175,7 @@ module SchemaTools
  if old_value != new_value
  if old_value.is_a?(Hash) && new_value.is_a?(Hash) ||
  old_value.is_a?(Array) && new_value.is_a?(Array)
- changes.concat(compare_objects(old_value, new_value, current_path))
+ changes.concat(compare_objects(old_value, new_value, current_path, ignored_keys))
  else
  changes << "🔄 MODIFIED: #{current_path}"
  changes.concat(format_value_details("Old value", old_value, " "))
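
The `ignored_keys` parameter threaded through the methods above filters on the full dotted path of each key. A compact sketch of that behaviour follows, assuming an empty stand-in for the class-level `IGNORED_KEYS` constant, whose real contents are not shown in this diff.

```ruby
# Stand-in for the class-level constant; its real contents are not part of this diff.
IGNORED_KEYS = [].freeze

# Drops every key whose dotted path matches an ignored entry exactly.
def filter_ignored_keys(obj, path_prefix = "", ignored_keys = [])
  return obj unless obj.is_a?(Hash)

  all_ignored = IGNORED_KEYS + ignored_keys
  obj.each_with_object({}) do |(key, value), filtered|
    current_path = path_prefix.empty? ? key : "#{path_prefix}.#{key}"
    next if all_ignored.include?(current_path)

    filtered[key] =
      value.is_a?(Hash) ? filter_ignored_keys(value, current_path, ignored_keys) : value
  end
end

settings = { "index" => { "number_of_replicas" => 1, "number_of_shards" => 3 } }
filtered = filter_ignored_keys(settings, "", ["index.number_of_replicas"])
p filtered
```

Because the match is on the exact path, the entry `"index.number_of_replicas"` would not remove a top-level `"number_of_replicas"` key. Whether that flat form can reach `generate_diff` depends on the normalization step, which is outside this diff.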
@@ -7,6 +7,9 @@ module SchemaTools
 
  if diff_result[:status] == :no_changes
  puts "✓ Migration verification successful - no differences detected"
+ if diff_result[:replica_warning]
+ puts "⚠️ #{diff_result[:replica_warning]}"
+ end
  puts "Migration completed successfully!"
  else
  puts "⚠️ Migration verification failed - differences detected:"
@@ -5,6 +5,21 @@ require_relative 'config'
  require_relative 'settings_filter'
 
  module SchemaTools
+ EXAMPLE_PAINLESS_SCRIPT = <<~PAINLESS
+ // Example reindex script for transforming data during migration
+ // Modify this script to transform your data as needed
+ //
+ // Example: Rename a field
+ // if (ctx._source.containsKey('old_field_name')) {
+ // ctx._source.new_field_name = ctx._source.old_field_name;
+ // ctx._source.remove('old_field_name');
+ // }
+ //
+ // Example: Add a new field
+ // ctx._source.new_field = 'default_value';
+ long timestamp = System.currentTimeMillis();
+ PAINLESS
+
  def self.new_alias(client:)
  puts "\nEnter a new alias name:"
  alias_name = STDIN.gets&.chomp
@@ -68,20 +83,7 @@ module SchemaTools
  File.write(mappings_file, JSON.pretty_generate(sample_mappings))
 
  # Create example reindex.painless file
- reindex_content = <<~PAINLESS
- // Example reindex script for transforming data during migration
- // Modify this script to transform your data as needed
- //
- // Example: Rename a field
- // if (ctx._source.containsKey('old_field_name')) {
- // ctx._source.new_field_name = ctx._source.old_field_name;
- // ctx._source.remove('old_field_name');
- // }
- //
- // Example: Add a new field
- // ctx._source.new_field = 'default_value';
- long timestamp = System.currentTimeMillis();
- PAINLESS
+ reindex_content = EXAMPLE_PAINLESS_SCRIPT
 
  File.write(reindex_file, reindex_content)
 
@@ -170,20 +172,7 @@ module SchemaTools
  File.write(settings_file, JSON.pretty_generate(filtered_settings))
  File.write(mappings_file, JSON.pretty_generate(mappings))
 
- # Create example reindex.painless file
- reindex_content = <<~PAINLESS
- # Example reindex script for transforming data during migration
- # Modify this script to transform your data as needed
- #
- # Example: Rename a field
- # if (ctx._source.containsKey('old_field_name')) {
- # ctx._source.new_field_name = ctx._source.old_field_name;
- # ctx._source.remove('old_field_name');
- # }
- #
- # Example: Add a new field
- # ctx._source.new_field = 'default_value';
- PAINLESS
+ reindex_content = EXAMPLE_PAINLESS_SCRIPT
 
  File.write(reindex_file, reindex_content)
 
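
The refactor in this file replaces two inline heredocs with one shared constant, so both code paths now write the same `//`-commented script. A small sketch of the pattern follows, assuming a shortened script body and the hypothetical constant name `EXAMPLE_SCRIPT`. It also shows that a squiggly heredoc (`<<~`) strips the common leading indentation, so the constant can be indented inside a module without changing the file contents.

```ruby
module Demo
  # Shortened, illustrative version of the shared script constant.
  EXAMPLE_SCRIPT = <<~PAINLESS
    // Example: Add a new field
    // ctx._source.new_field = 'default_value';
    long timestamp = System.currentTimeMillis();
  PAINLESS
end

# Both call sites can reuse the same constant instead of repeating the heredoc.
reindex_content = Demo::EXAMPLE_SCRIPT
puts reindex_content.lines.first
```

The second heredoc removed above used `#` as its comment prefix and had no statement line. Painless follows Java-style `//` and `/* */` comments, so sharing the `//` version looks like a correctness fix as well as deduplication, though the diff itself does not state that motivation.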
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: schema-tools
  version: !ruby/object:Gem::Version
- version: 1.0.6
+ version: 1.0.7
  platform: ruby
  authors:
  - Rich Kuzsma
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2025-10-13 00:00:00.000000000 Z
+ date: 2025-10-17 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rake