exwiw 0.9.1 → 0.9.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 577420a290ea24adc64ef6ba21914ac71b1c98f86163f9d1d8a88643cc89e15b
- data.tar.gz: eaddb0073169543aa1bea944cb1b075fbd8258a034f4cffb9cf0011338ba260a
+ metadata.gz: 1856ad12991806b66ecce4a9fa6d85bc89c9e371f38c90da14168367b99e4f5b
+ data.tar.gz: 0fe953311f016847e4b4adf27c8b82ff94d3012d8d896406200b4d94080e442e
  SHA512:
- metadata.gz: c21735b0dcda7b30b0519944c37f24610564e91801a29e2c0f7cc3632cc0f24c532d10c078127bccffd0a2f258a5695cdabe25094edf228b5e3203a629ddf175
- data.tar.gz: 7e9ce9682b281c097485da409befc466dd06fc655c1612278b9e98f6bccdea575aa443ff4292777be1e4e68f88d826554702eecd6b94bddc31b1e4edc01c923b
+ metadata.gz: bb984601846145d7e542bdaddfecfb836a393926f3cd985f879bdabd3a5c28a37b190de443ae881faf2adec384429746dd1816b2d5c5ad0d11d4f570dbc94e80
+ data.tar.gz: 492316c915507618455644f2f4ff2073479db979a6e18b70aa8e2ff1efeb4df76562770be3bf760b3ab628c863ac3bb3b5a70806bc029730bda29df0bdb90ec2
data/CHANGELOG.md CHANGED
@@ -2,6 +2,12 @@
 
  ## [Unreleased]

+ ## [0.9.2] - 2026-06-30
+
+ ### Added
+
+ - **The mongodb adapter accepts a server-enforced query timeout, globally and per collection.** A global `--mongodb-query-timeout-ms=N` (also `mongodb_query_timeout_ms:` in the config file) sets the MongoDB driver's CSOT `timeout_ms` on the client, so every operation — the find cursor's whole lifetime (initial batch and every `getMore` the streaming dump walks), the document count, and an executing `explain` — is bounded; past the deadline the server aborts the operation and exwiw fails the run, so an accidentally heavy or unscoped query cannot keep pinning the (often production) source. A per-collection `query_timeout_ms` key in the schema config overrides the global for that collection's find/count (give a known-large collection more headroom, or cap a runaway one); like `bulk_insert_chunk_size`, it is user-maintained and preserved across Mongoid schema regeneration. Both are **mongodb-only** (the SQL adapters shell out to their own clients) and must be positive integers; the default is no timeout. The fork-parallel dump inherits both automatically (workers rebuild the client from the same connection config and run the same per-collection queries).
+
  ## [0.9.1] - 2026-06-30

  ### Fixed
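The precedence described in the 0.9.2 entry (a per-collection `query_timeout_ms` wins over the global `mongodb_query_timeout_ms`, and no timeout applies when neither is set) can be modelled in a few lines. This is only an illustrative sketch: `effective_timeout_ms` is a hypothetical helper, not part of exwiw, which achieves the override by passing per-operation options to the driver.

```ruby
# Hypothetical helper (not exwiw code): models the documented precedence.
# Returns the timeout in milliseconds, or nil when the query runs untimed.
def effective_timeout_ms(global:, per_collection:)
  per_collection || global
end

# A known-large collection gets more headroom than the global default.
big = effective_timeout_ms(global: 30_000, per_collection: 120_000)

# A collection without its own key inherits the global value.
small = effective_timeout_ms(global: 30_000, per_collection: nil)

# With neither configured, the default is no timeout.
untimed = effective_timeout_ms(global: nil, per_collection: nil)
```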
data/README.md CHANGED
@@ -300,6 +300,7 @@ insert_only: false
  after_insert_hook: hooks/seed.rb
  log_level: info # debug | info
  # target_table / ids / ids_field / scope_column may also be set here
+ # mongodb_query_timeout_ms: 30000 # global query timeout (mongodb only)
  ```
 
  With the file above, only the connection details need to be supplied on the CLI:
@@ -317,6 +318,7 @@ Notes:
  - Unknown keys are rejected so a typo surfaces immediately.
  - Export-only keys (`output_dir`, `output_format`, `insert_only`, `after_insert_hook`) are ignored when running `explain`, so a single config file can be shared by both subcommands.
  - `explain_verbosity` sets the mongodb `explain` verbosity (`queryPlanner` | `executionStats` | `allPlansExecution`, default `queryPlanner`); the `EXWIW_MONGODB_EXPLAIN_VERBOSITY` env var overrides it. Ignored by the SQL adapters and by `export`. See [`exwiw explain`](#mongodb-explain-verbosity).
+ - `mongodb_query_timeout_ms` sets the global, server-enforced query timeout (mongodb only); the `--mongodb-query-timeout-ms` CLI flag overrides it. Ignored by the SQL adapters. See [MongoDB notes](#mongodb-notes).
 
  ### Generator
 
@@ -395,7 +397,7 @@ It is a distinct task and class (`Exwiw::MongoidSchemaGenerator`) from the Activ
 
  Models in an inheritance hierarchy whose subclasses share the base's collection (Mongoid STI, distinguished by the auto-added `_type` discriminator) collapse into a single config: the generator discovers the subclasses via `descendants` (Mongoid registers only the base class in `Mongoid.models`) and unions every class's `fields` and `belongs_tos` into the collection config, so subclass-only fields and associations are not lost.
 
- Regeneration preserves hand-edited `replace_with`, `filter`, `ignore`, and `bulk_insert_chunk_size` values, like the ActiveRecord generator. Indexes are not written to the config — they are introspected from the live database at dump time (see [MongoDB notes](#mongodb-notes)). Polymorphic `belongs_to` is not yet expanded by this task.
+ Regeneration preserves hand-edited `replace_with`, `filter`, `ignore`, `bulk_insert_chunk_size`, and `query_timeout_ms` values, like the ActiveRecord generator. Indexes are not written to the config — they are introspected from the live database at dump time (see [MongoDB notes](#mongodb-notes)). Polymorphic `belongs_to` is not yet expanded by this task.
 
  By default the task **aborts** when a model uses a construct exwiw cannot represent: a `belongs_to` whose target class can no longer be resolved (a stale relation left behind after its model was removed), or a polymorphic / self-referential-cyclic / ambiguous / unresolvable-parent `embedded_in` (see the cases above).
 
@@ -800,6 +802,7 @@ The MongoDB adapter is experimental. To use it:
  - `--target-collection=COLLECTION` is a mongodb-only alias of `--target-table` (use whichever reads better for MongoDB). Specifying both, or using `--target-collection` with a non-mongodb adapter, is an error.
  - `--ids-field=FIELD` matches `--ids` against `FIELD` on the target collection instead of its primary key (e.g. `--target-collection=users --ids=a@example.com --ids-field=email`). Downstream foreign-key propagation still keys off the primary key, so only the target collection's filter changes. Unlike the primary-key path, the supplied ids are **not** type-coerced (the stored type of a custom field is unknown), so pass values matching the field's actual type. This flag is **mongodb-only** (the SQL adapters have no equivalent).
  - Large or embedded-document-heavy dumps are streamed automatically: the adapter reads the collection through a lazy cursor (not `.to_a`) and writes JSONL in chunks, so peak memory is bounded by the chunk size rather than the collection size — no flag to set. Encoding each document to MongoDB Extended JSON is accelerated by an **optional native (C) extension** that compiles automatically on `gem install`; where it cannot compile, exwiw falls back to a byte-identical pure-Ruby encoder. See [`docs/optimization-notes.md`](docs/optimization-notes.md) for the performance investigation and [`docs/optimize-mongodb-export-with-native-ext.md`](docs/optimize-mongodb-export-with-native-ext.md) for the native encoder's design. Benchmark your own data with `script/bench_mongodb_dump.rb`.
+ - `--mongodb-query-timeout-ms=N` sets a global, **server-enforced** timeout (in milliseconds) on every query exwiw issues — the find cursor's whole lifetime (the initial batch and every `getMore` the streaming dump walks), the count, and an executing `explain`. Past the deadline the server aborts the operation and exwiw fails the run, so an accidentally heavy or unscoped query cannot keep pinning the (often production) source. It is **mongodb-only** (the SQL adapters shell out to their own clients) and may also be set as `mongodb_query_timeout_ms:` in the config file. The default is no timeout. A single collection can opt to a different limit with a `query_timeout_ms` key in its schema config (a sibling of `bulk_insert_chunk_size`), which overrides the global for that collection's find/count — use it to give a known-large collection more headroom, or to cap one that tends to run away. Hand-edited `query_timeout_ms` values are preserved across schema regeneration.
  - `--parallel-workers=N` (opt-in, `export` only) forks `N` worker processes that decode whole collections in parallel — the dominant cost on a large dump is the driver's BSON→Ruby decode, and each worker decodes its own collections in their natural order, so the output stays **byte-identical** to a serial run (same filenames and content). It needs a dump target (the schedule is built around the scoped DAG) and a `fork`-capable runtime (CRuby on POSIX), falling back to the serial path otherwise; it also accepts `parallel_workers:` in the config file. The speedup needs real cores to spend — it reaches ~2× from 4 workers and saturates there. The default is serial. See [`docs/mongodb-dump-parallelism-2x-notes.md`](docs/mongodb-dump-parallelism-2x-notes.md) for the schedule and measurements.
  - Output is JSON Lines (`insert-{idx}-{collection}.jsonl`) using MongoDB Extended JSON (relaxed mode). Import with `mongoimport`:
  ```bash
@@ -40,15 +40,20 @@ module Exwiw
  class StreamingResult
  include Enumerable

- def initialize(view:, collection:, keys:, state:)
+ def initialize(view:, collection:, keys:, state:, timeout_ms: nil)
  @view = view
  @collection = collection
  @keys = keys
  @state = state
+ @timeout_ms = timeout_ms
  end

  def size
- @size ||= @view.count_documents
+ # count_documents reads :timeout_ms only from the opts passed here (it
+ # does not inherit the find view's per-op timeout), so the per-collection
+ # value must be threaded in explicitly. When nil it falls back to the
+ # client-wide global timeout, like every other operation.
+ @size ||= @timeout_ms ? @view.count_documents(timeout_ms: @timeout_ms) : @view.count_documents
  end
  alias length size
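The memoized `size` change in this hunk can be exercised without a database. The sketch below is a trimmed-down model, not exwiw's class: `FakeView` is a hypothetical stand-in for the driver's collection view that only records the options each `count_documents` call receives, and `SizedResult` keeps just the constructor argument and the `size` method.

```ruby
# Hypothetical stand-in for the driver's collection view (assumption for this
# sketch): records the options passed to every count_documents call.
class FakeView
  attr_reader :calls

  def initialize(count)
    @count = count
    @calls = []
  end

  def count_documents(opts = {})
    @calls << opts
    @count
  end
end

# Reduced model of the result object: only the timeout threading and the
# memoized count are kept.
class SizedResult
  def initialize(view:, timeout_ms: nil)
    @view = view
    @timeout_ms = timeout_ms
  end

  def size
    @size ||= @timeout_ms ? @view.count_documents(timeout_ms: @timeout_ms) : @view.count_documents
  end
end

timed_view = FakeView.new(42)
timed = SizedResult.new(view: timed_view, timeout_ms: 5_000)
timed.size
timed.size # memoized: the view is not asked a second time

plain_view = FakeView.new(7)
SizedResult.new(view: plain_view).size # no per-collection value: no options passed
```

In this model the timed view sees exactly one call carrying `timeout_ms`, while the untimed one is called with no options, which is the case where the real client-wide default would apply.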
 
@@ -172,6 +177,7 @@ module Exwiw
  primary_key: config.primary_key,
  filter: filter,
  projection: build_projection(config, @propagation_keys),
+ timeout_ms: config.query_timeout_ms,
  )
  end
 
@@ -179,7 +185,7 @@ module Exwiw
  @logger.debug(" Executing Mongo find on '#{query.collection}': filter=#{query.filter.inspect} projection=#{query.projection.inspect}")

  view = db[query.collection]
- .find(query.filter)
+ .find(query.filter, find_timeout_opts(query))
  .projection(query.projection)
  .comment(query_comment_text("collection=#{query.collection}"))
 
@@ -195,7 +201,7 @@ module Exwiw
  # large / embed-heavy collections — the dump's dominant memory cost. The
  # propagation-key values are captured as the cursor streams and published
  # into @state once the pass completes (see StreamingResult).
- StreamingResult.new(view: view, collection: query.collection, keys: keys, state: @state)
+ StreamingResult.new(view: view, collection: query.collection, keys: keys, state: @state, timeout_ms: query.timeout_ms)
  end

  # NOTE: relies on @embedded_children_by_parent set by a prior build_query
@@ -233,7 +239,7 @@ module Exwiw
  @logger.debug(" Running explain (verbosity=#{verbosity}) on '#{query.collection}': filter=#{query.filter.inspect}")

  result = db[query.collection]
- .find(query.filter)
+ .find(query.filter, find_timeout_opts(query))
  .projection(query.projection)
  .comment(query_comment_text("collection=#{query.collection}"))
  .explain(verbosity: verbosity)
@@ -360,6 +366,16 @@ module Exwiw
  false
  end

+ # Per-operation find options carrying the collection's CSOT timeout. An
+ # empty hash when the query has none, so the operation inherits the
+ # client-wide global timeout (or runs untimed if that is also unset). The
+ # find view's :timeout_ms governs the whole cursor lifetime — initial batch
+ # plus every getMore the streaming dump walks — which is what makes it the
+ # right cap for an accidentally heavy/unscoped extraction.
+ private def find_timeout_opts(query)
+ query.timeout_ms ? { timeout_ms: query.timeout_ms } : {}
+ end
+
  private def reject_filter!(config)
  return if config.filter.nil? || config.filter.to_s.empty?
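A small sketch of the `find_timeout_opts` idea from this hunk: return an empty hash rather than `{ timeout_ms: nil }`, so that a query without its own timeout passes no `:timeout_ms` key at all. `SketchQuery` is a hypothetical two-member stand-in for the real query struct, and the method is written as a plain top-level function here rather than a private adapter method.

```ruby
# Hypothetical stand-in for the query object; only the members used here.
SketchQuery = Struct.new(:collection, :timeout_ms, keyword_init: true)

# Same shape as the helper in the hunk above: a one-key hash when the query
# carries a timeout, an empty hash otherwise.
def find_timeout_opts(query)
  query.timeout_ms ? { timeout_ms: query.timeout_ms } : {}
end

capped = find_timeout_opts(SketchQuery.new(collection: "events", timeout_ms: 120_000))
inherited = find_timeout_opts(SketchQuery.new(collection: "users"))

# Because the "no timeout" case is an empty hash, merging it into other
# options leaves them unchanged instead of adding a nil-valued key.
other_opts = { batch_size: 500 }.merge(inherited)
```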
 
@@ -670,14 +686,14 @@ module Exwiw
  # given, overrides the database in the URI path; otherwise the
  # URI's own database is used. The URI is never logged (it may carry
  # credentials).
- client_options = {}
+ client_options = global_timeout_options
  if @connection_config.database_name && !@connection_config.database_name.to_s.empty?
  client_options[:database] = @connection_config.database_name
  end
  Mongo::Client.new(@connection_config.uri, **client_options)
  else
  address = "#{@connection_config.host}:#{@connection_config.port}"
- options = { database: @connection_config.database_name }
+ options = global_timeout_options.merge(database: @connection_config.database_name)
  if @connection_config.user && !@connection_config.user.to_s.empty?
  options[:user] = @connection_config.user
  options[:password] = @connection_config.password
@@ -690,6 +706,15 @@ module Exwiw
  private def uri_connection?
  !@connection_config.uri.nil? && !@connection_config.uri.to_s.empty?
  end
+
+ # Client-level CSOT default applied to every operation on this connection
+ # (find cursor lifetime, count, executing explain). nil when no global
+ # timeout is configured, leaving the client untimed; a per-collection
+ # `query_timeout_ms` still overrides this per find/count.
+ private def global_timeout_options
+ timeout = @connection_config.mongodb_query_timeout_ms
+ timeout ? { timeout_ms: timeout } : {}
+ end
  end
  end
  end
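How the global timeout ends up in the client options can be shown without opening a connection. The following is a sketch under assumptions: `SketchConnection` is a hypothetical cut-down connection config, and `client_options` mirrors only the host/port branch of the hunk above (the real code then hands the hash to `Mongo::Client.new`).

```ruby
# Hypothetical cut-down connection config (the real struct has more members).
SketchConnection = Struct.new(:database_name, :mongodb_query_timeout_ms, keyword_init: true)

# Empty hash when no global timeout is configured, so no :timeout_ms key is
# sent to the client and it stays untimed.
def global_timeout_options(conn)
  timeout = conn.mongodb_query_timeout_ms
  timeout ? { timeout_ms: timeout } : {}
end

# Start from the timeout options, then layer the database name on top.
def client_options(conn)
  global_timeout_options(conn).merge(database: conn.database_name)
end

timed = client_options(SketchConnection.new(database_name: "app", mongodb_query_timeout_ms: 30_000))
untimed = client_options(SketchConnection.new(database_name: "app"))
```

Since each call builds a fresh hash, later mutation of the options (as the URI branch does with `client_options[:database] = ...`) cannot leak between clients.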
data/lib/exwiw/cli.rb CHANGED
@@ -37,6 +37,7 @@ module Exwiw
  scope_column
  parallel_workers
  explain_verbosity
+ mongodb_query_timeout_ms
  ].freeze

  # MongoDB explain verbosity levels (passed through to the server's explain
42
43
  # MongoDB explain verbosity levels (passed through to the server's explain
@@ -89,6 +90,7 @@ module Exwiw
  @insert_only = nil
  @after_insert_hook_path = nil
  @parallel_workers = nil
+ @mongodb_query_timeout_ms = nil
  @explain_verbosity = nil
  # nil (not :info) so we can tell "user passed --log-level" from the default,
  # letting a config-file value fill in; the :info default is applied later.
@@ -113,6 +115,7 @@ module Exwiw
  password: @database_password,
  database_name: @database_name,
  uri: @connection_uri,
+ mongodb_query_timeout_ms: @mongodb_query_timeout_ms,
  )

  dump_target = DumpTarget.new(
@@ -178,6 +181,7 @@ module Exwiw
  resolve_ids_field!
  resolve_uri_option!
  resolve_parallel_workers!
+ resolve_mongodb_query_timeout_ms!

  if @subcommand == "explain"
  validate_explain_only!
@@ -331,6 +335,7 @@ module Exwiw
  @ids_field ||= config["ids_field"]
  @scope_column ||= config["scope_column"]
  @parallel_workers ||= parse_parallel_workers(config["parallel_workers"]) if config.key?("parallel_workers")
+ @mongodb_query_timeout_ms ||= parse_mongodb_query_timeout_ms(config["mongodb_query_timeout_ms"]) if config.key?("mongodb_query_timeout_ms")
  @explain_verbosity ||= config["explain_verbosity"]
  end
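The `||=` line added in this hunk is what lets a CLI flag take priority over the config file: the config value only fills in when the flag left the variable `nil`. A minimal sketch of that pattern, assuming a hypothetical `resolve_timeout` function in place of the instance-variable assignment and a plain `Integer()` call in place of the parsing helper:

```ruby
# Hypothetical function modelling the merge order: cli_value is what the flag
# produced (nil when the flag was not given), config is the parsed config file.
def resolve_timeout(cli_value, config)
  value = cli_value
  # Parsed as `(value ||= ...) if config.key?(...)`: the config is consulted
  # only when the key is present, and only fills in a nil value.
  value ||= Integer(config["mongodb_query_timeout_ms"]) if config.key?("mongodb_query_timeout_ms")
  value
end

from_flag = resolve_timeout(5_000, { "mongodb_query_timeout_ms" => 30_000 })
from_file = resolve_timeout(nil, { "mongodb_query_timeout_ms" => 30_000 })
unset = resolve_timeout(nil, {})
```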
 
@@ -457,6 +462,38 @@ module Exwiw
  exit 1
  end

+ # `--mongodb-query-timeout-ms` sets the global, server-enforced CSOT timeout
+ # applied to every MongoDB query (find cursor lifetime, count, executing
+ # explain). It is mongodb-only (the SQL adapters shell out to their own
+ # clients) and must be a positive integer. A per-collection `query_timeout_ms`
+ # in the schema config overrides it. Runs after the adapter name is normalized
+ # so the family check is reliable.
+ private def resolve_mongodb_query_timeout_ms!
+ return if @mongodb_query_timeout_ms.nil?
+
+ if @database_adapter != "mongodb"
+ $stderr.puts "--mongodb-query-timeout-ms is only supported by the mongodb adapter"
+ exit 1
+ end
+
+ if @mongodb_query_timeout_ms < 1
+ $stderr.puts "--mongodb-query-timeout-ms must be a positive integer (got #{@mongodb_query_timeout_ms})"
+ exit 1
+ end
+ end
+
+ # Coerce a config-file `mongodb_query_timeout_ms` (YAML scalar) to Integer,
+ # matching the CLI flag's Integer coercion. A non-integer value is a config
+ # typo, so fail fast rather than silently dropping it.
+ private def parse_mongodb_query_timeout_ms(value)
+ return nil if value.nil?
+
+ Integer(value)
+ rescue ArgumentError, TypeError
+ $stderr.puts "config 'mongodb_query_timeout_ms' must be an integer (got #{value.inspect})"
+ exit 1
+ end
+
  private def validate_explain_only!
  rejected = []
  rejected << "--output-dir" unless @output_dir.nil?
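The coercion and range check added above can be tried in isolation. The sketch below keeps the same `Integer(value)` call and the same `< 1` test, but raises `ArgumentError` instead of printing to stderr and exiting, so it can run in a script or test; `coerce_timeout_ms` and `validate_timeout_ms!` are hypothetical names, not exwiw methods.

```ruby
# Hypothetical variant of the config-file coercion: raises instead of exiting.
def coerce_timeout_ms(value)
  return nil if value.nil?

  Integer(value)
rescue ArgumentError, TypeError
  raise ArgumentError, "mongodb_query_timeout_ms must be an integer (got #{value.inspect})"
end

# Hypothetical variant of the positivity check: nil means "not configured".
def validate_timeout_ms!(timeout_ms)
  return if timeout_ms.nil?
  raise ArgumentError, "must be a positive integer (got #{timeout_ms})" if timeout_ms < 1

  timeout_ms
end

ok = validate_timeout_ms!(coerce_timeout_ms("30000"))

rejected = begin
  coerce_timeout_ms("30s")
rescue ArgumentError => e
  e.message
end
```

One behaviour of `Kernel#Integer` worth keeping in mind when reading the hunk: it truncates a Float rather than rejecting it, so a YAML value such as `1500.9` would be coerced to `1500`.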
@@ -591,6 +628,7 @@ module Exwiw
  @after_insert_hook_path = File.expand_path(v)
  end
  opts.on("--parallel-workers=N", Integer, "Fork N workers for the MongoDB dump's parallel schedule (mongodb + export only; N>=2 enables it, default is serial). Output is byte-identical to serial; falls back to serial where fork is unavailable.") { |v| @parallel_workers = v }
+ opts.on("--mongodb-query-timeout-ms=N", Integer, "Global server-enforced timeout (ms) for every MongoDB query (mongodb only). Aborts an accidentally heavy/unscoped query past the deadline. Overridden per collection by `query_timeout_ms` in the schema config.") { |v| @mongodb_query_timeout_ms = v }
  opts.on("--log-level=LEVEL", "Log level (debug, info). default is info") { |v| @log_level = v.to_sym }

  opts.on("--help", "Print this help") do
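The new flag relies on `OptionParser`'s built-in `Integer` coercion, which is why the config-file path needs its own `Integer()` call to match. A self-contained sketch of just that flag, with a hypothetical `parse_timeout_flag` wrapper standing in for the real CLI class:

```ruby
require "optparse"

# Hypothetical wrapper: parses only the timeout flag and returns its value,
# or nil when the flag is absent.
def parse_timeout_flag(argv)
  timeout = nil
  parser = OptionParser.new do |opts|
    # Passing Integer makes OptionParser convert the argument and reject
    # non-integer input before the block runs.
    opts.on("--mongodb-query-timeout-ms=N", Integer, "Global query timeout (ms)") { |v| timeout = v }
  end
  parser.parse(argv)
  timeout
end

given = parse_timeout_flag(["--mongodb-query-timeout-ms=30000"])
absent = parse_timeout_flag([])

invalid = begin
  parse_timeout_flag(["--mongodb-query-timeout-ms=soon"])
rescue OptionParser::InvalidArgument
  :rejected
end
```

Leaving the variable `nil` when the flag is absent is what lets a config-file value fill in later.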
@@ -2,14 +2,19 @@
 
  module Exwiw
  module MongoQuery
- Find = Struct.new(:collection, :primary_key, :filter, :projection, keyword_init: true) do
+ # `timeout_ms` is the per-collection, server-enforced operation timeout (CSOT)
+ # applied when this query runs; nil falls back to the client-wide global
+ # (ConnectionConfig#mongodb_query_timeout_ms) or, if that is also unset, no
+ # timeout. Omitted from #to_h when nil so the historical 4-key query shape is
+ # unchanged for collections without a timeout.
+ Find = Struct.new(:collection, :primary_key, :filter, :projection, :timeout_ms, keyword_init: true) do
  def to_h
  {
  collection: collection,
  primary_key: primary_key,
  filter: filter,
  projection: projection,
- }
+ }.tap { |h| h[:timeout_ms] = timeout_ms unless timeout_ms.nil? }
  end
  end
  end
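The `tap` in `to_h` keeps the serialized shape backward compatible: the fifth key only appears when a timeout is set. A standalone sketch of the same struct, named `SketchFind` here to make clear it is a copy for illustration rather than the `Exwiw::MongoQuery::Find` constant itself:

```ruby
# Illustrative copy of the struct from the hunk above.
SketchFind = Struct.new(:collection, :primary_key, :filter, :projection, :timeout_ms, keyword_init: true) do
  def to_h
    {
      collection: collection,
      primary_key: primary_key,
      filter: filter,
      projection: projection,
    }.tap { |h| h[:timeout_ms] = timeout_ms unless timeout_ms.nil? }
  end
end

# With keyword_init, an omitted member defaults to nil, so existing callers
# that never pass timeout_ms keep producing the 4-key hash.
plain = SketchFind.new(collection: "users", primary_key: "_id", filter: {}, projection: {})
timed = SketchFind.new(collection: "events", primary_key: "_id", filter: {}, projection: {}, timeout_ms: 120_000)

plain_keys = plain.to_h.keys
timed_keys = timed.to_h.keys
```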
@@ -13,6 +13,13 @@ module Exwiw
  attribute :belongs_tos, array(BelongsTo)
  attribute :fields, array(MongodbField)
  attribute :bulk_insert_chunk_size, optional(Integer), skip_serializing_if_nil: true
+ # Per-collection server-enforced query timeout in milliseconds (CSOT). Applied
+ # to this collection's find (whole cursor lifetime), count, and an executing
+ # explain; overrides the global ConnectionConfig#mongodb_query_timeout_ms. Use
+ # it to give a known-large collection more headroom than the global default
+ # (or, conversely, to cap one that tends to run away). User-maintained — the
+ # generator never emits it, so #merge carries it forward across regeneration.
+ attribute :query_timeout_ms, optional(Integer), skip_serializing_if_nil: true
  attribute :ignore, Serdes::OptionalType.new(Serdes::ConcreteType.new(Boolean)), skip_serializing_if_nil: true
  # Free-form note. Purely informational — exwiw never reads it — and preserved
  # across `MongoidSchemaGenerator` regeneration like the field / belongs_to
@@ -62,7 +69,8 @@ module Exwiw
  # - structural facts come from the freshly generated config: primary_key,
  # belongs_tos, embedded_in.
  # - user customizations are kept from the receiver: filter, ignore,
- # bulk_insert_chunk_size, query_timeout_ms, and each field's `replace_with`
+ # bulk_insert_chunk_size, query_timeout_ms, and each field's `replace_with`
+ # masking rule.
  # - generated fields drive the field list (so added/removed fields track the
  # model), but a matching receiver field wins to retain its masking.
  def merge(passed)
@@ -73,6 +81,7 @@ module Exwiw
  merged.primary_key = passed.primary_key
  merged.filter = filter
  merged.bulk_insert_chunk_size = bulk_insert_chunk_size
+ merged.query_timeout_ms = query_timeout_ms
  merged.ignore = ignore
  merged.ignore_type = ignore_type
  # A freshly generated comment (e.g. the skip_unsupported marker) wins so
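The merge rule this hunk extends (structural facts from the freshly generated config, user-maintained values from the hand-edited receiver) can be sketched with a plain struct. `SketchCollectionConfig` is a hypothetical four-member model; the real class is Serdes-based and merges more attributes than shown here.

```ruby
# Hypothetical reduced model of a collection config.
SketchCollectionConfig = Struct.new(:name, :primary_key, :bulk_insert_chunk_size, :query_timeout_ms, keyword_init: true) do
  # Receiver: the hand-edited config on disk. passed: the regenerated one.
  def merge(passed)
    SketchCollectionConfig.new(
      name: name,
      primary_key: passed.primary_key,                # structural: regenerated value wins
      bulk_insert_chunk_size: bulk_insert_chunk_size, # user-maintained: receiver wins
      query_timeout_ms: query_timeout_ms,             # user-maintained: receiver wins
    )
  end
end

hand_edited = SketchCollectionConfig.new(name: "events", primary_key: "_id", query_timeout_ms: 120_000)
regenerated = SketchCollectionConfig.new(name: "events", primary_key: "uuid")

merged = hand_edited.merge(regenerated)
```

Because the generator never emits `query_timeout_ms`, the regenerated side is always `nil` for that key; taking it from the receiver is what keeps the hand-edited value alive.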
data/lib/exwiw/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Exwiw
- VERSION = "0.9.1"
+ VERSION = "0.9.2"
  end
data/lib/exwiw.rb CHANGED
@@ -56,5 +56,13 @@ module Exwiw
  # mongodb adapter, e.g. `mongodb+srv://...`). When present it is the source of
  # truth for the connection — host/port/user/password are ignored — so TLS,
  # replica_set, auth_source, etc. can be expressed via the URI's query string.
- ConnectionConfig = Struct.new(:adapter, :host, :port, :user, :password, :database_name, :uri, keyword_init: true)
+ #
+ # `mongodb_query_timeout_ms` is the global, server-enforced operation timeout
+ # (CSOT `timeout_ms`) applied to every MongoDB query exwiw issues — the find
+ # cursor's whole lifetime, the count, and an executing `explain`. It guards
+ # against an accidentally heavy/unscoped query pinning the (often production)
+ # source: the server aborts the operation past the deadline. nil leaves it
+ # unset (no timeout). A per-collection `query_timeout_ms` overrides it.
+ # mongodb adapter only.
+ ConnectionConfig = Struct.new(:adapter, :host, :port, :user, :password, :database_name, :uri, :mongodb_query_timeout_ms, keyword_init: true)
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: exwiw
  version: !ruby/object:Gem::Version
- version: 0.9.1
+ version: 0.9.2
  platform: ruby
  authors:
  - Shia