scout-gear 10.8.4 → 10.9.0

Files changed (55)
  1. checksums.yaml +4 -4
  2. data/.vimproject +13 -0
  3. data/README.md +352 -0
  4. data/VERSION +1 -1
  5. data/doc/Association.md +288 -0
  6. data/doc/Entity.md +296 -0
  7. data/doc/KnowledgeBase.md +433 -0
  8. data/doc/Persist.md +356 -0
  9. data/doc/Semaphore.md +171 -0
  10. data/doc/TSV.md +449 -0
  11. data/doc/WorkQueue.md +359 -0
  12. data/doc/Workflow.md +586 -0
  13. data/lib/scout/association.rb +4 -2
  14. data/lib/scout/entity/identifiers.rb +1 -1
  15. data/lib/scout/entity/object.rb +1 -1
  16. data/lib/scout/entity/property.rb +5 -5
  17. data/lib/scout/entity.rb +1 -1
  18. data/lib/scout/knowledge_base/description.rb +1 -1
  19. data/lib/scout/knowledge_base/list.rb +7 -2
  20. data/lib/scout/knowledge_base/registry.rb +2 -2
  21. data/lib/scout/knowledge_base.rb +20 -2
  22. data/lib/scout/monitor.rb +10 -6
  23. data/lib/scout/persist/engine/packed_index.rb +2 -2
  24. data/lib/scout/persist/engine/sharder.rb +1 -1
  25. data/lib/scout/persist/tsv.rb +1 -0
  26. data/lib/scout/semaphore.rb +1 -1
  27. data/lib/scout/tsv/dumper.rb +3 -3
  28. data/lib/scout/tsv/open.rb +1 -0
  29. data/lib/scout/tsv/parser.rb +1 -1
  30. data/lib/scout/tsv/transformer.rb +1 -0
  31. data/lib/scout/tsv/util.rb +2 -2
  32. data/lib/scout/work_queue/socket.rb +1 -1
  33. data/lib/scout/work_queue/worker.rb +7 -5
  34. data/lib/scout/workflow/entity.rb +22 -1
  35. data/lib/scout/workflow/step/config.rb +3 -3
  36. data/lib/scout/workflow/step/file.rb +4 -0
  37. data/lib/scout/workflow/step/info.rb +8 -2
  38. data/lib/scout/workflow/step.rb +10 -5
  39. data/lib/scout/workflow/task/inputs.rb +1 -1
  40. data/lib/scout/workflow/usage.rb +3 -2
  41. data/lib/scout/workflow/util.rb +22 -0
  42. data/scout-gear.gemspec +16 -5
  43. data/scout_commands/cat +86 -0
  44. data/scout_commands/doc +3 -1
  45. data/scout_commands/entity +151 -0
  46. data/scout_commands/system/status +238 -0
  47. data/scout_commands/workflow/info +23 -10
  48. data/scout_commands/workflow/install +1 -1
  49. data/test/scout/entity/test_property.rb +1 -1
  50. data/test/scout/knowledge_base/test_registry.rb +19 -0
  51. data/test/scout/test_work_queue.rb +1 -1
  52. data/test/scout/work_queue/test_worker.rb +12 -10
  53. metadata +15 -4
  54. data/doc/lib/scout/path.md +0 -35
  55. data/doc/lib/scout/workflow/task.md +0 -13
data/doc/Persist.md ADDED
@@ -0,0 +1,356 @@
1
+ # Persist
2
+
3
+ Persist provides a unified, engine-agnostic persistence layer for serializing, saving and loading Ruby objects and TSVs to/from files, with:
4
+
5
+ - Typed serialization for common types and TSV shapes.
6
+ - High-level caching with locking and atomic writes (Persist.persist).
7
+ - TSV-aware persistence (Persist.tsv / Persist.persist_tsv) with pluggable storage engines.
8
+ - Storage engines:
9
+ - TokyoCabinet HDB/BDB (key-value stores) with TSVAdapter.
10
+ - Tkrzw (HashDBM) via ScoutTKRZW.
11
+ - FixWidthTable (position/range index with fixed-size records).
12
+ - PackedIndex (binary-packed fixed-size rows).
13
+ - Sharder (key-space sharding over sub-databases).
14
+ - Concurrency-safe read/write locks and metadata persistence.
15
+
16
+ Sections:
17
+ - Core API (serialize, save/load, persist)
18
+ - TSV persistence helpers
19
+ - Storage engines
20
+ - TokyoCabinet (HDB/BDB)
21
+ - Tkrzw (HashDBM)
22
+ - FixWidthTable (FWT) — point/range indices
23
+ - PackedIndex (PKI) — fixed-layout random access
24
+ - Sharder — key-based horizontal sharding
25
+ - TSVAdapter (persistence adapters and serializers)
26
+ - Examples
27
+
28
+ ---
29
+
30
+ ## Core API
31
+
32
+ Serialization helpers
33
+ - Persist.serialize(content, type) / Persist.deserialize(serialized, type)
34
+ - Types include string/text/file/path/binary, integer/float/boolean, array, json/yaml/marshal, and “_array” variants.
35
+ - IO/StringIO are read into strings when needed; :binary writes in 'wb'.
36
+
37
+ Save/load
38
+ - Persist.save(content, file, type = :serializer)
39
+ - Uses registered save_drivers (see below); otherwise serializes and writes atomically (Open.sensible_write).
40
+ - Supports special type :memory (in-RAM).
41
+ - Persist.load(file, type = :serializer)
42
+ - Uses registered load_drivers; returns typed Ruby object (or persistence-backed TSV for TSV types).
43
+
44
+ High-level caching
45
+ - Persist.persist(name, type = :serializer, options = {}) { |maybe_file| value }
46
+ - Computes and stores a value if missing/outdated, then returns it.
47
+ - Locks the cache file while writing to avoid races.
48
+ - If the block yields an IO stream, Persist tees it so one copy is written to disk while the other is returned (and lock is held during streaming).
49
+ - Common options: :dir/:path, :no_load, :update, :lockfile, :tee_copies, :prefix, :canfail.
50
+
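The compute-if-missing behavior described for Persist.persist can be sketched with the standard library alone. This is an illustration under stated assumptions, not scout-gear's implementation: the helper name `cached_value`, the `.lock` sidecar file, the temporary file name, and the use of Marshal are all hypothetical choices made for this example.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper (not part of scout-gear): return the cached value at
# `path`, or compute it with the block, store it, and return it.
def cached_value(path)
  return Marshal.load(File.binread(path)) if File.exist?(path)

  FileUtils.mkdir_p(File.dirname(path))
  File.open("#{path}.lock", File::RDWR | File::CREAT) do |lock|
    lock.flock(File::LOCK_EX) # only one writer at a time
    # Another process may have written the cache while we waited for the lock.
    next Marshal.load(File.binread(path)) if File.exist?(path)

    value = yield
    tmp = "#{path}.tmp.#{Process.pid}"
    File.binwrite(tmp, Marshal.dump(value))
    File.rename(tmp, path) # readers see either no file or a complete file
    value
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "cache", "answer.bin")
  first  = cached_value(path) { [1, 2, 3] }
  second = cached_value(path) { raise "not reached: value is already cached" }
  puts first == second
end
```

Writing to a temporary file and renaming it is one common way to get the "atomic write" property: a reader never observes a half-written cache file.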
51
+ Memory helper
52
+ - Persist.memory(name, options={}, &block) — in-process cache (type :memory).
53
+
54
+ Registered drivers (excerpt)
55
+ - Persist.save_drivers[:tsv], [:HDB], [:BDB], [:tkh], [:fwt], [:pki]
56
+ - Persist.load_drivers for the same types.
57
+
58
+ ---
59
+
60
+ ## TSV persistence helpers
61
+
62
+ Make working with persisted TSVs straightforward.
63
+
64
+ Persist.tsv(id, options = {}, engine: nil, persist_options: {}) { |data| ... } → TSV
65
+ - Creates/opens a persistence-backed TSV under a stable persistence path for id.
66
+ - engine (or persist_options[:engine]) chooses the backend:
67
+ - 'HDB' (default), 'BDB', 'tkh' (Tkrzw HashDBM), 'fwt' (FixWidthTable), 'pki' (PackedIndex).
68
+ - The block is called with a “data” object (backend) to populate using TSV.open(..., data: data), or to write directly to the backend.
69
+ - Persists TSV metadata (key_field, fields, type, serializer) and returns a TSV-like object extended with TSVAdapter.
70
+
71
+ Persist.persist_tsv(file, filename=nil, options={}, persist_options={}, &block) → TSV
72
+ - Thin wrapper around Persist.tsv that extracts persist_* options from options for convenience.
73
+
74
+ Open a persistence database directly
75
+ - Persist.open_database(path, write, serializer=nil, type="HDB", options={}) → backend
76
+ - For 'fwt': options include :value_size, :range, :update, :in_memory, :pos_function.
77
+ - For 'pki': options include :pattern (PackedIndex mask), :pos_function.
78
+ - For 'HDB'/'BDB'/'tkh': opens a key-value store backend.
79
+
80
+ ---
81
+
82
+ ## Storage engines
83
+
84
+ ### TokyoCabinet (HDB/BDB)
85
+
86
+ - ScoutCabinet.open(path, write=true, 'HDB'|'BDB')
87
+ - Returns a TokyoCabinet database extended with ScoutCabinet helpers (read/write switching, import TSV, prefix/range when BDB).
88
+ - Used by Persist.open_tokyocabinet and the TSV adapters.
89
+
90
+ - Persist.open_tokyocabinet(path, write, serializer=nil, tokyocabinet_class='HDB') → cabinet extended with TKAdapter (inherits TSVAdapter).
91
+
92
+ - Import TSV directly (fast path):
93
+ - ScoutCabinet.importtsv(db, stream)
94
+ - Or db.load_stream(stream) via adapter wrapper.
95
+ - Expects a full TSV stream (preamble + header).
96
+
97
+ Notes:
98
+ - BDB supports range/prefix queries; HDB is unordered but supports basic CRUD.
99
+ - Annotation metadata (key_field/fields/type/etc.) is persisted as a special record and re-loaded on open.
100
+
101
+ ### Tkrzw (HashDBM)
102
+
103
+ - ScoutTKRZW.open(path, write=true, persistence_class='tkh', options={}) → Tkrzw::DBM extended with ScoutTKRZW
104
+ - options include truncate, num_buckets, sync_hard, encoding, etc.
105
+ - Works similarly to ScoutCabinet and integrates with TSVAdapter via Persist.save_drivers[:tkh]/load_drivers[:tkh].
106
+
107
+ ### FixWidthTable (FWT)
108
+
109
+ A compact file format for large position/range indices with constant record size:
110
+
111
+ - FixWidthTable.new(path, value_size, range=false, update=false, in_memory=true)
112
+ - value_size: bytes reserved per value.
113
+ - range: true to store (start,end) with overlap counters (for range indices), false for single positions.
114
+ - update: create/truncate when true (or when file missing).
115
+ - in_memory: load file into memory (StringIO) for faster reads.
116
+
117
+ - Adding data:
118
+ - add_point([value, pos]) or f.add_point(tsv) — expects TSV keyed by pos or with a field.
119
+ - add_range(value_table) — expects TSV with Start and End fields; internally calls add_range_point for each record.
120
+
121
+ - Query:
122
+ - f[pos] — returns values overlapping pos (Integer or Range). With range==false, point lookups; with range==true, range overlaps.
123
+ - f.overlaps(pos, value=false) — returns overlapping record positions (or “start:end(:value)” strings when value=true).
124
+ - f.values_at(*positions), f.chunked_values_at(keys, max=5000).
125
+
126
+ - Adapter:
127
+ - Persist.open_fwt(path, value_size, range=false, serializer=nil, update=false, in_memory=false, &pos_function) → FixWidthTable extended with FWTAdapter (TSVAdapter).
128
+ - Save/load via Persist.save_drivers[:fwt]/load_drivers[:fwt].
129
+
130
+ Example:
131
+ ```ruby
132
+ # Build a range index from a TSV of intervals [Start, End]
133
+ f = Persist.open_fwt("ranges.fwt", 100, true)
134
+ f.add_range(tsv_with_start_end)
135
+ f.read
136
+ f[3] # => %w(a b d)
137
+ f[1..6] # => ["b"]
138
+ f.overlaps(1, true) # => ["1:6:b"]
139
+ ```
140
+
141
+ ### PackedIndex (PKI)
142
+
143
+ A binary file with fixed-size rows according to a format mask:
144
+
145
+ - mask elements:
146
+ - "i"/"I" → 32/64-bit integer (internally "l"/"q").
147
+ - "f"/"F" → float/double ("f"/"d").
148
+ - "23s" → fixed 23-byte string ("a23").
149
+ - "<code>:<n>" → raw bytes specification.
150
+
151
+ - PackedIndex.new(path, write=false, pattern=nil)
152
+ - When write=true, pattern must be provided; when reading, pattern is loaded from file header.
153
+ - << payload — write a row (payload.pack(mask)), or write nil as a NIL sentinel.
154
+ - [position] — read a row at position (returns nil for NIL rows).
155
+ - values_at(*positions), size.
156
+
157
+ - Adapter:
158
+ - Persist.open_pki(path, write, pattern, &pos_function) → PackedIndex extended with PKIAdapter (TSVAdapter).
159
+ - PKIAdapter adds convenience: add(key, value) with a Numeric key (positions skipped between consecutive keys are filled with NIL rows), and [] supporting pos_function (e.g., parse "chr:pos").
160
+
161
+ Example:
162
+ ```ruby
163
+ pi = Persist.open_pki("packed.idx", true, %w(i i 23s f f))
164
+ 100.times { |i| pi << [i, i+2, i.to_s * 10, rand, rand] }
165
+ pi << nil # NIL row
166
+ pi.close
167
+
168
+ pi = Persist.open_pki("packed.idx", false, %w(i i 23s f f))
169
+ pi[10] # => [10, 12, "1010101010", 0.123..., 0.456...]
170
+ ```
171
+
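The fixed-size row idea behind PackedIndex can be illustrated with plain `Array#pack` and `String#unpack`. This is a minimal sketch, not the library's file format: the `FORMAT` string, the helper names, and the absence of a header or NIL sentinel are assumptions made for the example. It uses the pack directives that the mask list above maps to ("l" for 32-bit integers, "a23" for a 23-byte string, "f" for floats).

```ruby
require "stringio"

# Hypothetical layout: two 32-bit integers, a 23-byte string, two floats.
FORMAT = "l l a23 f f"
# Every row packs to the same number of bytes, so row N starts at N * ROW_SIZE.
ROW_SIZE = [0, 0, "", 0.0, 0.0].pack(FORMAT).bytesize

def write_row(io, row)
  io.write(row.pack(FORMAT))
end

def read_row(io, position)
  io.seek(position * ROW_SIZE)
  bytes = io.read(ROW_SIZE)
  return nil if bytes.nil? || bytes.bytesize < ROW_SIZE

  a, b, text, x, y = bytes.unpack(FORMAT)
  [a, b, text.delete("\0"), x, y] # drop the null padding added by "a23"
end

io = StringIO.new(String.new) # binary, in-memory stand-in for a file
5.times { |i| write_row(io, [i, i + 2, i.to_s * 3, i * 0.5, i * 0.25]) }
p read_row(io, 3)
```

Because every row has the same size, a lookup is a single seek plus one read, which is what makes this layout suitable for random access by position.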
172
+ ### Sharder
173
+
174
+ Horizontally shards key-value storage across multiple underlying databases based on a shard function:
175
+
176
+ - Sharder.new(persistence_dir, write=false, db_type='HDB', persist_options={}, &shard_function)
177
+ - shard_function (Proc) maps a key to a shard name (string/number).
178
+ - Each shard lives under <dir>/shard-<name> as a separate backend (HDB/BDB/tkh/fwt/pki depending on db_type/options).
179
+
180
+ - Access:
181
+ - sharder[key] / sharder[key] = value — routes to the shard’s database.
182
+ - database(key) — returns shard backend for a key.
183
+ - size, keys, each, include?, prefix(key) (when supported by backend, e.g. BDB range).
184
+
185
+ - Persist.open_sharder(persistence_dir, write=false, db_type=nil, persist_options={}, &shard_function) → Sharder extended with ShardAdapter (TSVAdapter).
186
+ - ShardAdapter saves TSV metadata in <dir>/metadata, exposes TSV-like API, merges keys/size across shards.
187
+
188
+ - Combining with TSV persistence:
189
+ - Persist.tsv(..., persist_options: { shard_function: ->(k){ ... } }) { |data| TSV.open(file, data: data, ...) }
190
+
191
+ Examples:
192
+ ```ruby
193
+ # Split by last character of key
194
+ sh = Persist.open_sharder("shards", true, :HDB, shard_function: ->(k){ k[-1] })
195
+ sh["key-a"] = "a"
196
+ sh["key-b"] = "b"
197
+ sh["key-a"] # => "a"
198
+
199
+ # Shard TSV by last char of ID
200
+ sh_tsv = Persist.tsv("my-sharded-tsv", persist_options: { shard_function: ->(k){ k[-1] } }) do |data|
201
+ TSV.open(tsv_path, data: data, type: :list)
202
+ end
203
+ sh_tsv["id1"]["ValueA"] # => "a1"
204
+ ```
205
+
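The routing idea behind Sharder (a shard function maps each key to one sub-database) can be sketched with in-memory hashes. `TinySharder` and its methods are hypothetical names for this illustration; the real Sharder stores each shard as a separate backend on disk.

```ruby
# Hypothetical in-memory stand-in: each shard is a plain Hash.
class TinySharder
  def initialize(&shard_function)
    @shard_function = shard_function
    @shards = Hash.new { |shards, name| shards[name] = {} }
  end

  # Return the shard that owns `key`, creating it on first use.
  def database(key)
    @shards[@shard_function.call(key).to_s]
  end

  def [](key)
    database(key)[key]
  end

  def []=(key, value)
    database(key)[key] = value
  end

  # Keys and size are merged across all shards.
  def keys
    @shards.values.flat_map(&:keys)
  end

  def size
    @shards.values.sum(&:size)
  end

  def shard_names
    @shards.keys.sort
  end
end

sh = TinySharder.new { |key| key[-1] } # shard by last character of the key
sh["key-a"] = "a"
sh["key-b"] = "b"
sh["other-a"] = "c"
p sh.shard_names
```

Every read and write goes through the same shard function, so a key is always looked up in the shard it was written to.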
206
+ ---
207
+
208
+ ## TSVAdapter
209
+
210
+ TSVAdapter turns key-value stores and engine wrappers into TSV-like objects with:
211
+
212
+ - Annotations: key_field, fields, type, filename, namespace, identifiers, serializer, unnamed.
213
+ - Concurrency:
214
+ - read/write/close with flags (read?, write?, closed?).
215
+ - read_lock/write_lock, with_read/with_write for safe critical sections.
216
+ - File-based lock in TSVAdapter.lock_dir (tmp/tsv_locks) around write transitions.
217
+
218
+ - Serialization:
219
+ - Per-TSV serializer for values. Defaults depend on type:
220
+ - :single → StringSerializer
221
+ - :list/:flat → StringArraySerializer
222
+ - :double → StringDoubleArraySerializer
223
+ - :integer/_array, :float/_array, :marshal, :json, :binary, :tsv, etc.
224
+ - Accessors wrap/unpack values transparently:
225
+ - tsv["key"] returns decoded (and NamedArray-wrapped) values.
226
+ - tsv.orig_get("key") returns encoded raw storage.
227
+
228
+ - Metadata persistence:
229
+ - Cabinet/PKI/FWT/Sharder adapters persist the annotation hash either in a special record (key "__annotation_hash__") or a sidecar metadata file (.metadata).
230
+ - On open, adapters load metadata and re-annotate the backend as a TSV.
231
+
232
+ - Helpers:
233
+ - keys, each, size filter out the special annotation record.
234
+ - prefix(key) and range (when backend supports them; BDB provides range).
235
+ - merge!(hash), values_at(*keys), collect/map, include?
236
+
237
+ Serializer modules (subset):
238
+ - :integer IntegerSerializer, :float FloatSerializer
239
+ - :integer_array IntegerArraySerializer (with NIL sentinel), :float_array FloatArraySerializer
240
+ - :strict_integer_array/:strict_float_array (pack/unpack only)
241
+ - :string StringSerializer, :binary BinarySerializer
242
+ - :marshal Marshal, :json JSON
243
+ - :tsv TSVSerializer (dump TSV.to_s; load TSV.open)
244
+ - :marshal_tsv TSVMarshalSerializer (Marshal.dump/load TSV)
245
+
246
+ Example:
247
+ ```ruby
248
+ tsv = Persist.open_tokyocabinet("db.hdb", true)
249
+ TSV.setup(tsv, key_field: "Key", fields: %w(One Two), type: :list)
250
+ tsv.extend TSVAdapter
251
+ tsv.serializer = :marshal
252
+ tsv["a"] = [1, 2]
253
+ tsv["a"] # => [1, 2]
254
+ Marshal.load(tsv.orig_get("a")) # => [1, 2]
255
+ ```
256
+
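The split between decoded access and raw storage described above can be sketched without any storage engine. `EncodedStore` is a hypothetical class for illustration only; it assumes a serializer object that responds to `dump` and `load` (Marshal does), and it borrows the `orig_get` name from the description above.

```ruby
# Hypothetical stand-in for a serializer-aware key-value store.
class EncodedStore
  def initialize(serializer = Marshal)
    @serializer = serializer # any object responding to dump and load
    @raw = {}                # holds only encoded strings
  end

  # Encode on write.
  def []=(key, value)
    @raw[key] = @serializer.dump(value)
  end

  # Decode on read.
  def [](key)
    encoded = @raw[key]
    encoded.nil? ? nil : @serializer.load(encoded)
  end

  # Return the encoded bytes exactly as stored.
  def orig_get(key)
    @raw[key]
  end
end

store = EncodedStore.new
store["a"] = [1, 2]
p store["a"]
p store.orig_get("a").class
```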
257
+ ---
258
+
259
+ ## Examples
260
+
261
+ Basic persist and reload:
262
+
263
+ ```ruby
264
+ # Cache a TSV under a logical id
265
+ content = <<~EOF
266
+ #: :sep=/\\s+/#:type=:double#:merge=:concat
267
+ #Id ValueA ValueB OtherID
268
+ row1 a|aa|aaa b Id1|Id2
269
+ row2 A B Id3
270
+ row2 a a id3
271
+ EOF
272
+
273
+ tsv = Persist.persist("TEST Persist TSV", :tsv) do
274
+ TmpFile.with_file(content) { |file| TSV.open(file) }
275
+ end
276
+
277
+ # Subsequent calls load the cached TSV without running the block, so this raise is never reached
278
+ tsv2 = Persist.persist("TEST Persist TSV", :tsv) { raise "won't run" }
279
+ ```
280
+
281
+ Persist.tsv (populate into a specific backend):
282
+
283
+ ```ruby
284
+ tsv = Persist.tsv("Some TSV") do |data|
285
+ TSV.open("input.tsv", persist_data: data) # data is the persistence backend
286
+ end
287
+ tsv["row1"]["ValueB"] # => ["b"]
288
+ ```
289
+
290
+ Open cabinets and import TSV quickly:
291
+
292
+ ```ruby
293
+ parser = TSV::Parser.new("big.tsv", type: :double)
294
+ db = ScoutCabinet.open("big.hdb", true, :HDB)
295
+ parser.with_stream { |stream| ScoutCabinet.importtsv(db, stream) }
296
+ db.write_and_read do
297
+ TSV.setup(db, **parser.options)
298
+ db.extend TSVAdapter
299
+ end
300
+ db["row2"]["ValueA"] # => ["A","AA"]
301
+ ```
302
+
303
+ FixWidthTable for ranges:
304
+
305
+ ```ruby
306
+ f = Persist.open_fwt("ranges.fwt", 100, true)
307
+ f.add_range(range_tsv) # TSV with Start/End
308
+ f.read
309
+ f[3] # => keys overlapping position 3
310
+ f[3..4] # => keys overlapping range 3..4
311
+ f.overlaps(1) # => ["1:6"]
312
+ ```
313
+
314
+ PackedIndex:
315
+
316
+ ```ruby
317
+ pi = Persist.open_pki("packed.idx", true, %w(i i 23s f f))
318
+ 100.times { |i| pi << [i, i+2, i.to_s*10, rand, rand] }
319
+ pi << nil # sparse rows
320
+ pi.close
321
+
322
+ pi = Persist.open_pki("packed.idx", false, %w(i i 23s f f))
323
+ pi[10] # => [10, 12, "1010101010", 0.x, 0.y]
324
+ ```
325
+
326
+ Sharded TSV (HDB):
327
+
328
+ ```ruby
329
+ sh = Persist.tsv("sharded", persist_options: { shard_function: ->(k){ k[-1] } }) do |data|
330
+ TSV.open("table.tsv", data: data, type: :list)
331
+ end
332
+ sh["id1"]["ValueA"] # => "a1"
333
+ sh.prefix("id1") # requires BDB engine to support range/prefix
334
+ ```
335
+
336
+ Tkrzw:
337
+
338
+ ```ruby
339
+ db = ScoutTKRZW.open("tk.tkh", true)
340
+ 1000.times { |i| db["foo#{i}"] = "bar#{i}" }
341
+ db.close
342
+ db2 = ScoutTKRZW.open("tk.tkh", false)
343
+ db2.keys.length # => 1000
344
+ ```
345
+
346
+ Float arrays with typed serializer:
347
+
348
+ ```ruby
349
+ tsv = TSV.open("values.tsv", persist: true, type: :list, cast: :to_f, persist_update: true)
350
+ tsv.serializer # => TSVAdapter::FloatArraySerializer
351
+ tsv["row1"] # => [0.2, 0.3, 0.0]
352
+ ```
353
+
354
+ ---
355
+
356
+ Persist centralizes the framework’s caching and storage patterns: a single entry point to save/load arbitrary objects, plus first-class TSV persistence on top of multiple storage engines, with safe locking and atomic writes. Use Persist.persist for general caches. Use Persist.tsv or Persist.persist_tsv for TSV-shaped data, and pick the engine that best matches your workload: HDB/BDB/tkh for generic key-value stores, FWT for range/position indices, PKI for compact fixed-layout records, and Sharder to split a large key space across several databases.
data/doc/Semaphore.md ADDED
@@ -0,0 +1,171 @@
1
+ # Semaphore (ScoutSemaphore)
2
+
3
+ ScoutSemaphore provides simple process/thread concurrency control primitives built on:
4
+
5
+ - POSIX named semaphores (via RubyInline C bindings to sem_open/sem_wait/sem_post/sem_unlink).
6
+ - Convenience helpers to scope a semaphore’s lifetime and run critical sections.
7
+ - Utilities to process collections concurrently with a bounded level of parallelism, using either processes (via TSV.traverse) or threads.
8
+
9
+ Requirements:
10
+ - RubyInline (gem ‘RubyInline’, loaded with require ‘inline’) to compile the small C bindings at runtime.
11
+ - A platform with POSIX named semaphores (sem_open). If RubyInline is unavailable, Scout logs a warning and the module will not be functional.
12
+
13
+ Sections:
14
+ - Design and prerequisites
15
+ - API
16
+ - with_semaphore
17
+ - synchronize
18
+ - fork_each_on_semaphore (process-based)
19
+ - thread_each_on_semaphore (thread-based)
20
+ - Low-level C bindings (create/delete/wait/post)
21
+ - Usage examples
22
+ - Notes and caveats
23
+
24
+ ---
25
+
26
+ ## Design and prerequisites
27
+
28
+ At its core, ScoutSemaphore exposes four C-bound singleton methods using RubyInline:
29
+
30
+ - create_semaphore(name, value) — sem_open(name, O_CREAT, …, value)
31
+ - delete_semaphore(name) — sem_unlink(name)
32
+ - wait_semaphore(name) — sem_wait on the named semaphore
33
+ - post_semaphore(name) — sem_post on the named semaphore
34
+
35
+ On top of these, high-level Ruby methods make it easy to:
36
+ - Create a semaphore for the duration of a block and guarantee cleanup (with_semaphore).
37
+ - Run a critical section by waiting, yielding, and posting (synchronize).
38
+ - Traverse a list with bounded concurrency, either via background processes (fork_each_on_semaphore) or threads (thread_each_on_semaphore).
39
+
40
+ If RubyInline cannot be loaded, a warning is emitted and none of these methods are defined (tests should skip accordingly).
41
+
42
+ ---
43
+
44
+ ## API
45
+
46
+ ### with_semaphore(size, file = nil) { |sem_name| ... } → nil
47
+
48
+ Create a named semaphore with an initial count “size”, yield its name to the block, and destroy it afterwards.
49
+
50
+ - size: Integer initial tokens (maximum concurrent “holders”).
51
+ - file (String, optional): name/identifier for the semaphore. If nil, a unique name is generated, prefixed internally (e.g., “/scout-<digest>”). When a custom name is provided, slashes are sanitized.
52
+
53
+ Behavior:
54
+ - Logs creation and removal.
55
+ - Ensures sem_unlink on exit, even if the block raises.
56
+
57
+ Use this to scope semaphore lifetime and pass its name to other helpers or processes.
58
+
59
+ ### synchronize(sem_name) { ... } → result
60
+
61
+ Wait on a named semaphore, run the critical section, then post it back.
62
+
63
+ - sem_name: the name returned by with_semaphore (or any existing semaphore name).
64
+ - Ensures that sem_post is called even if the block raises.
65
+
66
+ Exceptions:
67
+ - If sem_wait fails, a ScoutSemaphore::SemaphoreInterrupted (subclass of TryAgain) may be raised (see note below).
68
+
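The wait/yield/post shape of synchronize can be illustrated in pure Ruby with a queue of tokens standing in for the semaphore count. `TokenPool` is a hypothetical class for this sketch; it only coordinates threads inside one process, whereas the POSIX named semaphores used by ScoutSemaphore also work across processes.

```ruby
# Hypothetical in-process stand-in for a counting semaphore.
class TokenPool
  def initialize(size)
    @tokens = Queue.new
    size.times { @tokens << :token }
  end

  def synchronize
    @tokens.pop # blocks until a token is free (plays the role of sem_wait)
    begin
      yield
    ensure
      @tokens << :token # always give the token back (plays the role of sem_post)
    end
  end

  def available
    @tokens.size
  end
end

pool = TokenPool.new(1)
begin
  pool.synchronize { raise "boom" }
rescue RuntimeError
  # The token was returned by the ensure clause despite the error.
end
p pool.available
```

The `ensure` clause is the important part: without it, a failing block would permanently reduce the number of holders allowed in.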
69
+ ### fork_each_on_semaphore(elems, size, file = nil) { |elem| ... } → nil
70
+
71
+ Process a collection with at most “size” concurrent workers using TSV.traverse (process-based parallelism).
72
+
73
+ - elems: any enumerable (Array, TSV, IO, etc.; TSV.traverse-compatible).
74
+ - size: max concurrent workers (cpus).
75
+ - file: ignored for traversal (kept for API symmetry).
76
+
77
+ Behavior:
78
+ - Uses TSV.traverse with :cpus => size and a progress bar.
79
+ - Yields elem to the block in worker subprocesses.
80
+ - Logs and swallows Interrupts in workers.
81
+
82
+ Use this when you want process-level parallelism, isolation, and streaming integration via TSV.traverse.
83
+
84
+ ### thread_each_on_semaphore(elems, size) { |elem| ... } → nil
85
+
86
+ Process a collection with at most “size” concurrent threads using a Ruby Mutex/ConditionVariable.
87
+
88
+ - elems: any enumerable.
89
+ - size: max concurrent threads.
90
+
91
+ Behavior:
92
+ - Spawns a thread per element but gates entry to the critical region so that only “size” threads run the block simultaneously.
93
+ - On any exception, logs and ensures remaining threads are terminated (kill).
94
+
95
+ Use this when threads are sufficient and you prefer not to fork.
96
+
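The gating described above (one thread per element, with a counter guarded by a Mutex and ConditionVariable) can be sketched as follows. `bounded_thread_each` is a hypothetical name, and this sketch leaves out the logging and thread-killing error handling mentioned in the behavior list.

```ruby
# Hypothetical helper: run the block for every element, with at most
# `size` threads inside the block at any moment.
def bounded_thread_each(elems, size, &block)
  mutex = Mutex.new
  slot_free = ConditionVariable.new
  running = 0

  threads = elems.map do |elem|
    Thread.new do
      mutex.synchronize do
        # Sleep until a slot opens; re-check the counter after each wake-up.
        slot_free.wait(mutex) while running >= size
        running += 1
      end
      begin
        block.call(elem)
      ensure
        mutex.synchronize do
          running -= 1
          slot_free.signal # wake one waiting thread
        end
      end
    end
  end

  threads.each(&:join)
  nil
end

results = Queue.new
bounded_thread_each((1..10).to_a, 3) { |i| results << i * i }
p results.size
```

Re-checking the counter in a `while` loop, instead of a single `if`, keeps the limit correct even if a thread wakes up after another thread has already taken the free slot.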
97
+ ---
98
+
99
+ ## Low-level C bindings (via RubyInline)
100
+
101
+ These are provided as module singleton methods and used internally:
102
+
103
+ - ScoutSemaphore.create_semaphore(name:String, value:Integer) → void
104
+ - ScoutSemaphore.delete_semaphore(name:String) → void
105
+ - ScoutSemaphore.wait_semaphore(name:String) → Integer (0 on success; errno on error)
106
+ - ScoutSemaphore.post_semaphore(name:String) → void
107
+
108
+ Note:
109
+ - Named semaphores differ by platform. The auto-generated default names start with “/scout-…”. Custom names passed to with_semaphore are sanitized (slashes replaced), so prefer the default unless you have a specific need.
110
+
111
+ ---
112
+
113
+ ## Usage examples
114
+
115
+ Basic scoping and critical section:
116
+
117
+ ```ruby
118
+ ScoutSemaphore.with_semaphore(1) do |sem|
119
+ 10.times do
120
+ ScoutSemaphore.synchronize(sem) do
121
+ # Only one process/thread will execute this at any time
122
+ do_critical_work()
123
+ end
124
+ end
125
+ end
126
+ ```
127
+
128
+ Process-based parallel map (bounded):
129
+
130
+ ```ruby
131
+ items = (1..1000).to_a
132
+ ScoutSemaphore.fork_each_on_semaphore(items, 4) do |i|
133
+ compute(i) # up to 4 worker processes run concurrently
134
+ end
135
+ ```
136
+
137
+ Thread-based parallel map (bounded):
138
+
139
+ ```ruby
140
+ items = (1..1000).to_a
141
+ ScoutSemaphore.thread_each_on_semaphore(items, 8) do |i|
142
+ compute_in_thread(i) # up to 8 threads run concurrently
143
+ end
144
+ ```
145
+
146
+ Coordination across processes:
147
+
148
+ ```ruby
149
+ # In coordinator
150
+ ScoutSemaphore.with_semaphore(2) do |sem|
151
+ # Start several worker processes; pass sem to each (e.g., via ENV, argv, IPC)
152
+ end
153
+
154
+ # In each worker process
155
+ sem = ENV["SCOUT_SEM"]
156
+ ScoutSemaphore.synchronize(sem) do
157
+ # guarded work
158
+ end
159
+ ```
160
+
161
+ ---
162
+
163
+ ## Notes and caveats
164
+
165
+ - RubyInline dependency: If ‘inline’ cannot be loaded, ScoutSemaphore logs a warning and does not provide these methods. Install the gem or guard your code.
166
+ - Semaphore naming: POSIX named semaphores are typically referenced by a leading “/” path-like name. The default generated names follow this. If you pass a custom name to with_semaphore, it is sanitized (slashes replaced) before creation; prefer defaults unless you’re coordinating with external code that expects a specific name.
167
+ - Error reporting: wait_semaphore returns errno on failure. synchronize currently checks for a specific return to raise SemaphoreInterrupted; in practice you will see exceptions only if the system-level wait fails.
168
+ - fork_each_on_semaphore: this helper doesn’t use OS-level semaphores; it leverages TSV.traverse with :cpus => size (process pool). Choose this when you need the TSV/Open streaming ecosystem and process isolation.
169
+ - thread_each_on_semaphore: the concurrency limit is enforced with a simple counter and Mutex/ConditionVariable; it is not an OS semaphore. It ensures threads are joined/killed on error, but still prefer robust error handling in your block.
170
+
171
+ ScoutSemaphore gives you simple, robust building blocks to bound concurrency and protect critical sections in both process- and thread-based strategies, while integrating nicely with the rest of Scout’s TSV/Open streaming infrastructure.