scout-gear 10.8.4 → 10.9.0
- checksums.yaml +4 -4
- data/.vimproject +13 -0
- data/README.md +352 -0
- data/VERSION +1 -1
- data/doc/Association.md +288 -0
- data/doc/Entity.md +296 -0
- data/doc/KnowledgeBase.md +433 -0
- data/doc/Persist.md +356 -0
- data/doc/Semaphore.md +171 -0
- data/doc/TSV.md +449 -0
- data/doc/WorkQueue.md +359 -0
- data/doc/Workflow.md +586 -0
- data/lib/scout/association.rb +4 -2
- data/lib/scout/entity/identifiers.rb +1 -1
- data/lib/scout/entity/object.rb +1 -1
- data/lib/scout/entity/property.rb +5 -5
- data/lib/scout/entity.rb +1 -1
- data/lib/scout/knowledge_base/description.rb +1 -1
- data/lib/scout/knowledge_base/list.rb +7 -2
- data/lib/scout/knowledge_base/registry.rb +2 -2
- data/lib/scout/knowledge_base.rb +20 -2
- data/lib/scout/monitor.rb +10 -6
- data/lib/scout/persist/engine/packed_index.rb +2 -2
- data/lib/scout/persist/engine/sharder.rb +1 -1
- data/lib/scout/persist/tsv.rb +1 -0
- data/lib/scout/semaphore.rb +1 -1
- data/lib/scout/tsv/dumper.rb +3 -3
- data/lib/scout/tsv/open.rb +1 -0
- data/lib/scout/tsv/parser.rb +1 -1
- data/lib/scout/tsv/transformer.rb +1 -0
- data/lib/scout/tsv/util.rb +2 -2
- data/lib/scout/work_queue/socket.rb +1 -1
- data/lib/scout/work_queue/worker.rb +7 -5
- data/lib/scout/workflow/entity.rb +22 -1
- data/lib/scout/workflow/step/config.rb +3 -3
- data/lib/scout/workflow/step/file.rb +4 -0
- data/lib/scout/workflow/step/info.rb +8 -2
- data/lib/scout/workflow/step.rb +10 -5
- data/lib/scout/workflow/task/inputs.rb +1 -1
- data/lib/scout/workflow/usage.rb +3 -2
- data/lib/scout/workflow/util.rb +22 -0
- data/scout-gear.gemspec +16 -5
- data/scout_commands/cat +86 -0
- data/scout_commands/doc +3 -1
- data/scout_commands/entity +151 -0
- data/scout_commands/system/status +238 -0
- data/scout_commands/workflow/info +23 -10
- data/scout_commands/workflow/install +1 -1
- data/test/scout/entity/test_property.rb +1 -1
- data/test/scout/knowledge_base/test_registry.rb +19 -0
- data/test/scout/test_work_queue.rb +1 -1
- data/test/scout/work_queue/test_worker.rb +12 -10
- metadata +15 -4
- data/doc/lib/scout/path.md +0 -35
- data/doc/lib/scout/workflow/task.md +0 -13
data/doc/Persist.md
ADDED
@@ -0,0 +1,356 @@
# Persist

Persist provides a unified, engine-agnostic persistence layer for serializing, saving and loading Ruby objects and TSVs to/from files, with:

- Typed serialization for common types and TSV shapes.
- High-level caching with locking and atomic writes (Persist.persist).
- TSV-aware persistence (Persist.tsv / Persist.persist_tsv) with pluggable storage engines.
- Storage engines:
  - TokyoCabinet HDB/BDB (key-value stores) with TSVAdapter.
  - Tkrzw (HashDBM) via ScoutTKRZW.
  - FixWidthTable (position/range index with fixed-size records).
  - PackedIndex (binary-packed fixed-size rows).
  - Sharder (key-space sharding over sub-databases).
- Concurrency-safe read/write locks and metadata persistence.

Sections:
- Core API (serialize, save/load, persist)
- TSV persistence helpers
- Storage engines
  - TokyoCabinet (HDB/BDB)
  - Tkrzw (HashDBM)
  - FixWidthTable (FWT) — point/range indices
  - PackedIndex (PKI) — fixed-layout random access
  - Sharder — key-based horizontal sharding
- TSVAdapter (persistence adapters and serializers)
- Examples

---

## Core API

Serialization helpers
- Persist.serialize(content, type) / Persist.deserialize(serialized, type)
  - Types include string/text/file/path/binary, integer/float/boolean, array, json/yaml/marshal, and “_array” variants.
  - IO/StringIO are read into strings when needed; :binary writes in 'wb'.

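
The idea of type-directed serialization can be sketched with the standard library alone. This is a minimal illustration, not Scout's code: the module name `ToySerialize` is hypothetical, and the on-disk formats chosen here (for example newline-joined arrays) are assumptions rather than the formats Persist.serialize actually uses.

```ruby
require "json"
require "yaml"

# Hypothetical module for illustration; Persist.serialize may behave differently.
module ToySerialize
  # Turn a Ruby value into a String according to the declared type.
  def self.serialize(content, type)
    case type
    when :string, :integer, :float, :boolean then content.to_s
    when :array   then content.map(&:to_s).join("\n") # assumed format: one item per line
    when :json    then JSON.generate(content)
    when :yaml    then YAML.dump(content)
    when :marshal then Marshal.dump(content)
    else raise ArgumentError, "unknown type #{type.inspect}"
    end
  end

  # Rebuild the Ruby value from its String form using the same type tag.
  def self.deserialize(serialized, type)
    case type
    when :string  then serialized
    when :integer then Integer(serialized)
    when :float   then Float(serialized)
    when :boolean then serialized == "true"
    when :array   then serialized.split("\n")
    when :json    then JSON.parse(serialized)
    when :yaml    then YAML.safe_load(serialized)
    when :marshal then Marshal.load(serialized)
    else raise ArgumentError, "unknown type #{type.inspect}"
    end
  end
end

ToySerialize.serialize([1, 2], :array)     # => "1\n2"
ToySerialize.deserialize("42", :integer)   # => 42
```
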
Save/load
- Persist.save(content, file, type = :serializer)
  - Uses registered save_drivers (see below); otherwise serializes and writes atomically (Open.sensible_write).
  - Supports special type :memory (in-RAM).
- Persist.load(file, type = :serializer)
  - Uses registered load_drivers; returns typed Ruby object (or persistence-backed TSV for TSV types).

High-level caching
- Persist.persist(name, type = :serializer, options = {}) { |maybe_file| value }
  - Computes and stores a value if missing/outdated, then returns it.
  - Locks the cache file while writing to avoid races.
  - If the block yields an IO stream, Persist tees it so one copy is written to disk while the other is returned (and lock is held during streaming).
  - Common options: :dir/:path, :no_load, :update, :lockfile, :tee_copies, :prefix, :canfail.

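
The compute-if-missing pattern with a lock and an atomic write can be sketched as follows. This is an illustration under stated assumptions, not the Persist.persist implementation: `toy_persist` and its `update:` keyword are hypothetical names, the value is always stored with Marshal, and atomicity relies on writing a temporary file and renaming it within the same directory.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper, not Scout's implementation: a compute-once file cache.
def toy_persist(path, update: false)
  # Fast path: a finished cache file exists and no refresh was requested.
  return Marshal.load(File.binread(path)) if File.exist?(path) && !update

  FileUtils.mkdir_p(File.dirname(path))
  File.open("#{path}.lock", File::RDWR | File::CREAT) do |lock|
    lock.flock(File::LOCK_EX) # only one writer at a time
    # Another process may have produced the file while we waited for the lock.
    return Marshal.load(File.binread(path)) if File.exist?(path) && !update

    value = yield
    tmp = "#{path}.tmp.#{Process.pid}"
    File.binwrite(tmp, Marshal.dump(value))
    File.rename(tmp, path) # atomic replacement on the same filesystem
    value
  end
end

Dir.mktmpdir do |dir|
  cache = File.join(dir, "answer.marshal")
  toy_persist(cache) { [1, 2, 3] }             # runs the block and stores the result
  toy_persist(cache) { raise "not executed" }  # served from the cache file
end
```
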
Memory helper
- Persist.memory(name, options={}, &block) — in-process cache (type :memory).

Registered drivers (excerpt)
- Persist.save_drivers[:tsv], [:HDB], [:BDB], [:tkh], [:fwt], [:pki]
- Persist.load_drivers for the same types.

---

## TSV persistence helpers

These helpers make working with persisted TSVs straightforward.

Persist.tsv(id, options = {}, engine: nil, persist_options: {}) { |data| ... } → TSV
- Creates/opens a persistence-backed TSV under a stable persistence path for id.
- engine (or persist_options[:engine]) chooses the backend:
  - 'HDB' (default), 'BDB', 'tkh' (Tkrzw HashDBM), 'fwt' (FixWidthTable), 'pki' (PackedIndex).
- The block is called with a “data” object (backend) to populate using TSV.open(..., data: data), or to write directly to the backend.
- Persists TSV metadata (key_field, fields, type, serializer) and returns a TSV-like object extended with TSVAdapter.

Persist.persist_tsv(file, filename=nil, options={}, persist_options={}, &block) → TSV
- Thin wrapper around Persist.tsv that extracts persist_* options from options for convenience.

Open a persistence database directly
- Persist.open_database(path, write, serializer=nil, type="HDB", options={}) → backend
  - For 'fwt': options include :value_size, :range, :update, :in_memory, :pos_function.
  - For 'pki': options include :pattern (PackedIndex mask), :pos_function.
  - For 'HDB'/'BDB'/'tkh': opens a key-value store backend.

---

## Storage engines

### TokyoCabinet (HDB/BDB)

- ScoutCabinet.open(path, write=true, 'HDB'|'BDB')
  - Returns a TokyoCabinet database extended with ScoutCabinet helpers (read/write switching, import TSV, prefix/range when BDB).
  - Used by Persist.open_tokyocabinet and the TSV adapters.

- Persist.open_tokyocabinet(path, write, serializer=nil, tokyocabinet_class='HDB') → cabinet extended with TKAdapter (inherits TSVAdapter).

- Import TSV directly (fast path):
  - ScoutCabinet.importtsv(db, stream)
  - Or db.load_stream(stream) via adapter wrapper.
  - Expects a full TSV stream (preamble + header).

Notes:
- BDB supports range/prefix queries; HDB is unordered but supports basic CRUD.
- Annotation metadata (key_field/fields/type/etc.) is persisted as a special record and re-loaded on open.

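
Keeping annotation metadata in a reserved record of the same key-value store can be sketched with a plain Hash standing in for the cabinet. The module name `ToyAnnotatedStore` and its methods are hypothetical, and encoding the metadata with Marshal is an assumption; the reserved key is the one this document names in the TSVAdapter section.

```ruby
# Hypothetical sketch: metadata stored as a reserved record next to the data rows.
module ToyAnnotatedStore
  ANNOTATION_KEY = "__annotation_hash__"

  # Write the annotation hash under the reserved key.
  def save_annotations(annotations)
    store(ANNOTATION_KEY, Marshal.dump(annotations))
  end

  # Read it back; an empty hash means the store was never annotated.
  def load_annotations
    raw = fetch(ANNOTATION_KEY, nil)
    raw.nil? ? {} : Marshal.load(raw)
  end

  # Hide the reserved record from the user-facing key list and size.
  def keys
    super - [ANNOTATION_KEY]
  end

  def size
    keys.size
  end
end

db = {}.extend(ToyAnnotatedStore)
db["row1"] = "a\tb"
db.save_annotations({ key_field: "Id", fields: %w(ValueA ValueB), type: :list })
db.keys                        # => ["row1"]
db.load_annotations[:type]     # => :list
```
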
### Tkrzw (HashDBM)

- ScoutTKRZW.open(path, write=true, persistence_class='tkh', options={}) → Tkrzw::DBM extended with ScoutTKRZW
  - options include truncate, num_buckets, sync_hard, encoding, etc.
- Works similarly to ScoutCabinet and integrates with TSVAdapter via Persist.save_drivers[:tkh]/load_drivers[:tkh].

### FixWidthTable (FWT)

A compact file format for large position/range indices with constant record size:

- FixWidthTable.new(path, value_size, range=false, update=false, in_memory=true)
  - value_size: bytes reserved per value.
  - range: true to store (start,end) with overlap counters (for range indices), false for single positions.
  - update: create/truncate when true (or when file missing).
  - in_memory: load file into memory (StringIO) for faster reads.

- Adding data:
  - add_point([value, pos]) or f.add_point(tsv) — expects TSV keyed by pos or with a field.
  - add_range(value_table) — expects TSV with Start and End fields; internally calls add_range_point for each record.

- Query:
  - f[pos] — returns values overlapping pos (Integer or Range). With range==false, point lookups; with range==true, range overlaps.
  - f.overlaps(pos, value=false) — returns overlapping record positions (or “start:end(:value)” strings when value=true).
  - f.values_at(*positions), f.chunked_values_at(keys, max=5000).

- Adapter:
  - Persist.open_fwt(path, value_size, range=false, serializer=nil, update=false, in_memory=false, &pos_function) → FixWidthTable extended with FWTAdapter (TSVAdapter).
  - Save/load via Persist.save_drivers[:fwt]/load_drivers[:fwt].

Example:
```ruby
# Build a range index from a TSV of intervals [Start, End]
f = Persist.open_fwt("ranges.fwt", 100, true)
f.add_range(tsv_with_start_end)
f.read
f[3] # => %w(a b d)
f[1..6] # => ["b"]
f.overlaps(1, true) # => ["1:6:b"]
```

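
Why constant-size records help can be shown with a small point index: because every record has the same width, record i starts at byte i * record_size, so a sorted table can be binary-searched by seeking. The class below is a hypothetical sketch (`ToyPointTable` and its record layout of a 64-bit position plus a padded value are assumptions); it does not reproduce the real FixWidthTable format or its range mode.

```ruby
require "stringio"

# Hypothetical sketch of a point index with constant-size records.
class ToyPointTable
  def initialize(value_size)
    @format = "qa#{value_size}"  # assumed layout: 64-bit position + padded value
    @record_size = 8 + value_size
    @io = StringIO.new(String.new)
    @count = 0
  end

  # Records must be added in increasing position order; long values are truncated.
  def add_point(value, pos)
    @io.seek(0, IO::SEEK_END)
    @io.write([pos, value].pack(@format))
    @count += 1
  end

  # Fixed width means record i lives at byte offset i * record_size.
  def record(index)
    @io.seek(index * @record_size)
    pos, value = @io.read(@record_size).unpack(@format)
    [pos, value.delete("\0")]
  end

  # Binary search for the first record at or after pos, then collect exact matches.
  def [](pos)
    first = (0...@count).bsearch { |i| record(i).first >= pos }
    return [] if first.nil?
    values = []
    first.upto(@count - 1) do |i|
      record_pos, value = record(i)
      break if record_pos != pos
      values << value
    end
    values
  end
end

table = ToyPointTable.new(4)
[["a", 1], ["b", 3], ["c", 3], ["d", 7]].each { |value, pos| table.add_point(value, pos) }
table[3] # => ["b", "c"]
```
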
### PackedIndex (PKI)

A binary file with fixed-size rows according to a format mask:

- mask elements:
  - "i"/"I" → 32/64-bit integer (internally "l"/"q").
  - "f"/"F" → float/double ("f"/"d").
  - "23s" → fixed 23-byte string ("a23").
  - "<code>:<n>" → raw bytes specification.

- PackedIndex.new(path, write=false, pattern=nil)
  - When write=true, pattern must be provided; when reading, pattern is loaded from file header.
  - << payload — write a row (payload.pack(mask)), or write nil as a NIL sentinel.
  - [position] — read a row at position (returns nil for NIL rows).
  - values_at(*positions), size.

- Adapter:
  - Persist.open_pki(path, write, pattern, &pos_function) → PackedIndex extended with PKIAdapter (TSVAdapter).
  - PKIAdapter adds convenience: add(key,value) with Numeric key (skips with NILs), [] supporting pos_function (e.g., parse "chr:pos").

Example:
```ruby
pi = Persist.open_pki("packed.idx", true, %w(i i 23s f f))
100.times { |i| pi << [i, i+2, i.to_s * 10, rand, rand] }
pi << nil # NIL row
pi.close

pi = Persist.open_pki("packed.idx", false, %w(i i 23s f f))
pi[10] # => [10, 12, "10101010101010101010", 0.123..., 0.456...]
```

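
The mask-to-row mechanics can be sketched with Array#pack and String#unpack from the standard library. `ToyPackedRows` is a hypothetical class: it translates the mask elements listed above into pack directives and reads rows by offset, but it keeps data in a StringIO and omits the file header and NIL sentinel of the real PackedIndex.

```ruby
require "stringio"

# Hypothetical sketch; the real PackedIndex header and NIL rows are not reproduced.
class ToyPackedRows
  # Translate one mask element into an Array#pack directive.
  def self.directive(elem)
    case elem
    when "i" then "l"                  # 32-bit integer
    when "I" then "q"                  # 64-bit integer
    when "f" then "f"                  # single-precision float
    when "F" then "d"                  # double-precision float
    when /\A(\d+)s\z/ then "a#{$1}"    # fixed-width, null-padded string
    else raise ArgumentError, "unsupported mask element #{elem.inspect}"
    end
  end

  def initialize(mask)
    @format = mask.map { |elem| self.class.directive(elem) }.join
    # Packing placeholder values yields the constant row width in bytes.
    @row_size = mask.map { |elem| elem.end_with?("s") ? "" : 0 }.pack(@format).bytesize
    @io = StringIO.new(String.new)
  end

  # Append one row at the end of the buffer.
  def <<(row)
    @io.seek(0, IO::SEEK_END)
    @io.write(row.pack(@format))
    self
  end

  # Random access: row n starts at byte n * row_size.
  def [](position)
    @io.seek(position * @row_size)
    bytes = @io.read(@row_size)
    return nil if bytes.nil? || bytes.bytesize < @row_size
    bytes.unpack(@format).map { |v| v.is_a?(String) ? v.delete("\0") : v }
  end

  def size
    @io.size / @row_size
  end
end

rows = ToyPackedRows.new(%w(i i 23s f F))
100.times { |i| rows << [i, i + 2, i.to_s * 10, 0.5, 0.25] }
rows[10] # => [10, 12, "10101010101010101010", 0.5, 0.25]
```
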
### Sharder

Horizontally shards key-value storage across multiple underlying databases based on a shard function:

- Sharder.new(persistence_dir, write=false, db_type='HDB', persist_options={}, &shard_function)
  - shard_function (Proc) maps a key to a shard name (string/number).
  - Each shard lives under <dir>/shard-<name> as a separate backend (HDB/BDB/tkh/fwt/pki depending on db_type/options).

- Access:
  - sharder[key] / sharder[key] = value — routes to the shard’s database.
  - database(key) — returns shard backend for a key.
  - size, keys, each, include?, prefix(key) (when supported by backend, e.g. BDB range).

- Persist.open_sharder(persistence_dir, write=false, db_type=nil, persist_options={}, &shard_function) → Sharder extended with ShardAdapter (TSVAdapter).
  - ShardAdapter saves TSV metadata in <dir>/metadata, exposes TSV-like API, merges keys/size across shards.

- Combining with TSV persistence:
  - Persist.tsv(..., persist_options: { shard_function: ->(k){ ... } }) { |data| TSV.open(file, data: data, ...) }

Examples:
```ruby
# Split by last character of key
sh = Persist.open_sharder("shards", true, :HDB, shard_function: ->(k){ k[-1] })
sh["key-a"] = "a"
sh["key-b"] = "b"
sh["key-a"] # => "a"

# Shard TSV by last char of ID
sh_tsv = Persist.tsv("my-sharded-tsv", persist_options: { shard_function: ->(k){ k[-1] } }) do |data|
  TSV.open(tsv_path, data: data, type: :list)
end
sh_tsv["id1"]["ValueA"] # => "a1"
```

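
The routing logic behind key-based sharding is small enough to sketch with in-memory Hashes standing in for the per-shard databases. `ToySharder` and `shard_names` are hypothetical names; the real Sharder opens one backend per shard on disk.

```ruby
# Hypothetical sketch: route each key to a sub-store chosen by a shard function.
class ToySharder
  def initialize(&shard_function)
    @shard_function = shard_function
    # One store per shard name, created lazily on first use.
    @shards = Hash.new { |shards, name| shards[name] = {} }
  end

  # The store responsible for a key.
  def database(key)
    @shards[@shard_function.call(key).to_s]
  end

  def [](key)
    database(key)[key]
  end

  def []=(key, value)
    database(key)[key] = value
  end

  def include?(key)
    database(key).key?(key)
  end

  # Aggregate views merge the contents of every shard.
  def keys
    @shards.values.flat_map(&:keys)
  end

  def size
    @shards.values.sum(&:size)
  end

  def shard_names
    @shards.keys.sort
  end
end

sharder = ToySharder.new { |key| key[-1] } # split by last character of the key
sharder["key-a"] = "a"
sharder["key-b"] = "b"
sharder["other-a"] = "x"
sharder["key-a"]     # => "a"
sharder.shard_names  # => ["a", "b"]
```
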
---

## TSVAdapter

TSVAdapter turns key-value stores and engine wrappers into TSV-like objects with:

- Annotations: key_field, fields, type, filename, namespace, identifiers, serializer, unnamed.
- Concurrency:
  - read/write/close with flags (read?, write?, closed?).
  - read_lock/write_lock, with_read/with_write for safe critical sections.
  - File-based lock in TSVAdapter.lock_dir (tmp/tsv_locks) around write transitions.

- Serialization:
  - Per-TSV serializer for values. Defaults depend on type:
    - :single → StringSerializer
    - :list/:flat → StringArraySerializer
    - :double → StringDoubleArraySerializer
    - :integer/_array, :float/_array, :marshal, :json, :binary, :tsv, etc.
  - Accessors wrap/unpack values transparently:
    - tsv["key"] returns decoded (and NamedArray-wrapped) values.
    - tsv.orig_get("key") returns encoded raw storage.

- Metadata persistence:
  - Cabinet/PKI/FWT/Sharder adapters persist the annotation hash either in a special record (key "__annotation_hash__") or a sidecar metadata file (.metadata).
  - On open, adapters load metadata and re-annotate the backend as a TSV.

- Helpers:
  - keys, each, size filter out the special annotation record.
  - prefix(key) and range (when backend supports them; BDB provides range).
  - merge!(hash), values_at(*keys), collect/map, include?

Serializer modules (subset):
- :integer IntegerSerializer, :float FloatSerializer
- :integer_array IntegerArraySerializer (with NIL sentinel), :float_array FloatArraySerializer
- :strict_integer_array/:strict_float_array (pack/unpack only)
- :string StringSerializer, :binary BinarySerializer
- :marshal Marshal, :json JSON
- :tsv TSVSerializer (dump TSV.to_s; load TSV.open)
- :marshal_tsv TSVMarshalSerializer (Marshal.dump/load TSV)

Example:
```ruby
tsv = Persist.open_tokyocabinet("db.hdb", true)
TSV.setup(tsv, key_field: "Key", fields: %w(One Two), type: :list)
tsv.extend TSVAdapter
tsv.serializer = :marshal
tsv["a"] = [1, 2]
tsv["a"] # => [1, 2]
Marshal.load(tsv.orig_get("a")) # => [1, 2]
```

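
How an adapter can encode on write, decode on read, and still expose the raw stored form can be sketched by extending a Hash. `ToyAdapter` is a hypothetical module: only the name `orig_get` is borrowed from the text above, and the serializer here is any object that responds to dump and load (Marshal and JSON both do), which is an assumption about the shape rather than Scout's serializer modules.

```ruby
require "json"

# Hypothetical sketch: transparent value (de)serialization over a key-value store.
module ToyAdapter
  attr_accessor :serializer # any object responding to dump/load

  # Decode on read.
  def [](key)
    raw = super
    raw.nil? ? nil : serializer.load(raw)
  end

  # Encode on write.
  def []=(key, value)
    super(key, serializer.dump(value))
  end

  # Bypass decoding and return what is actually stored.
  def orig_get(key)
    method(:[]).super_method.call(key)
  end
end

store = {}.extend(ToyAdapter)
store.serializer = Marshal
store["a"] = [1, 2]
store["a"]                         # => [1, 2]
Marshal.load(store.orig_get("a"))  # => [1, 2]
```
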
---

## Examples

Basic persist and reload:

```ruby
# Cache a TSV under a logical id
content = <<~EOF
  #: :sep=/\\s+/#:type=:double#:merge=:concat
  #Id ValueA ValueB OtherID
  row1 a|aa|aaa b Id1|Id2
  row2 A B Id3
  row2 a a id3
EOF

tsv = Persist.persist("TEST Persist TSV", :tsv) do
  TmpFile.with_file(content) { |file| TSV.open(file) }
end

# Subsequent calls load the cached TSV; the block is not executed
tsv2 = Persist.persist("TEST Persist TSV", :tsv) { raise "won't run" }
```

Persist.tsv (populate into a specific backend):

```ruby
tsv = Persist.tsv("Some TSV") do |data|
  TSV.open("input.tsv", persist_data: data) # data is the persistence backend
end
tsv["row1"]["ValueB"] # => ["b"]
```

Open cabinets and import TSV quickly:

```ruby
parser = TSV::Parser.new("big.tsv", type: :double)
db = ScoutCabinet.open("big.hdb", true, :HDB)
parser.with_stream { |stream| ScoutCabinet.importtsv(db, stream) }
db.write_and_read do
  TSV.setup(db, **parser.options)
  db.extend TSVAdapter
end
db["row2"]["ValueA"] # => ["A","AA"]
```

FixWidthTable for ranges:

```ruby
f = Persist.open_fwt("ranges.fwt", 100, true)
f.add_range(range_tsv) # TSV with Start/End
f.read
f[3] # => keys overlapping position 3
f[3..4] # => keys overlapping range 3..4
f.overlaps(1) # => ["1:6"]
```

PackedIndex:

```ruby
pi = Persist.open_pki("packed.idx", true, %w(i i 23s f f))
100.times { |i| pi << [i, i+2, i.to_s*10, rand, rand] }
pi << nil # sparse rows
pi.close

pi = Persist.open_pki("packed.idx", false, %w(i i 23s f f))
pi[10] # => [10, 12, "10101010101010101010", 0.x, 0.y]
```

Sharded TSV (HDB):

```ruby
sh = Persist.tsv("sharded", persist_options: { shard_function: ->(k){ k[-1] } }) do |data|
  TSV.open("table.tsv", data: data, type: :list)
end
sh["id1"]["ValueA"] # => "a1"
sh.prefix("id1") # requires BDB engine to support range/prefix
```

Tkrzw:

```ruby
db = ScoutTKRZW.open("tk.tkh", true)
1000.times { |i| db["foo#{i}"] = "bar#{i}" }
db.close
db2 = ScoutTKRZW.open("tk.tkh", false)
db2.keys.length # => 1000
```

Float arrays with typed serializer:

```ruby
tsv = TSV.open("values.tsv", persist: true, type: :list, cast: :to_f, persist_update: true)
tsv.serializer # => TSVAdapter::FloatArraySerializer
tsv["row1"] # => [0.2, 0.3, 0.0]
```

---

Persist centers the framework’s caching and storage patterns: a single entry point to save/load arbitrary objects, plus first-class TSV persistence on top of multiple storage engines, with safe locking and atomicity. Use Persist.persist for general caches; use Persist.tsv or Persist.persist_tsv when dealing with TSV-shaped data and pick the engine that best matches your workload (HDB/BDB/tkh for generic KV stores, FWT for range/position indices, PKI for compact fixed-layout records, Sharder to scale by shards).

data/doc/Semaphore.md
ADDED
@@ -0,0 +1,171 @@
# Semaphore (ScoutSemaphore)

ScoutSemaphore provides simple process/thread concurrency control primitives built on:

- POSIX named semaphores (via RubyInline C bindings to sem_open/sem_wait/sem_post/sem_unlink).
- Convenience helpers to scope a semaphore’s lifetime and run critical sections.
- Utilities to process collections concurrently with a bounded level of parallelism, using either processes (via TSV.traverse) or threads.

Requirements:
- RubyInline (gem ‘RubyInline’, loaded with require ‘inline’) to compile the small C bindings at runtime.
- A platform with POSIX named semaphores (sem_open). If RubyInline is unavailable, Scout logs a warning and the module will not be functional.

Sections:
- Design and prerequisites
- API
  - with_semaphore
  - synchronize
  - fork_each_on_semaphore (process-based)
  - thread_each_on_semaphore (thread-based)
- Low-level C bindings (create/delete/wait/post)
- Usage examples
- Notes and caveats

---

## Design and prerequisites

At its core, ScoutSemaphore exposes four C-bound singleton methods using RubyInline:

- create_semaphore(name, value) — sem_open(name, O_CREAT, …, value)
- delete_semaphore(name) — sem_unlink(name)
- wait_semaphore(name) — sem_wait on the named semaphore
- post_semaphore(name) — sem_post on the named semaphore

On top of these, high-level Ruby methods make it easy to:
- Create a semaphore for the duration of a block and guarantee cleanup (with_semaphore).
- Run a critical section by waiting, yielding, and posting (synchronize).
- Traverse a list with bounded concurrency, either via background processes (fork_each_on_semaphore) or threads (thread_each_on_semaphore).

If RubyInline cannot be loaded, a warning is emitted and none of these methods are defined (tests should skip accordingly).

---

## API

### with_semaphore(size, file = nil) { |sem_name| ... } → nil

Create a named semaphore with an initial count “size”, yield its name to the block, and destroy it afterwards.

- size: Integer initial tokens (maximum concurrent “holders”).
- file (String, optional): name/identifier for the semaphore. If nil, a unique name is generated, prefixed internally (e.g., “/scout-<digest>”). When a custom name is provided, slashes are sanitized.

Behavior:
- Logs creation and removal.
- Ensures sem_unlink on exit, even if the block raises.

Use this to scope semaphore lifetime and pass its name to other helpers or processes.

### synchronize(sem_name) { ... } → result

Wait on a named semaphore, run the critical section, then post it back.

- sem_name: the name returned by with_semaphore (or any existing semaphore name).
- Ensures that sem_post is called even if the block raises.

Exceptions:
- If sem_wait fails, a ScoutSemaphore::SemaphoreInterrupted (subclass of TryAgain) may be raised (see note below).

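
The two guarantees described above (cleanup when the scope ends, and a post after every wait even when the block raises) can be sketched in pure Ruby. `ToySemaphore` is a hypothetical, in-process stand-in: it models a named counting semaphore as a Queue of tokens held in a registry, so it works across threads only and does not touch POSIX semaphores or RubyInline.

```ruby
# Hypothetical in-process model of a named counting semaphore.
module ToySemaphore
  REGISTRY = {}
  REGISTRY_LOCK = Mutex.new

  # Create the semaphore, yield its name, and always remove it afterwards.
  def self.with_semaphore(size, name = "toy-#{rand(1 << 32).to_s(16)}")
    tokens = Queue.new
    size.times { tokens << :token }
    REGISTRY_LOCK.synchronize { REGISTRY[name] = tokens }
    yield name
    nil
  ensure
    REGISTRY_LOCK.synchronize { REGISTRY.delete(name) }
  end

  # Wait for a token, run the block, and always hand the token back.
  def self.synchronize(name)
    tokens = REGISTRY_LOCK.synchronize { REGISTRY.fetch(name) }
    tokens.pop # wait: blocks until a token is free
    begin
      yield
    ensure
      tokens << :token # post: runs even if the block raised
    end
  end
end

ToySemaphore.with_semaphore(1) do |sem|
  ToySemaphore.synchronize(sem) { :guarded_work }
end
```
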
### fork_each_on_semaphore(elems, size, file = nil) { |elem| ... } → nil

Process a collection with at most “size” concurrent workers using TSV.traverse (process-based parallelism).

- elems: any enumerable (Array, TSV, IO, etc.; TSV.traverse-compatible).
- size: max concurrent workers (cpus).
- file: ignored for traversal (kept for API symmetry).

Behavior:
- Uses TSV.traverse with :cpus => size and a progress bar.
- Yields elem to the block in worker subprocesses.
- Logs and swallows Interrupts in workers.

Use this when you want process-level parallelism, isolation, and streaming integration via TSV.traverse.

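
For readers unfamiliar with bounded process-level parallelism, the general idea can be sketched with Kernel#fork and Process.wait. This is a different, simpler technique than the TSV.traverse process pool the helper uses: `toy_fork_each` is a hypothetical name, it forks one child per element, and it requires a platform where fork is available. Values computed in a child are not visible to the parent, so the example writes results to files.

```ruby
require "tmpdir"

# Hypothetical sketch: one forked child per element, at most `size` alive at once.
def toy_fork_each(elems, size)
  running = 0
  elems.each do |elem|
    if running >= size
      Process.wait # block until any child exits, freeing a slot
      running -= 1
    end
    fork do
      status = 0
      begin
        yield elem
      rescue Interrupt
        status = 1 # swallow interrupts in the worker
      end
      exit!(status) # leave the child without running at_exit handlers
    end
    running += 1
  end
  Process.waitall # reap the remaining children
  nil
end

Dir.mktmpdir do |dir|
  toy_fork_each((1..6).to_a, 2) do |i|
    File.write(File.join(dir, "#{i}.out"), (i * i).to_s)
  end
  File.read(File.join(dir, "3.out")) # => "9"
end
```
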
### thread_each_on_semaphore(elems, size) { |elem| ... } → nil

Process a collection with at most “size” concurrent threads using a Ruby Mutex/ConditionVariable.

- elems: any enumerable.
- size: max concurrent threads.

Behavior:
- Spawns a thread per element but gates entry to the critical region so that only “size” threads run the block simultaneously.
- On any exception, logs and ensures remaining threads are terminated (kill).

Use this when threads are sufficient and you prefer not to fork.

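
The gating described above (a thread per element, a counter guarded by a Mutex, and a ConditionVariable to wait for a free slot) can be sketched as follows. `toy_thread_each` is a hypothetical name, and the sketch leaves out the logging and thread-killing error handling that the text mentions.

```ruby
# Hypothetical sketch: one thread per element, at most `size` inside the block at once.
def toy_thread_each(elems, size)
  mutex = Mutex.new
  slot_free = ConditionVariable.new
  running = 0

  threads = elems.map do |elem|
    Thread.new do
      mutex.synchronize do
        # Sleep until fewer than `size` threads are inside the block.
        slot_free.wait(mutex) while running >= size
        running += 1
      end
      begin
        yield elem
      ensure
        mutex.synchronize do
          running -= 1
          slot_free.signal # wake one waiting thread
        end
      end
    end
  end

  threads.each(&:join)
  nil
end

toy_thread_each((1..10).to_a, 3) { |i| sleep 0.001 * i }
```
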
---

## Low-level C bindings (via RubyInline)

These are provided as module singleton methods and used internally:

- ScoutSemaphore.create_semaphore(name:String, value:Integer) → void
- ScoutSemaphore.delete_semaphore(name:String) → void
- ScoutSemaphore.wait_semaphore(name:String) → Integer (0 on success; errno on error)
- ScoutSemaphore.post_semaphore(name:String) → void

Note:
- Named semaphores differ by platform. The auto-generated default names start with “/scout-…”. Custom names passed to with_semaphore are sanitized (slashes replaced), so prefer the default unless you have a specific need.

---

## Usage examples

Basic scoping and critical section:

```ruby
ScoutSemaphore.with_semaphore(1) do |sem|
  10.times do
    ScoutSemaphore.synchronize(sem) do
      # Only one process/thread will execute this at any time
      do_critical_work()
    end
  end
end
```

Process-based parallel map (bounded):

```ruby
items = (1..1000).to_a
ScoutSemaphore.fork_each_on_semaphore(items, 4) do |i|
  compute(i) # up to 4 worker processes run concurrently
end
```

Thread-based parallel map (bounded):

```ruby
items = (1..1000).to_a
ScoutSemaphore.thread_each_on_semaphore(items, 8) do |i|
  compute_in_thread(i) # up to 8 threads run concurrently
end
```

Coordination across processes:

```ruby
# In coordinator
ScoutSemaphore.with_semaphore(2) do |sem|
  # Start several worker processes; pass sem to each (e.g., via ENV, argv, IPC)
end

# In each worker process
sem = ENV["SCOUT_SEM"]
ScoutSemaphore.synchronize(sem) do
  # guarded work
end
```

---

## Notes and caveats

- RubyInline dependency: If ‘inline’ cannot be loaded, ScoutSemaphore logs a warning and does not provide these methods. Install the gem or guard your code.
- Semaphore naming: POSIX named semaphores are typically referenced by a leading “/” path-like name. The default generated names follow this. If you pass a custom name to with_semaphore, it is sanitized (slashes replaced) before creation; prefer defaults unless you’re coordinating with external code that expects a specific name.
- Error reporting: wait_semaphore returns errno on failure. synchronize currently checks for a specific return to raise SemaphoreInterrupted; in practice you will see exceptions only if the system-level wait fails.
- fork_each_on_semaphore: this helper doesn’t use OS-level semaphores; it leverages TSV.traverse with :cpus => size (process pool). Choose this when you need the TSV/Open streaming ecosystem and process isolation.
- thread_each_on_semaphore: the concurrency limit is enforced with a simple counter and Mutex/ConditionVariable; it is not an OS semaphore. It ensures threads are joined/killed on error, but still prefer robust error handling in your block.

ScoutSemaphore gives you simple, robust building blocks to bound concurrency and protect critical sections in both process- and thread-based strategies, while integrating nicely with the rest of Scout’s TSV/Open streaming infrastructure.