scout-gear 10.8.4 → 10.9.0

Files changed (55)
  1. checksums.yaml +4 -4
  2. data/.vimproject +13 -0
  3. data/README.md +352 -0
  4. data/VERSION +1 -1
  5. data/doc/Association.md +288 -0
  6. data/doc/Entity.md +296 -0
  7. data/doc/KnowledgeBase.md +433 -0
  8. data/doc/Persist.md +356 -0
  9. data/doc/Semaphore.md +171 -0
  10. data/doc/TSV.md +449 -0
  11. data/doc/WorkQueue.md +359 -0
  12. data/doc/Workflow.md +586 -0
  13. data/lib/scout/association.rb +4 -2
  14. data/lib/scout/entity/identifiers.rb +1 -1
  15. data/lib/scout/entity/object.rb +1 -1
  16. data/lib/scout/entity/property.rb +5 -5
  17. data/lib/scout/entity.rb +1 -1
  18. data/lib/scout/knowledge_base/description.rb +1 -1
  19. data/lib/scout/knowledge_base/list.rb +7 -2
  20. data/lib/scout/knowledge_base/registry.rb +2 -2
  21. data/lib/scout/knowledge_base.rb +20 -2
  22. data/lib/scout/monitor.rb +10 -6
  23. data/lib/scout/persist/engine/packed_index.rb +2 -2
  24. data/lib/scout/persist/engine/sharder.rb +1 -1
  25. data/lib/scout/persist/tsv.rb +1 -0
  26. data/lib/scout/semaphore.rb +1 -1
  27. data/lib/scout/tsv/dumper.rb +3 -3
  28. data/lib/scout/tsv/open.rb +1 -0
  29. data/lib/scout/tsv/parser.rb +1 -1
  30. data/lib/scout/tsv/transformer.rb +1 -0
  31. data/lib/scout/tsv/util.rb +2 -2
  32. data/lib/scout/work_queue/socket.rb +1 -1
  33. data/lib/scout/work_queue/worker.rb +7 -5
  34. data/lib/scout/workflow/entity.rb +22 -1
  35. data/lib/scout/workflow/step/config.rb +3 -3
  36. data/lib/scout/workflow/step/file.rb +4 -0
  37. data/lib/scout/workflow/step/info.rb +8 -2
  38. data/lib/scout/workflow/step.rb +10 -5
  39. data/lib/scout/workflow/task/inputs.rb +1 -1
  40. data/lib/scout/workflow/usage.rb +3 -2
  41. data/lib/scout/workflow/util.rb +22 -0
  42. data/scout-gear.gemspec +16 -5
  43. data/scout_commands/cat +86 -0
  44. data/scout_commands/doc +3 -1
  45. data/scout_commands/entity +151 -0
  46. data/scout_commands/system/status +238 -0
  47. data/scout_commands/workflow/info +23 -10
  48. data/scout_commands/workflow/install +1 -1
  49. data/test/scout/entity/test_property.rb +1 -1
  50. data/test/scout/knowledge_base/test_registry.rb +19 -0
  51. data/test/scout/test_work_queue.rb +1 -1
  52. data/test/scout/work_queue/test_worker.rb +12 -10
  53. metadata +15 -4
  54. data/doc/lib/scout/path.md +0 -35
  55. data/doc/lib/scout/workflow/task.md +0 -13
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 97c074aa2b85744eccbcba77c7be3eaa733d7280f6c5875211a3a1b771234e04
4
- data.tar.gz: c284bf58c6954699fdc0b0742f6331ea1af7db7188b76b99325ad561e19de6cf
3
+ metadata.gz: 3900d5a0e546d494ae3224176db5feb92344dc1f6ba311461c74d02f5b137a9c
4
+ data.tar.gz: 35421139fea183944db794bad361bdfa9db78b086e13fab6367d6bbcfebc29e3
5
5
  SHA512:
6
- metadata.gz: f381c4c3167580d337ec5ac43bec9e12e8db0b4b423603fe134b8751d3a9aa8b9930797c87daa9cd0b7cf7f9c24c60ece3c00b2b4b70aec03aa71eae7bcef091
7
- data.tar.gz: f39b067a52e6724302dc92d1eeb4dac3c22ce07e25c4e986d9a3018c92dcfdd79441d23bd2c9f0dca90a7a39d6ca95bc52fab92b0b4999cc003e48767e26bce8
6
+ metadata.gz: 679d05d7ae724825e18a2ce12862d08a3e75179f575a19ff960f4f379be6f2ce844326beca6e47d4bce3f585a4e637c71164b8642a2d61a85b2a54236f345153
7
+ data.tar.gz: c952339dfaa42d948a297a7f2c456d63bb4624d0ec71ee26499b79a2a25be8f32ec8c2b5ea2bd9cc17a664c50edb8f6cf22dec458ea103eb6e82614774973bdc
data/.vimproject CHANGED
@@ -1,5 +1,15 @@
1
1
  scout-gear=/$PWD filter="*.rb *.yaml" {
2
2
  Rakefile
3
+ README.md
4
+ chats=chats filter="*"{
5
+ debug
6
+ document.rb
7
+ pipes
8
+ test_doc
9
+ doc=doc{
10
+ documenter.rb
11
+ }
12
+ }
3
13
  bin=bin filter="*"{
4
14
  scout
5
15
  }
@@ -137,7 +147,9 @@ scout-gear=/$PWD filter="*.rb *.yaml" {
137
147
  scout_commands=scout_commands filter="*"{
138
148
  rbbt
139
149
  alias
150
+ entity
140
151
  find
152
+ cat
141
153
  glob
142
154
  log
143
155
  doc
@@ -174,6 +186,7 @@ scout-gear=/$PWD filter="*.rb *.yaml" {
174
186
  }
175
187
  system=system{
176
188
  clean
189
+ status
177
190
  }
178
191
  }
179
192
  test=test {
data/README.md ADDED
@@ -0,0 +1,352 @@
1
+ # Scout Gear
2
+
3
+ Scout Gear is the core, higher-level module set of the Scout framework. It bundles rich, production-grade data and workflow tooling built on top of the lower-level primitives in scout-essentials, and adds domain abstractions such as TSV processing, workflows, knowledge bases, entity typing, parallel work queues, and more.
4
+
5
+ Layering:
6
+ - scout-essentials: foundational utilities used everywhere (Path, Open, CMD, IndiferentHash, Persist, Resource, etc.)
7
+ - scout-gear (this repo): TSV, Workflow, KnowledgeBase, Entity/Association, WorkQueue, Semaphore, and glue code
8
+ - Additional packages:
9
+ - scout-camp: remote servers, cloud deployments, web interfaces, cross-site operations
10
+ - scout-ai: model training and chat agents
11
+ - scout-rig: connect with other languages (e.g., Python)
12
+
13
+ Related ecosystem:
14
+ - Rbbt (Ruby bioinformatics): Many of Scout’s ideas and utilities originated in Rbbt. It still provides a broad set of bioinformatics workflows and tools. See the Rbbt-Workflows organization for many real-world examples and usage patterns:
15
+ - https://github.com/Rbbt-Workflows
16
+
17
+ For module-specific guides, see doc/*.md in this repository (linked below).
18
+
19
+ - TSV: doc/TSV.md
20
+ - Workflow: doc/Workflow.md
21
+ - KnowledgeBase: doc/KnowledgeBase.md
22
+ - Association: doc/Association.md
23
+ - Entity: doc/Entity.md
24
+ - WorkQueue: doc/WorkQueue.md
25
+ - Semaphore: doc/Semaphore.md
26
+
27
+ Additionally, Scout Gear reuses and exposes core facilities from scout-essentials. Summaries of those core modules are included below for convenience.
28
+
29
+ ---
30
+
31
+ ## How command-line interfaces work (scout …)
32
+
33
+ Scout provides a single “scout” command that discovers and runs nested subcommands from any installed Scout package. Scripts are discovered using the Path subsystem across PATH-like roots, enabling workflows or packages to inject their own commands.
34
+
35
+ Basics:
36
+ - The CLI resolves terms left-to-right until a file is found under a scout_commands tree.
37
+ - Example: scout workflow task runs scout_commands/workflow/task
38
+ - Example: all TSV-related scripts are under scout_commands/tsv and can be listed with scout tsv
39
+ - If the path resolves to a directory instead of a script, a list of available subcommands in that directory is shown.
40
+ - Remaining ARGV is parsed by the selected script using SimpleOPT (SOPT) or compatible parsers.
41
+ - Because discovery uses Path maps, commands contributed by other packages or installed workflows are automatically found.
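The left-to-right resolution described above can be sketched in plain Ruby. This is a minimal illustration only: `resolve_command` is a hypothetical helper, not part of scout, and it searches a single commands root, whereas the README says the real CLI searches several Path maps.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not part of scout): resolve CLI terms left to right
# under a single commands root.
def resolve_command(root, argv)
  current = root
  argv.each_with_index do |term, i|
    candidate = File.join(current, term)
    if File.file?(candidate)
      # The first file found wins; the remaining terms are left for the script.
      return [:script, candidate, argv.drop(i + 1)]
    elsif File.directory?(candidate)
      current = candidate
    else
      return [:missing, current, argv.drop(i)]
    end
  end
  # Only directories matched: report the subcommands available there.
  [:directory, current, Dir.children(current).sort]
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "workflow"))
  FileUtils.touch(File.join(root, "workflow", "task"))
  FileUtils.touch(File.join(root, "workflow", "info"))

  p resolve_command(root, %w[workflow task Baking say]) # a :script result
  p resolve_command(root, %w[workflow])                 # a :directory listing
end
```

Returning the unconsumed terms alongside the script path mirrors the point that the remaining ARGV is parsed by the selected script.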
42
+
43
+ See the per-module CLI sections below for TSV, Workflow, and KnowledgeBase.
44
+
45
+ ---
46
+
47
+ ## Scout Essentials: Core building blocks
48
+
49
+ Scout Gear depends on the following main modules from scout-essentials. You’ll use these directly for filesystem/resource orchestration, external command execution, caching, and options handling.
50
+
51
+ ### Path
52
+
53
+ doc/Path.md
54
+
55
+ Path is a lightweight, annotation-enabled “smart string” for composing and locating project resources across multiple search maps (current/user/global/lib/tmp, etc.). It integrates with Open and Persist.
56
+
57
+ Highlights:
58
+ - Path.setup("str") turns a String into a Path with join via [], /, or method_missing (path.foo.bar)
59
+ - Map logical locations to physical roots with path maps; find the first match across map order with path.find (and path.find_all)
60
+ - Filename helpers: get/set/replace/unset extensions; sanitize filenames; relative paths
61
+ - Directory helpers: glob and glob_all over maps; dirname/basename; realpath; newer?
62
+ - Digest summaries: path.digest_str summarizes files/dirs for logging/debugging
63
+
64
+ Usage:
65
+ ```ruby
66
+ p = Path.setup('share/data/myfile')
67
+ p.find # resolve across configured maps
68
+ p[:subdir, :file] # joins => share/data/subdir/file
69
+ ```
70
+
71
+ ### Open
72
+
73
+ doc/Open.md
74
+
75
+ Open unifies file/stream/remote I/O, atomic writes, pipes/tees/FIFOs, (bg)zip helpers, rsync/sync, and lock handling.
76
+
77
+ Highlights:
78
+ - Open.open/read/write with auto-(de)compression for .gz/.bgz/.zip and remote urls (wget/ssh)
79
+ - Streams: open_pipe, tee_stream, consume_stream, with_fifo
80
+ - Safe writes: sensible_write (tmp + atomic rename + optional locks)
81
+ - Remote: wget with caching, ssh/scp, digest_url, remote cache
82
+ - Filesystem: mkdir/mkfiledir, mv/cp/ln/link_dir, rm/rm_rf, same_file?, exists?, writable?
83
+ - Locking: Open.lock wraps a robust Lockfile (NFS-safe) with refresh/timeout/steal
84
+
85
+ Example:
86
+ ```ruby
87
+ Open.sensible_write("out.txt", Open.open("http://example.com"))
88
+ Open.with_fifo { |fifo| ... }
89
+ Open.rsync("src/", "user@server:dst/", delete: true)
90
+ ```
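The "tmp + atomic rename" idea behind safe writes can be sketched with the Ruby standard library alone. `atomic_write` is a hypothetical name used for illustration; it is not `Open.sensible_write`, and it leaves out the optional locking and stream handling the README mentions.

```ruby
require "tmpdir"
require "securerandom"

# Hypothetical helper, not scout's Open.sensible_write: write to a temporary
# file next to the target, then rename it over the target in one step.
def atomic_write(path, content)
  tmp = "#{path}.tmp-#{Process.pid}-#{SecureRandom.hex(4)}"
  begin
    File.open(tmp, "w") do |f|
      f.write(content)
      f.flush
      f.fsync # push the data to disk before the rename makes it visible
    end
    # rename is atomic on POSIX when both paths are on the same filesystem
    File.rename(tmp, path)
  ensure
    # Clean up the temporary file if the write or rename failed
    File.delete(tmp) if File.exist?(tmp)
  end
  path
end

Dir.mktmpdir do |dir|
  target = File.join(dir, "out.txt")
  atomic_write(target, "hello\n")
  puts File.read(target)
end
```

Readers of the target either see the old file or the complete new one, never a half-written file, which is the property the tmp-then-rename pattern is meant to give.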
91
+
92
+ ### CMD
93
+
94
+ doc/CMD.md
95
+
96
+ CMD wraps Open3.popen3 with robust patterns for streaming, stderr logging, stdin feeding, auto-join of producers, and tool discovery/installation.
97
+
98
+ Highlights:
99
+ - CMD.cmd("tool args", pipe: true, in: io_or_string, stderr: Log::HIGH, autojoin: true)
100
+ - ConcurrentStream-enabled stdout with join/error propagation
101
+ - Convenience: CMD.bash("bash -l -c '...'"), cmd_pid/cmd_log
102
+ - Tool registry: CMD.tool, CMD.get_tool (auto-install via conda or producers), version scanning
103
+
104
+ Example:
105
+ ```ruby
106
+ io = CMD.cmd("cut", "-f" => 2, "-d" => " ", in: "a b", pipe: true)
107
+ io.read # => "b\n"; io.join
108
+ ```
109
+
110
+ ### IndiferentHash
111
+
112
+ doc/IndiferentHash.md
113
+
114
+ Hash mixin for indifferent access (string/symbol keys equal), deep-merge, options parsing, and string<->hash conversions.
115
+
116
+ Highlights:
117
+ - IndiferentHash.setup(hash) to extend a single hash instance
118
+ - Access with h[:a] == h["a"]; delete/include? are indifferent
119
+ - Helpers: deep_merge, values_at with indifferent keys, slice, except
120
+ - Options utilities: parse_options, process_options, positional2hash, hash2string/string2hash
121
+
122
+ Example:
123
+ ```ruby
124
+ opts = IndiferentHash.parse_options('limit=10 title="A title"')
125
+ opts[:title] # => "A title"
126
+ ```
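To make the idea of indifferent access and option strings concrete, here is a rough standard-library sketch. `parse_options_sketch` is a hypothetical function, not the IndiferentHash API; values stay Strings here, and treating a bare token as `true` is an assumption made for this example.

```ruby
require "shellwords"

# Hypothetical helper, not IndiferentHash itself: parse "key=value" tokens
# (with shell-style quoting) into a Hash readable by String or Symbol key.
def parse_options_sketch(str)
  # Keys are stored as Strings; a Symbol lookup falls back to its String form.
  opts = Hash.new { |hash, key| key.is_a?(Symbol) ? hash[key.to_s] : nil }
  Shellwords.split(str).each do |token|
    key, value = token.split("=", 2)
    opts[key] = value.nil? ? true : value # assumption: bare flags become true
  end
  opts
end

opts = parse_options_sketch('limit=10 title="A title"')
opts[:title]  # => "A title"
opts["limit"] # => "10" (kept as a String in this sketch)
```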
127
+
128
+ ### Persist (core serialization/caching)
129
+
130
+ doc/Persist.md (essentials)
131
+
132
+ Typed serialization (json/yaml/marshal/binary/arrays), atomic saves, and the high-level persist pattern with locking and streaming.
133
+
134
+ Highlights:
135
+ - Persist.save/load(obj, file, type)
136
+ - Persist.persist(name, type, dir: ...) { compute_or_stream }
137
+ - Locking and tmp-to-final atomic writes
138
+ - Streaming tee: one copy to file, one to caller
139
+ - Memory cache: Persist.memory(name) { ... }
140
+ - Helpers to parse YAML/JSON/Marshal via Open
141
+
142
+ Example:
143
+ ```ruby
144
+ val = Persist.persist("expensive", :json) { compute_hash }
145
+ # subsequent calls load cached JSON unless :update or stale
146
+ ```
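The compute-once, load-afterwards pattern can be illustrated with a few lines of plain Ruby. `persist_json` is a hypothetical stand-in, not `Persist.persist`: it has no locking, no staleness check, and no update option, and it stores the file non-atomically.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper, not Persist.persist: return the cached JSON value when
# the file exists, otherwise run the block and store its result.
def persist_json(name, dir)
  file = File.join(dir, "#{name}.json")
  return JSON.parse(File.read(file)) if File.exist?(file)

  value = yield
  File.write(file, JSON.generate(value))
  value
end

Dir.mktmpdir do |dir|
  calls = 0
  2.times do
    persist_json("expensive", dir) { calls += 1; { "answer" => 42 } }
  end
  puts calls # the block ran only once; the second call read the cache
end
```

Note that a value loaded from the cache has been through a JSON round trip, so Symbol keys would come back as Strings in this sketch.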
147
+
148
+ ### Resource
149
+
150
+ doc/Resource.md
151
+
152
+ Resource system to claim and produce files on demand (string/proc/url/rake/installers), integrated with Path/Open and locking.
153
+
154
+ Highlights:
155
+ - claim path => (:string, :proc, :url, :rake, :install)
156
+ - Produce on demand via path.produce and path.open/read
157
+ - Rake integration: drive file tasks/rules to generate outputs
158
+ - Install software into a per-resource “software” dir and update env
159
+
160
+ Example:
161
+ ```ruby
162
+ module MyPkg
163
+ extend Resource
164
+ claim root.tmp.test.hello, :string, "Hello"
165
+ end
166
+ MyPkg.tmp.test.hello.read # produces if missing, then reads
167
+ ```
168
+
169
+ Other essentials you’ll encounter:
170
+ - Annotation / AnnotatedArray / NamedArray: lightweight typed attributes on objects and arrays; named tuple-style rows
171
+ - ConcurrentStream: concurrency-aware streams with join/abort/callbacks
172
+ - SimpleOPT (SOPT): tiny CLI option DSL/parser; used by scout commands
173
+ - Log: leveled, colored logging; progress bars; fingerprint utilities
174
+ - TmpFile: temp files/dirs and stable tmp path generator for caches
175
+
176
+ ---
177
+
178
+ ## Scout Gear modules
179
+
180
+ Scout Gear builds on essentials to deliver domain abstractions and engines.
181
+
182
+ ### TSV
183
+
184
+ doc/TSV.md
185
+
186
+ A flexible, typed table abstraction with robust parser, streaming dumper/transformer, parallel traversal, joins/attachments, identifier translation, on-disk persistence (TokyoCabinet/Tkrzw), and range/position indices.
187
+
188
+ Highlights:
189
+ - Shapes: :double, :list, :flat, :single; key_field + fields
190
+ - Parse TSV/CSV from files/streams/strings with rich header options (sep, type, cast, merge)
191
+ - Dumper/Transformer for streaming pipelines
192
+ - TSV.traverse(obj, cpus: N, into: …) for parallel iteration
193
+ - Attach, change_key, change_id, translate via identifier indices
194
+ - Persistence via TSVAdapter over HDB/BDB/Tkrzw/FWT/PKI/Sharder
195
+ - Streaming paste/concat/collapse utilities; filters with persisted sets
196
+
197
+ Example:
198
+ ```ruby
199
+ tsv = TSV.open(path, persist: true, type: :double)
200
+ tsv.attach(other, complete: true)
201
+ index = TSV.index(tsv, target: "FieldA")
202
+ ```
203
+
204
+ CLI (scout tsv):
205
+ - Scripts live under scout_commands/tsv; list with scout tsv
206
+ - Run a specific subcommand: scout tsv <subcommand> [options] [args...]
207
+ - If you hit a directory, available subcommands are listed
208
+ - Subcommands parse options with SOPT (see each script’s help)
209
+
210
+ ### Workflow
211
+
212
+ doc/Workflow.md
213
+
214
+ A lightweight workflow engine. Define tasks with typed inputs and dependencies, create jobs (Steps), and run them with persistence, streaming, provenance, and orchestration under resource rules.
215
+
216
+ Highlights:
217
+ - input/dep/task DSL with helper methods; task_alias and overrides
218
+ - Jobs (Step): run/load/stream/join, info files, files_dir, provenance
219
+ - Orchestrator: schedule dependent jobs under cpus/IO constraints; retry recoverable errors; archive/erase deps per rules
220
+ - EntityWorkflow: entity-centric tasks and properties
221
+ - Queue helpers to enqueue and process jobs
222
+
223
+ Example:
224
+ ```ruby
225
+ module Baking
226
+ extend Workflow
227
+ task :say => :string do |name| "Hi #{name}" end
228
+ end
229
+
230
+ Baking.job(:say, "Miguel").run # => "Hi Miguel"
231
+ ```
232
+
233
+ CLI (scout workflow):
234
+ - List workflows: scout workflow list
235
+ - Run a task: scout workflow task <workflow> <task> [--jobname NAME] [input options...]
236
+ - Options include --fork, --nostream, --update, --printpath, --provenance, --clean, --recursive_clean, --override_deps, --deploy (serial|local|queue|SLURM|server)
237
+ - Show job info: scout workflow info <step_path> [--inputs|--recursive_inputs]
238
+ - Provenance: scout workflow prov <step_path> [--plot file.png] […]
239
+ - Trace execution: scout workflow trace <job-result> [options]
240
+ - Process queue: scout workflow process [filters] [--continuous] [--produce_cpus N] […]
241
+
242
+ You can also dispatch workflow-specific custom commands via:
243
+ - scout workflow cmd <workflow> <subcommand> … (discovers scripts under <workflow>/share/scout_commands/workflow)
244
+
245
+ ### KnowledgeBase
246
+
247
+ doc/KnowledgeBase.md
248
+
249
+ A thin orchestrator around Association, TSV, Entity, and Persist to register multiple association databases, normalize/index them, query/traverse across them, manage entity lists, and generate markdown descriptions.
250
+
251
+ Highlights:
252
+ - Register databases with source/target specs and identifier files
253
+ - get_database/get_index (BDB-backed) with undirected options
254
+ - Query: all, subset (children/parents/neighbours), identify/translate entities
255
+ - Lists: save/load/delete/enumerate typed lists
256
+ - Traversal DSL: multi-hop path finding with wildcards/conditions
257
+ - Markdown descriptions from registry/README files
258
+
259
+ Example:
260
+ ```ruby
261
+ kb = KnowledgeBase.new(Path.setup("var/kb"), "Hsa")
262
+ kb.register :brothers, datafile_test(:person).brothers, undirected: true
263
+ kb.children(:brothers, "Miki") # => ["Miki~Isa", ...]
264
+ ```
265
+
266
+ CLI (scout kb):
267
+ - Configure KB: scout kb config [options] <name>
268
+ - Register DB: scout kb register [options] <name> <filename>
269
+ - Declare entities: scout kb entities <entity> <identifier_files>
270
+ - Show info: scout kb show [<name>]
271
+ - Query: scout kb query <name> <entity_spec>
272
+ - Lists: scout kb list [<list_name>]
273
+ - Traverse: scout kb traverse [options] "<rules,comma,separated>"
274
+
275
+ ### Association
276
+
277
+ doc/Association.md
278
+
279
+ Utilities to normalize source/target field specifications from TSVs, open normalized association databases with optional identifier translation, and build pairwise “source~target” indices (optionally undirected). Also includes AssociationItem for entity-like behavior over pair strings and utilities to build incidence/adjacency matrices.
280
+
281
+ Example:
282
+ ```ruby
283
+ idx = Association.index(file, source: "=>Name", target: "Parent=>Name", undirected: true)
284
+ idx.match("Clei") # => ["Clei~Guille"]
285
+ idx.to_matrix # boolean incidence matrix
286
+ ```
287
+
288
+ ### Entity
289
+
290
+ doc/Entity.md
291
+
292
+ Annotate plain values or arrays as entities with behavior-rich “properties”, automatic format mapping, identifier translation (Entity::Identified), array-aware property batching/caching, and persistence for property results via Persist.
293
+
294
+ Example:
295
+ ```ruby
296
+ module Person
297
+ extend Entity
298
+ property :greet => :single do "Hi #{self}" end
299
+ end
300
+ Person.setup("Miki").greet
301
+ ```
302
+
303
+ ### WorkQueue
304
+
305
+ doc/WorkQueue.md
306
+
307
+ A multi-process work pipeline (forked workers + semaphore-guarded sockets) to parallelize processing over a stream of inputs, with robust error propagation.
308
+
309
+ Example:
310
+ ```ruby
311
+ q = WorkQueue.new(4){|x| x * 2}
312
+ out = []; q.process{|y| out << y}
313
+ (1..100).each{|i| q.write i}; q.close; q.join
314
+ ```
315
+
316
+ ### Semaphore (ScoutSemaphore)
317
+
318
+ doc/Semaphore.md
319
+
320
+ Concurrency helpers based on POSIX named semaphores (via RubyInline C bindings), plus higher-level helpers to bound concurrency with forks/threads.
321
+
322
+ Example:
323
+ ```ruby
324
+ ScoutSemaphore.with_semaphore(2) do |sem|
325
+ ScoutSemaphore.synchronize(sem){ critical_work }
326
+ end
327
+ ```
328
+
329
+ ---
330
+
331
+ ## Examples and further reading
332
+
333
+ - This repository’s docs directory provides in-depth guides for each module:

334
+ - TSV: doc/TSV.md
335
+ - Workflow: doc/Workflow.md
336
+ - KnowledgeBase: doc/KnowledgeBase.md
337
+ - Association: doc/Association.md
338
+ - Entity: doc/Entity.md
339
+ - WorkQueue: doc/WorkQueue.md
340
+ - Semaphore: doc/Semaphore.md
341
+ - For numerous end-to-end examples and real datasets, explore the Rbbt-Workflows organization:
342
+ - https://github.com/Rbbt-Workflows
343
+ - For foundational utilities (Path, Open, CMD, IndiferentHash, Persist, Resource, etc.), consult the scout-essentials documentation:
344
+ - Those modules are summarized above and used pervasively across Scout Gear.
345
+
346
+ ---
347
+
348
+ ## Notes
349
+
350
+ - Streaming everywhere: many APIs return ConcurrentStream-enabled IOs. Always read to EOF and join (or rely on autojoin) to ensure producers exit and errors are surfaced.
351
+ - Atomicity and locking: Open.sensible_write and Persist.persist use tmp+mv and lockfiles to provide robust cross-process behavior.
352
+ - Discovery and composition: the Path subsystem and Resource claims make it easy to build portable projects with on-demand production of resources and discoverable commands.
data/VERSION CHANGED
@@ -1 +1 @@
1
- 10.8.4
1
+ 10.9.0
data/doc/Association.md ADDED
@@ -0,0 +1,288 @@
1
+ # Association
2
+
3
+ Association provides a compact toolkit to open, normalize, and index pairwise relationships from TSV-like sources. With it you can:
4
+
5
+ - Parse declarative source/target field specifications (including format remapping).
6
+ - Open an “association database” (TSV) that standardizes keys/fields and optional identifier translation via Entity/TSV indices.
7
+ - Build a fast BDB-backed index over pair “edges” using “source~target” keys, optionally undirected.
8
+ - Work with association “items” (pairs) as Entities with useful properties and conversions.
9
+ - Produce incidence/adjacency matrices and perform filtering/subsetting over pairs.
10
+
11
+ It integrates with:
12
+ - TSV (parsing, reordering, indices)
13
+ - Entity (format registry and identifier translation)
14
+ - Persist (caching/DB backends)
15
+
16
+ Sections:
17
+ - Field specification syntax and normalization
18
+ - Opening association databases
19
+ - Building and using association indices
20
+ - AssociationItem: entity properties over pairs
21
+ - Matrix utilities
22
+ - Examples
23
+
24
+ ---
25
+
26
+ ## Field specification syntax and normalization
27
+
28
+ Association accepts flexible “field specs” to declare which columns are source and target, optionally including header aliases and format conversions.
29
+
30
+ Syntax patterns (strings):
31
+
32
+ - "FieldName"
33
+ - Use the column named FieldName.
34
+ - "FieldName=~Header"
35
+ - Use field FieldName but present it as Header in outputs.
36
+ - "=~Header"
37
+ - No explicit field (infer from header or Entity format), but present as Header.
38
+ - "FieldName=>TargetFormat"
39
+ - Use FieldName and translate identifiers to TargetFormat (via TSV.translation_index / Entity identifiers).
40
+ - "FieldName=~Header=>TargetFormat"
41
+ - Full form; pick field, rename header, and convert identifiers.
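The syntax patterns above can be read as "split on `=>` first, then on `=~`". The following is a sketch under that assumption only; `parse_field_spec` is a hypothetical re-implementation written for illustration and may differ from `Association.parse_field_specification` in edge cases.

```ruby
# Hypothetical re-implementation for illustration: returns
# [field, header, final_format], with nil for any missing part.
def parse_field_spec(spec)
  rest, format = spec.to_s.split("=>", 2)
  field, header = rest.to_s.split("=~", 2)
  field = nil if field.nil? || field.empty?
  [field, header, format]
end

parse_field_spec("FieldName")
# => ["FieldName", nil, nil]
parse_field_spec("FieldName=>TargetFormat")
# => ["FieldName", nil, "TargetFormat"]
parse_field_spec("=~Associated Gene Name=>Ensembl Gene ID")
# => [nil, "Associated Gene Name", "Ensembl Gene ID"]
```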
42
+
43
+ Parsing and normalization helpers:
44
+ - Association.parse_field_specification(spec) -> [field, header, final_format]
45
+ - Association.normalize_specs(spec, all_fields=nil) -> normalized [field, header, format]
46
+ - If a field is not directly present but is a recognized Entity format, it tries to find a matching column within all_fields by that Entity.
47
+
48
+ Extract source/target specs:
49
+ - specs = Association.extract_specs(all_fields, options)
50
+ - options keys: :source, :target, :source_format, :target_format, :format (hash of entity_type -> default_target_format)
51
+ - Returns a Hash with:
52
+ - :source => [field, header, final_format]
53
+ - :target => [field, header, final_format]
54
+ - Infers default source/target when not provided:
55
+ - If both nil → source := key_field; target := first data field
56
+ - If source is nil but target is the key field → source := first data field (and vice versa)
57
+
58
+ Resolve headers and positions:
59
+ - Association.headers(all_fields, info_fields=nil, options)
60
+ - all_fields: [key_field, field1, ...]
61
+ - info_fields: extra value columns to keep besides target (defaults to “all” except source and target).
62
+ - Returns:
63
+ - [source_pos, field_pos, source_header, field_headers, source_format, target_format]
64
+ - Handles :format hash defaults per entity type, and honors explicit source/target formats.
65
+
66
+ ---
67
+
68
+ ## Opening association databases
69
+
70
+ Association.open coerces a TSV (file/Path/TSV) into a normalized association database with optional identifier translation.
71
+
72
+ ```ruby
73
+ db = Association.open(
74
+ file_or_tsv,
75
+ source: "Wife (ID)=>Alias",
76
+ target: "Husband (ID)=>Name",
77
+ namespace: "person", # optional; replaces NAMESPACE placeholders in paths
78
+ type: :list # optional TSV type; inferred when not set
79
+ )
80
+ ```
81
+
82
+ Behavior:
83
+ - Reads header and infers positions via headers(...).
84
+ - If target/source formats are specified:
85
+ - Builds translation indices from:
86
+ - TSV.identifier_files(file), Entity.identifier_files(format), and options[:identifiers].
87
+ - Rewrites keys/values to requested formats (e.g., “(ID)=>Name”).
88
+ - Produces a TSV with:
89
+ - key_field: resolved source field name (with “(format)” suffix if translated).
90
+ - fields: [resolved target field (with “(format)” if translated), plus remaining info_fields].
91
+ - type: inherited/passed (:double, :list, :flat, :single).
92
+
93
+ Namespace placeholder:
94
+ - When opening from a path string containing “NAMESPACE”, passing namespace: will substitute it:
95
+ - Example: ".../NAMESPACE/identifiers.tsv" -> ".../person/identifiers.tsv"
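A placeholder substitution of this kind amounts to a string replacement. The sketch below uses a hypothetical `substitute_namespace` helper and also accepts the bracketed `[NAMESPACE]` form mentioned in the notes at the end of this document; how the library itself performs the replacement is not shown in this diff.

```ruby
# Hypothetical helper: replace NAMESPACE or [NAMESPACE] in a path string.
def substitute_namespace(path, namespace)
  return path if namespace.nil?
  # Block form so the namespace is inserted literally
  path.gsub(/\[?NAMESPACE\]?/) { namespace }
end

substitute_namespace(".../NAMESPACE/identifiers.tsv", "person")
# => ".../person/identifiers.tsv"
```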
96
+
97
+ Persisted variant:
98
+ - Association.database(file, ...) wraps Association.open with Persist.tsv and a “BDB” engine:
99
+ - Returns a persistence-backed TSV (keys/fields/type saved with TSVAdapter).
100
+ - Options: any Association.open options plus :persist / persist_* (via IndiferentHash).
101
+
102
+ Examples:
103
+ - Simple open:
104
+ ```ruby
105
+ db = Association.database(datadir.person.marriages,
106
+ source: "Wife", target: "Husband", persist: true)
107
+ db["Clei"]["Husband"] # => "Miguel"
108
+ db["Clei"]["Date"] # => "2021"
109
+ ```
110
+
111
+ - Partial field + format:
112
+ ```ruby
113
+ db = Association.database(datadir.person.marriages,
114
+ source: "Wife=>Alias", target: "Husband=>Name")
115
+ ```
116
+
117
+ - Flat TSV:
118
+ ```ruby
119
+ flat = datadir.person.parents.tsv(type: :flat, fields: ["Parent"])
120
+ db = Association.database(flat)
121
+ db["Miki"] # => %w(Juan Mariluz)
122
+ ```
123
+
124
+ ---
125
+
126
+ ## Building and using association indices
127
+
128
+ Association.index materializes a BDB index over pairwise relations with keys of the form “source~target”. The index entries store the “info fields” (everything but the two endpoints) as a :list TSV.
129
+
130
+ ```ruby
131
+ idx = Association.index(file_or_tsv,
132
+ source: "=>Name",
133
+ target: "Parent=>Name",
134
+ undirected: false, # true duplicates (source~target) and (target~source)
135
+ persist: true)
136
+ ```
137
+
138
+ - Under the hood:
139
+ - Opens/normalizes the database with Association.open (or uses provided DB).
140
+ - Builds keys “[source]~[target]” and writes values (info fields) as a list.
141
+ - If undirected true (or same source/target column), writes both “[s]~[t]” and “[t]~[s]”.
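As a rough mental model of the key layout, an in-memory Hash can stand in for the BDB index. `build_pair_index` is a hypothetical function and the sample row is made up for illustration; the real index is persisted and built from a normalized TSV.

```ruby
# Hypothetical in-memory stand-in for the index: each row is
# [source, target, *info_fields]; keys are "source~target".
def build_pair_index(rows, undirected: false)
  rows.each_with_object({}) do |(source, target, *info), index|
    index["#{source}~#{target}"] = info
    # Undirected relations are stored in both orientations
    index["#{target}~#{source}"] = info if undirected
  end
end

rows = [["Clei", "Guille", "sibling"]]
build_pair_index(rows).keys                   # => ["Clei~Guille"]
build_pair_index(rows, undirected: true).keys # => ["Clei~Guille", "Guille~Clei"]
```

Storing both orientations costs space but lets a lookup by either endpoint use the same key prefix.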
142
+
143
+ - Return value:
144
+ - A BDB TSV extended with Association::Index, annotated with:
145
+ - source_field, target_field, undirected
146
+ - The index sets key_field to “SourceField~TargetField[~undirected]”.
147
+
148
+ - Methods on Association::Index:
149
+ - parse_key_field → sets source_field/target_field/undirected from key_field.
150
+ - match(entity) → returns all “source~target” keys whose source starts with entity (prefix-based).
151
+ - subset(source_list, target_spec)
152
+ - source_list: list of source entities or :all.
153
+ - target_spec: :all or list to filter by target side.
154
+ - Returns matching keys, handling undirected symmetry.
155
+ - reverse → returns a reversed index (keys swapped to “target~source”) persisted in a side file (.reverse).
156
+ - filter(value_field=nil, target_value=nil, &block)
157
+ - Without block: filter keys whose value_field is present (or equals target_value).
158
+ - With block: custom predicate over values (or key+values if value_field nil).
159
+ - to_matrix(value_field=nil) { |values| ... }
160
+ - Produces an incidence matrix TSV (rows: sources, columns: targets):
161
+ - If value_field provided, uses that column (or block mapping).
162
+ - Else boolean incidence.
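The behavior of `subset` and `reverse` can be approximated over a plain Hash keyed by "source~target". The function names and the sample data below are hypothetical, and the sketch ignores the undirected symmetry handling and the `.reverse` side file described above.

```ruby
# Split a "source~target" key into its two endpoints.
def split_pair(key)
  key.split("~", 2)
end

# Hypothetical sketch of subset: keep keys whose endpoints pass both filters.
def subset_keys(index, sources, targets = :all)
  index.keys.select do |key|
    source, target = split_pair(key)
    (sources == :all || sources.include?(source)) &&
      (targets == :all || targets.include?(target))
  end
end

# Hypothetical sketch of reverse: same values, keys swapped to "target~source".
def reverse_index(index)
  index.each_with_object({}) do |(key, info), reversed|
    source, target = split_pair(key)
    reversed["#{target}~#{source}"] = info
  end
end

index = {
  "Miki~Juan" => ["father"],
  "Miki~Mariluz" => ["mother"],
  "Isa~Juan" => ["father"]
}
subset_keys(index, ["Miki"])       # => ["Miki~Juan", "Miki~Mariluz"]
subset_keys(index, :all, ["Juan"]) # => ["Miki~Juan", "Isa~Juan"]
reverse_index(index).keys          # => ["Juan~Miki", "Mariluz~Miki", "Juan~Isa"]
```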
163
+
164
+ Note:
165
+ - reverse persists its own DB with swapped key_field; it carries over annotations, unnamed flag, and undirected.
166
+
167
+ Example:
168
+ ```ruby
169
+ idx = Association.index(datadir.person.brothers, undirected: true)
170
+ idx.match("Clei") # => ["Clei~Guille"]
171
+ idx.reverse.match("Clei") # => ["Clei~Guille"] (same when undirected)
172
+ idx.filter("Type", "mother")
173
+ idx.subset(["Miki","Guille"], :all) # some “source~target” keys
174
+ ```
175
+
176
+ ---
177
+
178
+ ## AssociationItem: entity properties over pairs
179
+
180
+ AssociationItem is an Entity module that represents “pairs” as annotated strings “source~target”. You typically obtain such lists from index.keys, and then call properties on the annotated list.
181
+
182
+ Annotate:
183
+ - Association.index(file).keys returns raw strings; annotate them with AssociationItem.setup if needed, or use Index helpers that return annotated lists where applicable.
184
+
185
+ Properties (selected):
186
+ - name (single): "source~target" (returns friendly names using entity .name where available).
187
+ - full_name: database-prefixed “db:source~target” when database set.
188
+ - invert: swap endpoints (works on single or array); toggles reverse flag.
189
+ - namespace: forwarded from knowledge_base (if present).
190
+ - part (array2single): returns [source, "~", target] tuples for each pair.
191
+ - target / source (array2single): returns just target or source identifiers.
192
+ - target_type / source_type (both): resolve entity type names via knowledge_base target/source (requires a KnowledgeBase integration providing #source/#target/#undirected/#get_index/#index_fields).
193
+ - target_entity / source_entity: wrap target/source into Entity-typed values according to knowledge_base types.
194
+ - index(database=nil): resolve underlying index (delegates to knowledge_base.get_index).
195
+ - value (array2single): fetch info values for each pair from the index; returns NamedArrays.
196
+ - info_fields / info: helper for value lookups; info builds a Hash for each pair.
197
+ - tsv (array): emit a TSV for the pair list with columns: source_type, target_type, info_fields.
198
+ - filter(*args, &block): filter this pair list using the generated tsv.select.
199
+
200
+ Utilities:
201
+ - AssociationItem.incidence(pairs, key_field="Source") { |pair| optional_value }
202
+ - Returns TSV (list) with rows as sources and columns as targets; each cell holds the block’s value, or a boolean when no block is given.
203
+ - AssociationItem.adjacency(pairs, key_field="Source") { |pair| value }
204
+ - Returns TSV (double) mapping source -> [Target, values].
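The two shapes can be sketched with nested Ruby collections instead of TSVs. `incidence_sketch` and `adjacency_sketch` are hypothetical names, and the exact return layout (a Hash of Hashes, and a Hash of `[target, value]` lists) is an assumption chosen to keep the example small.

```ruby
# Hypothetical sketch: rows are sources, columns are targets, cells are
# true/false or the block's value when a block is given.
def incidence_sketch(pairs)
  split = pairs.map { |pair| pair.split("~", 2) }
  targets = split.map(&:last).uniq
  matrix = {}
  split.each_with_index do |(source, target), i|
    row = (matrix[source] ||= targets.map { |t| [t, false] }.to_h)
    row[target] = block_given? ? yield(pairs[i]) : true
  end
  matrix
end

# Hypothetical sketch: source => list of [target, value] entries.
def adjacency_sketch(pairs)
  pairs.each_with_object({}) do |pair, adjacency|
    source, target = pair.split("~", 2)
    value = block_given? ? yield(pair) : nil
    (adjacency[source] ||= []) << [target, value]
  end
end

pairs = ["Clei~Guille", "Guille~Clei"]
inc = incidence_sketch(pairs)
inc["Clei"]["Guille"] # => true
inc["Clei"]["Clei"]   # => false
adjacency_sketch(pairs) { |pair| pair.length }
# => {"Clei"=>[["Guille", 11]], "Guille"=>[["Clei", 11]]}
```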
205
+
206
+ Convenience:
207
+ - TSV.incidence(tsv, **kwargs) delegates to Association.index(...).keys -> AssociationItem.incidence
208
+
209
+ ---
210
+
211
+ ## Matrix utilities
212
+
213
+ Given an index:
214
+ - idx.to_matrix(value_field=nil) { |values| ... } → TSV list
215
+ - value_field omitted and no block → boolean incidence.
216
+ - With value_field → use that column (vector) as the cell value.
217
+ - With block → compute cell values programmatically.
218
+
219
+ Standalone:
220
+ - AssociationItem.incidence(pairs) as above.
221
+ - AssociationItem.adjacency for adjacency list.
222
+
223
+ ---
224
+
225
+ ## Examples
226
+
227
+ Parse specs:
228
+ ```ruby
229
+ Association.parse_field_specification("=~Associated Gene Name=>Ensembl Gene ID")
230
+ # => [nil, "Associated Gene Name", "Ensembl Gene ID"]
231
+
232
+ Association.normalize_specs("TG=~Associated Gene Name=>Ensembl Gene ID", %w(SG TG Effect))
233
+ # => ["TG", "Associated Gene Name", "Ensembl Gene ID"]
234
+
235
+ Association.extract_specs(%w(SG TG Effect), source: "SG", target: "TG")
236
+ # => { source: ["SG", nil, nil], target: ["TG", nil, nil] }
237
+ ```
238
+
239
+ Open database (translate to human-readable names):
240
+ ```ruby
241
+ db = Association.database(datadir.person.marriages,
242
+ source: "Wife (ID)=>Alias",
243
+ target: "Husband (ID)=>Name")
244
+ db["Clei"]["Husband"] # => "Miguel"
245
+ db["Clei"]["Date"] # => "2021"
246
+ ```
247
+
248
+ Index and match:
249
+ ```ruby
250
+ idx = Association.index(datadir.person.brothers, undirected: true)
251
+ idx.match("Clei") # => ["Clei~Guille"]
252
+ idx.subset(["Clei"], :all) # => ["Clei~Guille"]
253
+ idx.reverse.subset(["Guille"], :all) # => ["Guille~Clei"]
254
+ ```
255
+
256
+ Filter:
257
+ ```ruby
258
+ idx = Association.index(datadir.person.parents)
259
+ idx.filter('Type of parent', 'mother') # keys whose info field contains 'mother'
260
+ ```
261
+
262
+ Incidence matrix:
263
+ ```ruby
264
+ pairs = Association.index(datadir.person.brothers, undirected: true).keys
265
+ inc = AssociationItem.incidence(pairs)
266
+ inc["Clei"]["Guille"] # => true
267
+ ```
268
+
269
+ List serializer handling:
270
+ ```ruby
271
+ tsv = TSV.open <<~EOF
272
+ #: :sep=,#:type=:list
273
+ #lowcase,upcase,double,triple
274
+ a,A,aa,aaa
275
+ b,B,bb,bbb
276
+ EOF
277
+ i = Association.index(tsv)
278
+ i["a~A"] # => ['aa', 'aaa']
279
+ ```
280
+
281
+ ---
282
+
283
+ ## Notes and edge cases
284
+
285
+ - undirected default: if source_field == target_field, undirected is assumed true; else false unless set.
286
+ - When specifying formats, ensure identifier TSVs are reachable. You can pass :identifiers (TSV/Path) or rely on TSV.identifier_files(file) and Entity.identifier_files(format).
287
+ - Association.index returns a BDB-backed TSV; reverse indexing persists to a side .reverse database next to the main DB.
288
+ - In paths, the placeholder [NAMESPACE] or NAMESPACE is replaced with options[:namespace].