scout-gear 10.8.4 → 10.9.0

Files changed (55)
  1. checksums.yaml +4 -4
  2. data/.vimproject +13 -0
  3. data/README.md +352 -0
  4. data/VERSION +1 -1
  5. data/doc/Association.md +288 -0
  6. data/doc/Entity.md +296 -0
  7. data/doc/KnowledgeBase.md +433 -0
  8. data/doc/Persist.md +356 -0
  9. data/doc/Semaphore.md +171 -0
  10. data/doc/TSV.md +449 -0
  11. data/doc/WorkQueue.md +359 -0
  12. data/doc/Workflow.md +586 -0
  13. data/lib/scout/association.rb +4 -2
  14. data/lib/scout/entity/identifiers.rb +1 -1
  15. data/lib/scout/entity/object.rb +1 -1
  16. data/lib/scout/entity/property.rb +5 -5
  17. data/lib/scout/entity.rb +1 -1
  18. data/lib/scout/knowledge_base/description.rb +1 -1
  19. data/lib/scout/knowledge_base/list.rb +7 -2
  20. data/lib/scout/knowledge_base/registry.rb +2 -2
  21. data/lib/scout/knowledge_base.rb +20 -2
  22. data/lib/scout/monitor.rb +10 -6
  23. data/lib/scout/persist/engine/packed_index.rb +2 -2
  24. data/lib/scout/persist/engine/sharder.rb +1 -1
  25. data/lib/scout/persist/tsv.rb +1 -0
  26. data/lib/scout/semaphore.rb +1 -1
  27. data/lib/scout/tsv/dumper.rb +3 -3
  28. data/lib/scout/tsv/open.rb +1 -0
  29. data/lib/scout/tsv/parser.rb +1 -1
  30. data/lib/scout/tsv/transformer.rb +1 -0
  31. data/lib/scout/tsv/util.rb +2 -2
  32. data/lib/scout/work_queue/socket.rb +1 -1
  33. data/lib/scout/work_queue/worker.rb +7 -5
  34. data/lib/scout/workflow/entity.rb +22 -1
  35. data/lib/scout/workflow/step/config.rb +3 -3
  36. data/lib/scout/workflow/step/file.rb +4 -0
  37. data/lib/scout/workflow/step/info.rb +8 -2
  38. data/lib/scout/workflow/step.rb +10 -5
  39. data/lib/scout/workflow/task/inputs.rb +1 -1
  40. data/lib/scout/workflow/usage.rb +3 -2
  41. data/lib/scout/workflow/util.rb +22 -0
  42. data/scout-gear.gemspec +16 -5
  43. data/scout_commands/cat +86 -0
  44. data/scout_commands/doc +3 -1
  45. data/scout_commands/entity +151 -0
  46. data/scout_commands/system/status +238 -0
  47. data/scout_commands/workflow/info +23 -10
  48. data/scout_commands/workflow/install +1 -1
  49. data/test/scout/entity/test_property.rb +1 -1
  50. data/test/scout/knowledge_base/test_registry.rb +19 -0
  51. data/test/scout/test_work_queue.rb +1 -1
  52. data/test/scout/work_queue/test_worker.rb +12 -10
  53. metadata +15 -4
  54. data/doc/lib/scout/path.md +0 -35
  55. data/doc/lib/scout/workflow/task.md +0 -13
@@ -0,0 +1,433 @@
1
+ # KnowledgeBase
2
+
3
+ KnowledgeBase is a thin orchestration layer around Association, TSV, Entity and Persist that lets you:
4
+
5
+ - Register and manage multiple association databases (with per-database options).
6
+ - Build and cache normalized TSV databases and pairwise “source~target” indices.
7
+ - Annotate, translate and query entities (sources/targets) consistently using configured identifier files and formats.
8
+ - Run high-level queries: full sets, subsets, children, parents, neighbours.
9
+ - Traverse paths across multiple databases using a tiny DSL with wildcards, lists and conditions.
10
+ - Manage entity lists (save/load) and generate human-readable markdown descriptions for databases.
11
+
12
+ It integrates with:
13
+ - Association (database/field normalization and index creation)
14
+ - TSV (parsing, indices)
15
+ - Entity (formats and translation)
16
+ - Persist (caching and persistence)
17
+ - SOPT CLI (scout kb …)
18
+
19
+ Sections:
20
+ - Creating and loading a knowledge base
21
+ - Registering databases
22
+ - Entity options and identifier files
23
+ - Databases and indices (open/get)
24
+ - Querying (all/subset/children/parents/neighbours)
25
+ - Lists (save/load/delete/enumerate)
26
+ - Traversal DSL
27
+ - Descriptions and markdown docs
28
+ - Enrichment
29
+ - API quick reference
30
+ - CLI: scout kb commands
31
+ - Examples
32
+
33
+ ---
34
+
35
+ ## Creating and loading a knowledge base
36
+
37
+ - Create a new KnowledgeBase pointing to a directory; config, indices, databases and lists are stored under it:
38
+
39
+ ```ruby
40
+ kb = KnowledgeBase.new(Path.setup("var/kb"), "Hsa") # namespace optional
41
+ kb.save
42
+ ```
43
+
44
+ - Load a previously saved one:
45
+ ```ruby
46
+ kb = KnowledgeBase.load(:default) # or KnowledgeBase.load("/path")
47
+ ```
48
+
49
+ - Persisted attributes (saved to dir/config):
50
+ - namespace (e.g., species code “Hsa”)
51
+ - registry — mapping of database name => [file or block, options]
52
+ - entity_options — per-entity configuration (e.g., identifier TSVs; default entity parameters)
53
+ - identifier_files — knowledge-base-wide identifier files (used to build translation indices)
54
+
55
+ Save changes:
56
+ ```ruby
57
+ kb.save
58
+ ```
59
+
60
+ ---
61
+
62
+ ## Registering databases
63
+
64
+ Use register to declare databases, associating a name with a TSV source and options:
65
+
66
+ ```ruby
67
+ kb.register :brothers, datafile_test(:person).brothers, undirected: true
68
+ kb.register :parents, datafile_test(:person).parents,
69
+ source: "=>Alias", target: "=>Name",
70
+ fields: ["Date"], entity_options: {"Person" => {language: "en"}},
71
+ identifiers: TSV.open(id_file)
72
+ ```
73
+
74
+ - file can be:
75
+ - a Path/String (TSV file), or
76
+ - a Proc that returns a Path, String or TSV (the block is stored as part of the registry).
77
+ - Common options (mirrors Association.open/index):
78
+ - :source, :target — field specifications ("Field", "Field=~Header", "Field=>Format", "Field=~Header=>Format")
79
+ - :fields — info field subset to keep (defaults to all others)
80
+ - :identifiers — TSV/Path or array, to help with format translation
81
+ - :namespace — overrides NAMESPACE placeholders in paths
82
+ - :undirected — treat database as undirected (source~target and target~source)
83
+ - :description — human-readable description string (used by kb.description)
84
+ - :entity_options — per-database overrides (merged with kb.entity_options)
85
+
86
+ Registered names are available via kb.all_databases and kb.include?(name).
87
+
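The four specification forms listed above share one shape: an optional field name, an optional `=~Header`, and an optional `=>Format`. As a rough illustration only, the plain-Ruby sketch below splits a specification string into those three parts. `parse_field_spec` is a hypothetical helper written for this note, not a scout-gear method, and the library's own parsing may differ.

```ruby
# Hypothetical helper (not part of scout-gear): split a source/target
# specification such as "Field=~Header=>Format" into its parts.
def parse_field_spec(spec)
  # Everything after the first "=>" is the format, if present.
  field_and_header, format = spec.split("=>", 2)
  # Everything after the first "=~" is the header, if present.
  field, header = field_and_header.to_s.split("=~", 2)
  field = nil if field.nil? || field.empty?
  { field: field, header: header, format: format }
end

full        = parse_field_spec("Field=~Header=>Format")
format_only = parse_field_spec("=>Alias") # the form used for :source in the example above
```

Here `full` carries all three parts, while `format_only` has only `:format` set, which matches the `source: "=>Alias"` usage where no field name is given.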
88
+ ---
89
+
90
+ ## Entity options and identifier files
91
+
92
+ Entity annotation and translation are controlled by:
93
+ - kb.entity_options — e.g., { "Person" => { language: "es", identifiers: [path1,path2] } }
94
+ - kb.identifier_files — KB-wide extra identifier TSVs (supplement).
95
+
96
+ Helper:
97
+ - kb.define_entity_modules — dynamically defines Entity modules and includes Entity::Identified when entity_options include :identifiers, wiring add_identifiers and default formats.
98
+
99
+ Entity annotation/translation:
100
+ - kb.entity_options_for(type, database_name=nil) merges global entity_options for type and any per-database overrides.
101
+ - kb.annotate(values, entity_type_name, database_name=nil) wraps values in Entity with appropriate :format and options.
102
+ - kb.translate(entities, entity_type_name) converts to the KB’s configured format if needed.
103
+
104
+ Entity type discovery:
105
+ - kb.source_type(name) / kb.target_type(name) — returns Entity module for source/target formats (using Entity.formats).
106
+
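The merge described for `kb.entity_options_for` can be pictured with plain hashes. This is a sketch under the assumption that the merge is shallow and that per-database values win; `merged_entity_options` and the `people.tsv` file name are hypothetical, introduced only for illustration.

```ruby
# Hypothetical stand-ins for kb.entity_options and for a database's
# :entity_options registration option.
global_options   = { "Person" => { language: "es", identifiers: ["people.tsv"] } }
database_options = { "Person" => { language: "en" } }

# Assumed behaviour: start from the KB-wide options for the type and let
# the per-database options override individual keys.
def merged_entity_options(global, per_database, type)
  (global[type] || {}).merge(per_database[type] || {})
end

person_options = merged_entity_options(global_options, database_options, "Person")
```

With these inputs `person_options[:language]` is `"en"` while `:identifiers` is inherited from the KB-wide entry.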
107
+ ---
108
+
109
+ ## Databases and indices (open/get)
110
+
111
+ - kb.get_database(name, options = {}) → persisted TSV (Association.database) for that name:
112
+ - Builds and caches a normalized TSV (key_field = source, fields = [target, info...]).
113
+ - Respects registry options, kb.namespace, entity_options and identifiers.
114
+ - Stores under dir/<name>_<digest>.database by default.
115
+
116
+ - kb.get_index(name, options = {}) → persisted index TSV (Association.index):
117
+ - BDB-backed list TSV with keys “source~target” and values (info fields).
118
+ - Stores under dir/<name>_<digest> (and <name>_<digest>.reverse for reverse index).
119
+ - Exposes fields source_field, target_field, undirected.
120
+
121
+ Introspection:
122
+ - kb.fields(name) # => info field names of index
123
+ - kb.pair(name) # => [source_field, target_field, (optional umarker)]
124
+ - kb.source(name) / kb.target(name)
125
+ - kb.undirected(name) # => boolean
126
+
127
+ Identifier translation indices:
128
+ - kb.source_index(name) — TSV.translation_index(..., target = source(name))
129
+ - kb.target_index(name) — TSV.translation_index(..., target = target(name))
130
+ - kb.identify_source(name, value|list|:all) — translate by source index
131
+ - kb.identify_target(name, value|list|:all) — translate by target index
132
+ - kb.identify(name, entity) — try source index first, then target
133
+
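To make the relationship between a normalized database and its pair index concrete, here is a minimal in-memory sketch. It uses plain hashes rather than the persisted, BDB-backed TSVs the library builds, and the relationship labels are toy data invented for the example.

```ruby
# Toy normalized database: source => list of [target, info...] rows.
# Names follow the document's examples; the info values are made up.
database = {
  "Miki" => [["Juan", "father"], ["Mariluz", "mother"]],
  "Isa"  => [["Juan", "father"]]
}

index   = {} # "source~target" => info values
reverse = {} # "target~source" => info values

database.each do |source, rows|
  rows.each do |target, *info|
    index["#{source}~#{target}"]   = info
    reverse["#{target}~#{source}"] = info
  end
end

# All pairs whose source is "Miki", found by key prefix.
miki_pairs = index.keys.select { |key| key.start_with?("Miki~") }
```

Keeping a second hash keyed by `target~source` is what makes lookups by target as cheap as lookups by source, which is presumably why a separate reverse index is stored.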
134
+ ---
135
+
136
+ ## Querying (all/subset/children/parents/neighbours)
137
+
138
+ AssociationItem is used to wrap pair keys and expose properties like source_entity and target_entity.
139
+
140
+ - kb.all(name, options={}) → AssociationItem list
141
+ - Returns all pairs (index.keys), annotated with KB context.
142
+
143
+ - kb.subset(name, entities_or_options, options={}, &block) → AssociationItem list
144
+ - entities_or_options:
145
+ - :all
146
+ - AnnotatedArray (e.g., a list of Person entities): the KB infers its format and uses it as the key of the query Hash
147
+ - Hash — e.g., { source: ["Miki"], target: :all } or { "Person" => %w(Miki Isa) }
148
+ - options:
149
+ - identify, identify_source, identify_target — translate input entities through indices
150
+ - Block: filter the resulting AssociationItem list.
151
+
152
+ - kb.children(name, entity) → AssociationItem list
153
+ - Pairs where entity appears as source. Equivalent to index.match(entity).
154
+
155
+ - kb.parents(name, entity) → AssociationItem list
156
+ - Pairs where entity appears as target (reverse.match(entity)); annotated with reverse=true.
157
+
158
+ - kb.neighbours(name, entity) → {children: ..., parents: ...} or {:children => ...} when undirected with same source/target.
159
+
160
+ Entity typing:
161
+ - All methods annotate returned IDs into Entities using kb.annotate and per-database options (so things like Person#language apply).
162
+
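The three directional queries can be approximated over a flat list of pair keys. The sketch below is not the library's implementation (which matches against the index and its reverse and returns AssociationItem lists); the helper names mirror the KB methods only for readability, and the `undirected_same_type` flag is a hypothetical stand-in for the "undirected with same source/target" case.

```ruby
# Pair keys in "source~target" form, using names from the document's examples.
PAIRS = ["Miki~Juan", "Miki~Mariluz", "Isa~Juan", "Clei~Domingo"].freeze

# Pairs where the entity is the source.
def children(pairs, entity)
  pairs.select { |pair| pair.start_with?("#{entity}~") }
end

# Pairs where the entity is the target.
def parents(pairs, entity)
  pairs.select { |pair| pair.end_with?("~#{entity}") }
end

# Both directions, or only :children when direction carries no meaning.
def neighbours(pairs, entity, undirected_same_type: false)
  result = { children: children(pairs, entity) }
  result[:parents] = parents(pairs, entity) unless undirected_same_type
  result
end

juan = neighbours(PAIRS, "Juan")
```

In this toy data `juan[:children]` is empty and `juan[:parents]` holds the two pairs that end in `~Juan`.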
163
+ ---
164
+
165
+ ## Lists (save/load/delete/enumerate)
166
+
167
+ KB ships utilities for storing lists of entities or raw strings under dir/lists:
168
+
169
+ - kb.save_list(id, list)
170
+ - AnnotatedArray → saved as an Annotation.tsv; plain arrays → saved as newline-separated text (simple).
171
+ - kb.load_list(id, entity_type=nil) → AnnotatedArray or Array
172
+ - If entity_type is given and no list of that type is found, falls back to any list stored under that id.
173
+ - kb.lists → { "Person" => [ids...], "simple" => [ids...] }
174
+ - kb.delete_list(id, entity_type=nil)
175
+
176
+ List paths are resolved safely under dir/lists/<EntityType>/<id>.tsv or dir/lists/simple/<id>.
177
+
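One way to resolve list paths "safely" is to expand the candidate path and reject anything that ends up outside `dir/lists`. The following is a sketch of that idea only; `list_path` is a hypothetical helper and the library may implement the check differently.

```ruby
# Hypothetical helper: build the path for a list and refuse ids that
# would escape the lists directory (for example via "..").
def list_path(dir, id, entity_type = nil)
  base = File.expand_path(File.join(dir, "lists"))
  relative =
    if entity_type
      File.join(entity_type.to_s, "#{id}.tsv") # typed lists are stored as TSV
    else
      File.join("simple", id.to_s)             # plain lists are stored as text
    end
  path = File.expand_path(relative, base)
  unless path.start_with?(base + File::SEPARATOR)
    raise ArgumentError, "list path escapes #{base}"
  end
  path
end

typed_path  = list_path("/tmp/kb", "bro_and_sis", "Person")
simple_path = list_path("/tmp/kb", "raw_names")
```

An id such as `"../../etc/passwd"` expands to a location outside `/tmp/kb/lists` and raises `ArgumentError` instead of returning a path.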
178
+ ---
179
+
180
+ ## Traversal DSL
181
+
182
+ Find paths across databases using concise rules with wildcards and conditions:
183
+
184
+ - kb.traverse(rules, nopaths=false) → [assignments, paths]
185
+ - rules: Array of strings, each a statement or assignment:
186
+
187
+ Rules syntax:
188
+ - Match rule: "<source> <db> <target> [ - <conditions> ]"
189
+ - <source>/<target> term types:
190
+ - literal entity (e.g., "Miki", "001")
191
+ - wildcard "?var" — capture assignment
192
+ - list ":list_id" — use kb.load_list(list_id)
193
+ - <db>: database name; may include “@KB” to qualify; supports wildcard components (see implementation).
194
+ - conditions: space separated tokens:
195
+ - "Field=Value" (exact match via Misc.match_value)
196
+ - "Field" (truthy)
197
+
198
+ - Assignment rule: "?var =<db> value1,value2,..."
199
+ - If <db> present, identifies values within that DB; otherwise uses the raw names.
200
+
201
+ - Accumulation block:
202
+ ```
203
+ ?var{
204
+ <rule1>
205
+ <rule2>
206
+ }
207
+ ```
208
+ - Captures the results produced inside the block into ?var and resets other temporary assignments when the block ends.
209
+
210
+ Traverse returns:
211
+ - assignments — map of "?vars" to arrays of matched IDs (translated into source/target ids as needed).
212
+ - paths — list of paths, each path is a list of AssociationItem pairs; each item has .info and Entity wrappers.
213
+
214
+ Example:
215
+ ```ruby
216
+ rules = [
217
+ "Miki brothers ?1",
218
+ "?1 parents Domingo" # keep the siblings whose parent is Domingo
219
+ ]
220
+ entities, paths = kb.traverse(rules)
221
+ entities["?1"] # => ["Isa", ...] (siblings)
222
+ paths.first.first.info # => info hash for first pair
223
+ ```
224
+
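The match-rule grammar above is small enough to parse by hand. The sketch below shows one way to break a rule into its source, database, target and conditions; all three helpers are hypothetical and written for this note, the `Alive` field is invented, and scout-gear's own parser (including `@KB` qualifiers and wildcard database names) is not reproduced here.

```ruby
# Classify a source/target term as a wildcard, a list reference or a literal.
def parse_term(token)
  if token.start_with?("?")
    [:wildcard, token]   # keep the "?" since assignments are keyed by "?var"
  elsif token.start_with?(":")
    [:list, token[1..]]  # id to hand to kb.load_list
  else
    [:literal, token]
  end
end

# Split "Field=Value" and bare "Field" tokens; a nil value means "must be truthy".
def parse_conditions(text)
  text.to_s.split.map do |token|
    field, value = token.split("=", 2)
    { field: field, value: value }
  end
end

# Parse "<source> <db> <target> [ - <conditions> ]" into its components.
def parse_match_rule(rule)
  statement, conditions = rule.strip.split(/\s+-\s+/, 2)
  source, db, target = statement.split
  { source: parse_term(source), db: db, target: parse_term(target),
    conditions: parse_conditions(conditions) }
end

rule = parse_match_rule("?1 parents Domingo - Date=2000 Alive")
```

Because conditions are split on whitespace, this sketch cannot express field names or values that contain spaces.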
225
+ ---
226
+
227
+ ## Descriptions and markdown docs
228
+
229
+ - kb.description(:db_name) → tries, in order:
230
+ - registered_options[:description]
231
+ - dir/<db>.md
232
+ - The top-level README.md in kb.dir, split into per-database chunks (# <db name> sections)
233
+ - The README.md in the source file’s directory, if the KB README has no section for the database
234
+
235
+ - kb.markdown(:db_name) → generated markdown containing:
236
+ - Title (# DatabaseName)
237
+ - Source and target descriptions (with types)
238
+ - Undirected note if applicable
239
+ - Embedded description (if any)
240
+
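Splitting a README into per-database chunks by its first-level headings might look like the following. This is a sketch of the idea, assuming each database has its own `# <db name>` heading; `readme_sections` is a hypothetical helper and the sample text is made up.

```ruby
# Hypothetical helper: map each first-level "# name" heading to the text
# that follows it, up to the next first-level heading.
def readme_sections(text)
  sections = {}
  current = nil
  text.each_line do |line|
    if (match = line.match(/\A#\s+(.+?)\s*\z/))
      current = match[1]
      sections[current] = +"" # unfrozen buffer for this section
    elsif current
      sections[current] << line
    end
  end
  sections.transform_values(&:strip)
end

readme = "# brothers\nSibling relationships.\n\n# parents\nParent links.\n## Notes\nDated.\n"
sections = readme_sections(readme)
```

Deeper headings such as `## Notes` stay inside the enclosing section because only a single `#` followed by whitespace starts a new chunk.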
241
+ ---
242
+
243
+ ## Enrichment
244
+
245
+ - kb.enrichment(db_name, entities, options={}) → runs hypergeometric enrichment using rbbt (requires rbbt/rbbt-statistics):
246
+ - Loads get_database(db_name) and converts entities via identify_source
247
+ - Calls database.enrichment(entities, database.fields.first, persist: false)
248
+
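For readers unfamiliar with the statistic, a hypergeometric enrichment test asks how likely it is to see at least the observed number of hits when drawing a fixed number of items without replacement. The sketch below computes that upper-tail probability exactly with integers and `Rational`; it is independent of rbbt, whose implementation and any multiple-testing corrections are not shown, and both function names are hypothetical.

```ruby
# Binomial coefficient C(n, k) using exact integer arithmetic.
def binomial(n, k)
  return 0 if k < 0 || k > n
  k = [k, n - k].min
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

# P(X >= observed) when drawing `draws` items from a population of
# `population` items, of which `successes` carry the annotation.
def hypergeometric_tail(population, successes, draws, observed)
  upper = [successes, draws].min
  favourable = (observed..upper).sum do |i|
    binomial(successes, i) * binomial(population - successes, draws - i)
  end
  Rational(favourable, binomial(population, draws))
end

# 10 entities in total, 4 annotated, 3 queried, 2 of the queried are annotated.
p_value = hypergeometric_tail(10, 4, 3, 2)
```

For these numbers the favourable outcomes are 36 + 4 = 40 out of 120 possible draws, so `p_value` is exactly 1/3.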
249
+ ---
250
+
251
+ ## API quick reference
252
+
253
+ Construction/persistence:
254
+ - KnowledgeBase.new(dir, namespace=nil)
255
+ - KnowledgeBase.load(dir | :default)
256
+ - kb.save / kb.load
257
+
258
+ Registry:
259
+ - kb.register(name, file=nil, options={}, &block)
260
+ - kb.include?(name) → boolean
261
+ - kb.all_databases → names
262
+ - kb.database_file(name) / kb.registered_options(name)
263
+
264
+ Entities/identifiers:
265
+ - kb.entity_options (Hash), kb.entity_options=(Hash)
266
+ - kb.identifier_files (Array), kb.identifier_files+=(Array)
267
+ - kb.define_entity_modules
268
+ - kb.annotate(values, type, database=nil) → Entity-wrapped
269
+ - kb.translate(entities, type) → convert format
270
+ - kb.source_type(name) / kb.target_type(name)
271
+
272
+ Databases/indices:
273
+ - kb.get_database(name, options={}) → TSV
274
+ - kb.get_index(name, options={}) → TSV (Association::Index)
275
+ - kb.fields(name), kb.pair(name), kb.source(name), kb.target(name), kb.undirected(name)
276
+
277
+ Identifier translation:
278
+ - kb.source_index(name), kb.target_index(name)
279
+ - kb.identify_source(name, entity), kb.identify_target(name, entity), kb.identify(name, entity)
280
+
281
+ Queries:
282
+ - kb.all(name, options={})
283
+ - kb.subset(name, entities_or_options, options={}, &block)
284
+ - kb.children(name, entity)
285
+ - kb.parents(name, entity)
286
+ - kb.neighbours(name, entity) → {:children=>..., :parents=>...} or {:children=>...}
287
+
288
+ Lists:
289
+ - kb.save_list(id, list) / kb.load_list(id, entity_type=nil)
290
+ - kb.lists → {type => [ids]} / kb.delete_list(id, entity_type=nil)
291
+
292
+ Traversal:
293
+ - kb.traverse(rules, nopaths=false) → [assignments, paths]
294
+
295
+ Documentation:
296
+ - kb.description(name) / kb.markdown(name)
297
+
298
+ Utilities:
299
+ - kb.info(name) → hash with source, target, types, entity_options, fields, undirected flag
300
+
301
+ ---
302
+
303
+ ## Command Line Interface (scout kb)
304
+
305
+ The KnowledgeBase CLI lives under scout_commands/kb and is discovered using the Path subsystem. General pattern: scout kb <subcommand> [options] ...
306
+
307
+ - Configure the knowledge base:
308
+ - scout kb config [options] <name>
309
+ - Options:
310
+ - -kb|--knowledge_base <name_or_:default> (default :default)
311
+ - -i|--identifier_files file1,file2,...
312
+ - -n|--namespace <ns>
313
+ - Saves config to kb.dir/config.
314
+
315
+ - Register a database:
316
+ - scout kb register [options] <name> <filename>
317
+ - Options:
318
+ - -kb|--knowledge_base
319
+ - -s|--source <spec>
320
+ - -t|--target <spec>
321
+ - -f|--fields field1,field2
322
+ - -n|--namespace <ns>
323
+ - -i|--identifiers <paths_or_ids>
324
+ - -u|--undirected
325
+ - -d|--description <text>
326
+ - File is resolved via Scout.identify; the registry entry is saved.
327
+
328
+ - Declare entities and set identifiers:
329
+ - scout kb entities [options] <entity> <identifier_files>
330
+ - Appends identifiers (comma-separated) to kb.entity_options[entity][:identifiers].
331
+
332
+ - Show database information:
333
+ - scout kb show [options] [<name>]
334
+ - Without name, lists all database names.
335
+ - With name, prints markdown summary and TSV preview (fields/key).
336
+
337
+ - Query an index:
338
+ - scout kb query [options] <name> <entity>
339
+ - Options:
340
+ - -l|--list (only print keys)
341
+ - -s|--source <spec>, -t|--target <spec>, -n|--namespace, -i|--identifiers
342
+ - entity may be:
343
+ - "X~" (prefix match on source), "~Y" (prefix match on target), "X~Y" (exact), or "X" (prefix match).
344
+ - Prints matches and per-edge info unless --list.
345
+
346
+ - Lists:
347
+ - scout kb list [options] [<list_name>]
348
+ - Without list_name, prints available lists grouped by entity type and “simple”.
349
+ - With list_name, prints the list contents.
350
+
351
+ - Traverse:
352
+ - scout kb traverse [options] <traversal>
353
+ - Options:
354
+ - -p|--paths Only list path edges and their info
355
+ - -e|--entities Only list wildcard entities
356
+ - -l|--list <var> Print the matches bound to wildcard ?<var>
357
+ - -ln|--list_name Save the printed list with a name
358
+ - traversal: comma-separated rules (see Traversal DSL).
359
+ - Output:
360
+ - entities dump (type => values)
361
+ - path edges with info (unless suppressed)
362
+ - In list mode, prints captured list and optionally saves it via save_list.
363
+
364
+ CLI discovery:
365
+ - Running “scout kb” with no subcommand lists available kb subcommands (directories under share/scout_commands/kb).
366
+ - The resolver supports nested commands and shows help if a directory is selected.
367
+
368
+ ---
369
+
370
+ ## Examples
371
+
372
+ Register and query:
373
+
374
+ ```ruby
375
+ kb = KnowledgeBase.new tmpdir
376
+ kb.register :brothers, datafile_test(:person).brothers, undirected: true
377
+ kb.register :parents, datafile_test(:person).parents
378
+
379
+ kb.all(:brothers) # => ["Miki~Isa", ...]
380
+ kb.children(:parents, "Miki") # => ["Miki~Juan", "Miki~Mariluz"]
381
+ kb.parents(:parents, "Domingo") # => ["Clei~Domingo", ...] (reverse annotated)
382
+ ```
383
+
384
+ Typed entities and per-database options:
385
+
386
+ ```ruby
387
+ kb.entity_options = { "Person" => { language: "es" } }
388
+ kb.register :parents, datafile_test(:person).parents, entity_options: { "Person" => { language: "en" } }
389
+
390
+ matches = kb.subset(:parents, target: :all, source: ["Miki"])
391
+ parents = matches.target_entity
392
+ parents.first.class # => Person (Entity)
393
+ parents.first.language # => "en" (database override applied)
394
+ ```
395
+
396
+ Save/load lists:
397
+
398
+ ```ruby
399
+ list = kb.subset(:brothers, :all).target_entity
400
+ kb.save_list("bro_and_sis", list)
401
+ kb.load_list("bro_and_sis") == list # => true
402
+ kb.lists["Person"] # => includes "bro_and_sis"
403
+ kb.delete_list("bro_and_sis")
404
+ ```
405
+
406
+ Traverse:
407
+
408
+ ```ruby
409
+ rules = [
410
+ "Miki brothers ?sib", # siblings of Miki
411
+ "?sib parents Domingo" # those with parent Domingo
412
+ ]
413
+ entities, paths = kb.traverse(rules)
414
+ entities["?sib"] # => ["Clei", ...]
415
+ paths.first.first.info # => {"Type of parent"=>"mother", "Date"=>"..."}
416
+ ```
417
+
418
+ Show descriptions:
419
+
420
+ ```ruby
421
+ kb.register :brothers, brothers_file, description: "Sibling relationships."
422
+ kb.markdown(:brothers) # => "# Brothers\n\nSource: Older ...\nTarget: Younger ...\n\nSibling relationships."
423
+ ```
424
+
425
+ Enrichment:
426
+
427
+ ```ruby
428
+ kb.enrichment(:brothers, %w(Miki Isa), persist: false)
429
+ ```
430
+
431
+ ---
432
+
433
+ KnowledgeBase glues together format-aware entity typing, TSV-backed association databases, and flexible traversal/querying, while providing a simple registry and on-disk caching under a single directory. Use it to consolidate relationship data and build rich exploration tools (CLI and programmatic) atop clean source/target semantics.