scout-gear 10.8.4 → 10.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. checksums.yaml +4 -4
  2. data/.vimproject +13 -0
  3. data/README.md +352 -0
  4. data/VERSION +1 -1
  5. data/doc/Association.md +288 -0
  6. data/doc/Entity.md +296 -0
  7. data/doc/KnowledgeBase.md +433 -0
  8. data/doc/Persist.md +356 -0
  9. data/doc/Semaphore.md +171 -0
  10. data/doc/TSV.md +449 -0
  11. data/doc/WorkQueue.md +359 -0
  12. data/doc/Workflow.md +586 -0
  13. data/lib/scout/association.rb +4 -2
  14. data/lib/scout/entity/identifiers.rb +1 -1
  15. data/lib/scout/entity/object.rb +1 -1
  16. data/lib/scout/entity/property.rb +5 -5
  17. data/lib/scout/entity.rb +1 -1
  18. data/lib/scout/knowledge_base/description.rb +1 -1
  19. data/lib/scout/knowledge_base/list.rb +7 -2
  20. data/lib/scout/knowledge_base/registry.rb +2 -2
  21. data/lib/scout/knowledge_base.rb +20 -2
  22. data/lib/scout/monitor.rb +10 -6
  23. data/lib/scout/persist/engine/packed_index.rb +2 -2
  24. data/lib/scout/persist/engine/sharder.rb +1 -1
  25. data/lib/scout/persist/tsv.rb +1 -0
  26. data/lib/scout/semaphore.rb +1 -1
  27. data/lib/scout/tsv/dumper.rb +3 -3
  28. data/lib/scout/tsv/open.rb +1 -0
  29. data/lib/scout/tsv/parser.rb +1 -1
  30. data/lib/scout/tsv/transformer.rb +1 -0
  31. data/lib/scout/tsv/util.rb +2 -2
  32. data/lib/scout/work_queue/socket.rb +1 -1
  33. data/lib/scout/work_queue/worker.rb +7 -5
  34. data/lib/scout/workflow/entity.rb +22 -1
  35. data/lib/scout/workflow/step/config.rb +3 -3
  36. data/lib/scout/workflow/step/file.rb +4 -0
  37. data/lib/scout/workflow/step/info.rb +8 -2
  38. data/lib/scout/workflow/step.rb +10 -5
  39. data/lib/scout/workflow/task/inputs.rb +1 -1
  40. data/lib/scout/workflow/usage.rb +3 -2
  41. data/lib/scout/workflow/util.rb +22 -0
  42. data/scout-gear.gemspec +16 -5
  43. data/scout_commands/cat +86 -0
  44. data/scout_commands/doc +3 -1
  45. data/scout_commands/entity +151 -0
  46. data/scout_commands/system/status +238 -0
  47. data/scout_commands/workflow/info +23 -10
  48. data/scout_commands/workflow/install +1 -1
  49. data/test/scout/entity/test_property.rb +1 -1
  50. data/test/scout/knowledge_base/test_registry.rb +19 -0
  51. data/test/scout/test_work_queue.rb +1 -1
  52. data/test/scout/work_queue/test_worker.rb +12 -10
  53. metadata +15 -4
  54. data/doc/lib/scout/path.md +0 -35
  55. data/doc/lib/scout/workflow/task.md +0 -13
data/doc/Entity.md ADDED
@@ -0,0 +1,296 @@
1
+ # Entity
2
+
3
+ Entity is a lightweight system to turn plain Ruby values (strings, arrays, numerics) into annotated, behavior-rich “entities.” It layers on top of Annotation and provides:
4
+
5
+ - A module-level DSL to define “properties” (methods) for entities and arrays of entities.
6
+ - Format mapping and identifier translation between formats (via TSV indices).
7
+ - Automatic conversion of NamedArray field values into the appropriate entity type.
8
+ - Optional persistence for property results (including annotation lists) using Persist.
9
+ - Array-aware property execution with smart caching and support for multi-return computations.
10
+
11
+ Sections:
12
+ - Getting started and core concepts
13
+ - Formats and automatic conversion
14
+ - Properties: types, array semantics and persistence
15
+ - Identifier translation (Entity::Identified)
16
+ - Integration with NamedArray and TSV
17
+ - Introspection helpers
18
+ - Examples
19
+
20
+ ---
21
+
22
+ ## Getting started and core concepts
23
+
24
+ Define a new entity type by extending Entity in a module. The module becomes the “entity class” for values you annotate with it.
25
+
26
+ ```ruby
27
+ module ReversableString
28
+ extend Entity
29
+
30
+ property :reverse_text => :single do
31
+ self.reverse
32
+ end
33
+ end
34
+
35
+ s = ReversableString.setup("String1")
36
+ s.reverse_text # => "1gnirtS"
37
+ ```
38
+
39
+ Key facts:
40
+ - Extending Entity decorates the module with Annotation and Entity::Property capabilities.
41
+ - Entity.setup(value, format: ..., namespace: ...) annotates the value with this entity module (and any extra metadata).
42
+ - Entities can also be arrays: pass an array to setup to make an AnnotatedArray; properties can be defined to act on the array or per-item.
43
+
44
+ ---
45
+
46
+ ## Formats and automatic conversion
47
+
48
+ Entity supports “formats” to describe the logical identifier type of a value (e.g., “Ensembl Gene ID”, “Name”). Formats are globally mapped to entity modules using a tolerant index:
49
+
50
+ - Set formats accepted by the entity:
51
+ ```ruby
52
+ module Gene
53
+ extend Entity
54
+ self.format = ["Ensembl Gene ID", "Alias", "Name"]
55
+ end
56
+ ```
57
+
58
+ - Global registry:
59
+ - Entity.formats is a FormatIndex (case-aware, tolerant finder). It can match strings like “Transcription Factor (Ensembl Gene ID)” to “Ensembl Gene ID”.
60
+ - Entity.formats[format_name] ⇒ entity module.
61
+
62
+ Automatic conversion when reading from tables:
63
+ - NamedArray fields return values wrapped as entities if there is a matching format. See Integration with NamedArray.
64
+
65
+ Manual preparation:
66
+ - Entity.prepare_entity(value, field, options = {}) returns a value annotated with the entity for that field if a matching format is known:
67
+ ```ruby
68
+ Entity.prepare_entity("ENSG000001", "Ensembl Gene ID") # wraps into the entity registered for that format
69
+ ```
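The tolerant lookup described above, where a decorated field name such as "Transcription Factor (Ensembl Gene ID)" resolves to the registered format "Ensembl Gene ID", can be pictured with a small plain-Ruby sketch. `TolerantFormatIndex` is a hypothetical stand-in written for illustration, not scout-gear's `FormatIndex`, and the matching rule (exact name first, then a trailing parenthesised name) is an assumption about what "tolerant" means here.

```ruby
# Hypothetical stand-in for a tolerant format registry; not scout-gear's FormatIndex.
class TolerantFormatIndex
  def initialize
    @formats = {}
  end

  # Register a format name for a given entity type (any object).
  def []=(format, entity_type)
    @formats[format.to_s] = entity_type
  end

  # Resolve a field name to a registered format name, or nil if none matches.
  def find(field)
    field = field.to_s
    return field if @formats.key?(field)

    # Assumed fallback: a trailing parenthesised format, as in "Label (Format)".
    match = field.match(/\(([^()]+)\)\s*\z/)
    return match[1] if match && @formats.key?(match[1])

    nil
  end

  # Look up the entity type for a field name, tolerating decorations.
  def [](field)
    format = find(field)
    format && @formats[format]
  end
end

index = TolerantFormatIndex.new
index["Ensembl Gene ID"] = :Gene

exact     = index["Ensembl Gene ID"]                        # => :Gene
decorated = index["Transcription Factor (Ensembl Gene ID)"] # => :Gene
unknown   = index["Unknown column"]                         # => nil
```

A symbol stands in for the entity module so the sketch runs without the gem; in practice the stored value would be the module that declared the format.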
70
+
71
+ ---
72
+
73
+ ## Properties: types, array semantics and persistence
74
+
75
+ Define behaviors (methods) using the property DSL. A property can target:
76
+ - :single — defined for a single entity.
77
+ - :array — defined for an array of entities (takes the array as self).
78
+ - :multiple — batch property for arrays that computes all missing per-item results at once and returns a mapping/array; Entity handles filling per-item caches.
79
+ - :both — define a method directly that should work for both single and array (default).
80
+ - Interface adapters:
81
+ - :single2array — defined for single values, but expose an array facade.
82
+ - :array2single — defined for arrays, but expose single-return facade.
83
+
84
+ Examples:
85
+
86
+ ```ruby
87
+ module ReversableString
88
+ extend Entity
89
+
90
+ # Operates on single entity
91
+ property :reverse_text_single => :single do
92
+ self.reverse
93
+ end
94
+
95
+ # Operates on an array and returns per-item values
96
+ property :reverse_text_ary => :array do
97
+ self.collect { |s| s.reverse }
98
+ end
99
+
100
+ # Both single and array supported by a single method
101
+ property :reverse_both => :both do
102
+ if Array === self
103
+ self.collect(&:reverse)
104
+ else
105
+ self.reverse
106
+ end
107
+ end
108
+
109
+ # Batch compute for arrays (multiple)
110
+ property :multiple_annotation_list => :multiple do
111
+ # Return either an Array aligned with input indices or a Hash {item => result}
112
+ self.collect { |e| e.chars } # e.g., list of char arrays
113
+ end
114
+ end
115
+ ```
116
+
117
+ Array semantics and caching:
118
+ - When you call an array property from an element (item.reverse_text_ary), Entity uses the container’s cached result via an internal _ary_property_cache to avoid recomputing per element.
119
+ - For :multiple, Entity runs the computation once for the whole array, caches, and dispatches results to the items that requested it (even across partially overlapping arrays).
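The batch behaviour described for :multiple (compute once for the items that are still missing, cache per item, and serve later calls from that cache even when arrays only partially overlap) can be sketched in plain Ruby. `BatchCache` and `fetch_all` are hypothetical names used only for illustration; this is not how scout-gear implements it internally.

```ruby
# Hypothetical sketch of batch evaluation with a per-item cache;
# BatchCache is an illustrative name, not part of scout-gear.
class BatchCache
  attr_reader :batch_calls

  def initialize(&batch)
    @batch = batch    # block taking an array of items, returning their results
    @cache = {}       # item => result
    @batch_calls = 0  # how many times the batch block actually ran
  end

  # Return results aligned with `items`, computing only the missing ones.
  def fetch_all(items)
    missing = items.reject { |item| @cache.key?(item) }.uniq
    unless missing.empty?
      @batch_calls += 1
      results = @batch.call(missing)
      # Accept either an Array aligned with `missing` or a Hash {item => result}.
      results = missing.zip(results).to_h if results.is_a?(Array)
      @cache.merge!(results)
    end
    items.map { |item| @cache[item] }
  end
end

chars = BatchCache.new { |items| items.map(&:chars) }

first  = chars.fetch_all(%w[ab cd]) # one batch call covering both items
second = chars.fetch_all(%w[cd ef]) # only "ef" is computed; "cd" comes from the cache
```

The point of the design is that the expensive block sees each item at most once, regardless of how many overlapping arrays ask for it.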
120
+
121
+ Persistence for properties:
122
+ - Mark any property as persisted to cache its result across runs/filesystems:
123
+
124
+ ```ruby
125
+ ReversableString.persist :reverse_text_single, :marshal
126
+ ReversableString.persist :reverse_text_ary, :array, dir: "/tmp/entity_cache"
127
+ ReversableString.persist :annotation_list, :annotation, annotation_repo: "/path/to/repo.tch"
128
+ ```
129
+
130
+ - persist(name, type=:marshal, options={})
131
+ - type can be any Persist serializer or special:
132
+ - :annotation or :annotations — store annotation objects via Persist.annotation_repo_persist (Tokyo Cabinet repo), with option :annotation_repo pointing to the repo path.
133
+ - :array, :marshal, etc.
134
+ - options default to:
135
+ - persist: true
136
+ - dir: Entity.entity_property_cache[self.to_s][name] (default cache under var/entity_property/<Entity>/<property>)
137
+ - persisted?(name), unpersist(name) — manage persisted registration.
138
+ - Internally, Entity::Property.persist wraps property execution inside Persist.persist (or annotation_repo_persist) and keys it by entity id.
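As a rough picture of what wrapping a property in a persistence layer keyed by entity id means, here is a minimal sketch using only the Ruby standard library. `persist_property` is a hypothetical helper, and the file naming and Marshal serialization are assumptions; scout-gear's Persist has its own serializers, locking and directory layout.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical helper, not scout-gear's Persist: cache a block's result on disk,
# keyed by an entity id and a property name.
def persist_property(dir, entity_id, property)
  path = File.join(dir, Digest::MD5.hexdigest("#{entity_id}:#{property}"))

  # Cache hit: deserialize the stored result instead of running the block.
  return Marshal.load(File.binread(path)) if File.exist?(path)

  value = yield
  File.binwrite(path, Marshal.dump(value))
  value
end

cache_dir = Dir.mktmpdir("entity_cache")
computations = 0

first  = persist_property(cache_dir, "String1", :reverse_text) { computations += 1; "String1".reverse }
second = persist_property(cache_dir, "String1", :reverse_text) { computations += 1; "String1".reverse }

# The block ran once; the second call was served from the file on disk.
FileUtils.remove_entry(cache_dir)
```

Keying on the entity id plus the property name is what lets two different properties of the same entity, or the same property of two entities, keep separate cached results.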
139
+
140
+ Notes:
141
+ - Entity ids are derived from Annotation ids (Annotation::AnnotatedObject#id).
142
+ - Persisted array returns are validated against the current array call sites to extract per-item results correctly.
143
+
144
+ ---
145
+
146
+ ## Identifier translation (Entity::Identified)
147
+
148
+ For entities that can translate between identifier formats, include Entity::Identified and register identifier sources.
149
+
150
+ Register identifier files:
151
+ - add_identifiers(file_or_tsv, default_format=nil, name_format=nil, description_format=nil)
152
+ - file can be a Path/filename (with optional NAMESPACE placeholders) or a TSV instance.
153
+ - This sets:
154
+ - identity formats on the entity (formats accepted),
155
+ - default format (:default),
156
+ - name format (:name),
157
+ - description format (not used directly in core, but available).
158
+
159
+ Namespace placeholder:
160
+ - Use “NAMESPACE” in file paths to be replaced dynamically using the entity instance’s namespace annotation.
161
+ - If your files include NAMESPACE and the value is not provided on the entity, those files are skipped with a warning.
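The placeholder rule above amounts to a substitution plus a skip. The sketch below shows that logic in plain Ruby; `resolve_identifier_files` and its `tag:` parameter are hypothetical names for illustration, and the warning text is invented.

```ruby
# Hypothetical helper illustrating the NAMESPACE placeholder rule;
# not part of scout-gear.
def resolve_identifier_files(files, namespace, tag: "NAMESPACE")
  files.filter_map do |file|
    # Files without the placeholder are used as they are.
    next file unless file.include?(tag)

    # Placeholder present but no namespace on the entity: skip with a warning.
    if namespace.nil?
      warn "Skipping #{file}: no namespace provided"
      next
    end

    file.sub(tag, namespace.to_s)
  end
end

files = ["/data/NAMESPACE/identifiers", "/data/shared/identifiers"]

resolved   = resolve_identifier_files(files, :person)
unresolved = resolve_identifier_files(files, nil)
```

With a namespace of `:person` both files are kept and the first becomes `/data/person/identifiers`; without a namespace only the file that has no placeholder survives.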
162
+
163
+ Translate between formats:
164
+ - to(target_format) property is auto-defined for Identified entities.
165
+ - target_format can be a literal format name, :name (-> name_format), or :default.
166
+ - Works on single entities or arrays; on arrays returns an array aligned with input order.
167
+ - Example:
168
+ ```ruby
169
+ module Person
170
+ extend Entity
171
+ end
172
+ Person.add_identifiers("/data/#{Entity::Identified::NAMESPACE_TAG}/identifiers", "Name", "Alias")
173
+
174
+ miguel = Person.setup("001", format: "ID", namespace: :person)
175
+ miguel.to("Alias") # => "Miki"
176
+ miguel.to(:name) # => "Miguel"
177
+ ```
178
+
179
+ Identifier indexes:
180
+ - Entity builds and caches TSV.translation_index from identifier files via Persist.memory, keyed by [entity_type, source_format, target_format].
181
+ - Call identifier_index(target_format, source_format=nil) to get the TSV index.
182
+ - source_format defaults to the entity’s current format; if not found, Entity retries without specifying source.
183
+
184
+ Introspection:
185
+ - Entity.identifier_files(field) — class method returning the list of TSVs involved in a format for entities that include Identified.
186
+
187
+ ---
188
+
189
+ ## Integration with NamedArray and TSV
190
+
191
+ Entity values are automatically prepared when accessing NamedArray fields:
192
+
193
+ - NamedArray#[](key) is overridden to call Entity.prepare_entity(v, key), so if a field name is a recognized format (or carries it in parentheses, e.g., “Gene Name (Ensembl Gene ID)”), the returned cell value is wrapped as an entity.
194
+
195
+ Example:
196
+ ```ruby
197
+ module SomeEntity; extend Entity; self.format = "SomeEntity"; end
198
+
199
+ row = NamedArray.setup(["a", "b"], %w(SomeEntity Other))
200
+ row["SomeEntity"].respond_to?(:all_properties) # => true
201
+ ```
202
+
203
+ This makes TSV rows entity-aware when you deserialize via TSV.open; NamedArray instances become rich objects with entity behaviors available per column.
204
+
205
+ ---
206
+
207
+ ## Introspection helpers
208
+
209
+ Entity::Object adds convenience to every annotated entity:
210
+
211
+ - entity_classes — list of Entity modules applied (from Annotation).
212
+ - base_entity — the last Entity in annotation_types, i.e., the primary one.
213
+ - all_properties — list of property names available across entity modules.
214
+ - _ary_property_cache — internal cache used to memoize array property evaluations for items.
215
+
216
+ The Entity module itself exposes:
217
+ - Entity.formats — global FormatIndex of format name → entity module, with tolerant lookup (find handles strings with extra decorations).
218
+ - Entity.prepare_entity(value, field, options={}) — utility to wrap a value or array into an entity based on format mapping.
219
+
220
+ ---
221
+
222
+ ## Examples
223
+
224
+ Define a property-rich entity and use it on values and arrays:
225
+
226
+ ```ruby
227
+ module ReversableString
228
+ extend Entity
229
+
230
+ property :reverse_text_single => :single do
231
+ self.reverse
232
+ end
233
+
234
+ property :reverse_text_ary => :array do
235
+ self.collect { |s| s.reverse }
236
+ end
237
+
238
+ # Persist selected properties
239
+ persist :reverse_text_single, :marshal
240
+ persist :reverse_text_ary, :array
241
+ end
242
+
243
+ # Single
244
+ s = ReversableString.setup("String1")
245
+ s.reverse_text_single # => "1gnirtS"
246
+
247
+ # Array
248
+ arr = ReversableString.setup(["String1", "String2"])
249
+ arr.reverse_text_ary # => ["1gnirtS", "2gnirtS"]
250
+ arr[1].reverse_text_ary # uses cached array result; returns "2gnirtS"
251
+ ```
252
+
253
+ Translate identifiers:
254
+
255
+ ```ruby
256
+ module Person
257
+ extend Entity
258
+ end
259
+
260
+ # Identify formats and sources
261
+ Person.add_identifiers("/data/#{Entity::Identified::NAMESPACE_TAG}/identifiers.tsv",
262
+ "Name", "Alias")
263
+
264
+ Person.setup("001", format: "ID", namespace: :person).to("Alias") # => "Miki"
265
+ Person.setup("001", format: "ID", namespace: :person).to(:name) # => "Miguel"
266
+
267
+ list = Person.setup(["001"], format: "ID", namespace: :person)
268
+ list.to("Name") # => ["Miguel"]
269
+ ```
270
+
271
+ Automatic entity wrapping from NamedArray/TSV:
272
+
273
+ ```ruby
274
+ module Gene; extend Entity; self.format = "Ensembl Gene ID"; end
275
+
276
+ tsv = TSV.open <<~EOF
277
+ #: :sep=" " #:type=:list
278
+ #Id Ensembl Gene ID Other
279
+ row1 ENSG0001 X
280
+ EOF
281
+
282
+ row = tsv["row1"]
283
+ g = row["Ensembl Gene ID"] # => wrapped into Gene entity (if format registered)
284
+ g.all_properties # => property list for Gene
285
+ ```
286
+
287
+ ---
288
+
289
+ ## Notes and edge cases
290
+
291
+ - Entity.prepare_entity duplicates input strings/arrays to avoid mutating caller state; array duplication can be forced per call via dup_array: true.
292
+ - For arrays, properties marked :array2single or :single2array adapt their interface between collection and element call sites.
293
+ - When using identifiers with NAMESPACE placeholders, ensure you set namespace on entities (Person.setup("001", namespace: :person)) or those files will be ignored.
294
+ - Persisted annotation properties (type :annotation) use a Tokyo Cabinet repo; you can supply a repo path via annotation_repo:, or let Persist.annotation_repo_persist create/use a repo by path.
295
+
296
+ Entity turns plain values into meaningful, behavior-rich objects tailored to your domain (genes, samples, users, etc.), with robust identifier translation and scalable property evaluation/persistence built-in.