scout-gear 10.8.4 → 10.9.0
- checksums.yaml +4 -4
- data/.vimproject +13 -0
- data/README.md +352 -0
- data/VERSION +1 -1
- data/doc/Association.md +288 -0
- data/doc/Entity.md +296 -0
- data/doc/KnowledgeBase.md +433 -0
- data/doc/Persist.md +356 -0
- data/doc/Semaphore.md +171 -0
- data/doc/TSV.md +449 -0
- data/doc/WorkQueue.md +359 -0
- data/doc/Workflow.md +586 -0
- data/lib/scout/association.rb +4 -2
- data/lib/scout/entity/identifiers.rb +1 -1
- data/lib/scout/entity/object.rb +1 -1
- data/lib/scout/entity/property.rb +5 -5
- data/lib/scout/entity.rb +1 -1
- data/lib/scout/knowledge_base/description.rb +1 -1
- data/lib/scout/knowledge_base/list.rb +7 -2
- data/lib/scout/knowledge_base/registry.rb +2 -2
- data/lib/scout/knowledge_base.rb +20 -2
- data/lib/scout/monitor.rb +10 -6
- data/lib/scout/persist/engine/packed_index.rb +2 -2
- data/lib/scout/persist/engine/sharder.rb +1 -1
- data/lib/scout/persist/tsv.rb +1 -0
- data/lib/scout/semaphore.rb +1 -1
- data/lib/scout/tsv/dumper.rb +3 -3
- data/lib/scout/tsv/open.rb +1 -0
- data/lib/scout/tsv/parser.rb +1 -1
- data/lib/scout/tsv/transformer.rb +1 -0
- data/lib/scout/tsv/util.rb +2 -2
- data/lib/scout/work_queue/socket.rb +1 -1
- data/lib/scout/work_queue/worker.rb +7 -5
- data/lib/scout/workflow/entity.rb +22 -1
- data/lib/scout/workflow/step/config.rb +3 -3
- data/lib/scout/workflow/step/file.rb +4 -0
- data/lib/scout/workflow/step/info.rb +8 -2
- data/lib/scout/workflow/step.rb +10 -5
- data/lib/scout/workflow/task/inputs.rb +1 -1
- data/lib/scout/workflow/usage.rb +3 -2
- data/lib/scout/workflow/util.rb +22 -0
- data/scout-gear.gemspec +16 -5
- data/scout_commands/cat +86 -0
- data/scout_commands/doc +3 -1
- data/scout_commands/entity +151 -0
- data/scout_commands/system/status +238 -0
- data/scout_commands/workflow/info +23 -10
- data/scout_commands/workflow/install +1 -1
- data/test/scout/entity/test_property.rb +1 -1
- data/test/scout/knowledge_base/test_registry.rb +19 -0
- data/test/scout/test_work_queue.rb +1 -1
- data/test/scout/work_queue/test_worker.rb +12 -10
- metadata +15 -4
- data/doc/lib/scout/path.md +0 -35
- data/doc/lib/scout/workflow/task.md +0 -13
data/doc/Entity.md
ADDED
@@ -0,0 +1,296 @@

# Entity

Entity is a lightweight system to turn plain Ruby values (strings, arrays, numerics) into annotated, behavior-rich “entities.” It layers on top of Annotation and provides:

- A module-level DSL to define “properties” (methods) for entities and arrays of entities.
- Format mapping and identifier translation between formats (via TSV indices).
- Automatic conversion of NamedArray field values into the appropriate entity type.
- Optional persistence for property results (including annotation lists) using Persist.
- Array-aware property execution with smart caching and support for multi-return computations.

Sections:
- Getting started and core concepts
- Formats and automatic conversion
- Properties: types, array semantics and persistence
- Identifier translation (Entity::Identified)
- Integration with NamedArray and TSV
- Introspection helpers
- Examples

---

## Getting started and core concepts

Define a new entity type by extending Entity in a module. The module becomes the “entity class” for values you annotate with it.

```ruby
module ReversableString
  extend Entity

  property :reverse_text => :single do
    self.reverse
  end
end

s = ReversableString.setup("String1")
s.reverse_text # => "1gnirtS"
```

Key facts:
- Extending Entity decorates the module with Annotation and Entity::Property capabilities.
- Calling setup(value, format: ..., namespace: ...) on the entity module (e.g., ReversableString.setup) annotates the value with that module (and any extra metadata).
- Entities can also be arrays: pass an array to setup to make an AnnotatedArray; properties can be defined to act on the array or per-item.

---

## Formats and automatic conversion

Entity supports “formats” to describe the logical identifier type of a value (e.g., “Ensembl Gene ID”, “Name”). Formats are globally mapped to entity modules using a tolerant index:

- Set formats accepted by the entity:
  ```ruby
  module Gene
    extend Entity
    self.format = ["Ensembl Gene ID", "Alias", "Name"]
  end
  ```

- Global registry:
  - Entity.formats is a FormatIndex (case-aware, tolerant finder). It can match strings like “Transcription Factor (Ensembl Gene ID)” to “Ensembl Gene ID”.
  - Entity.formats[format_name] ⇒ entity module.

Automatic conversion when reading from tables:
- NamedArray fields return values wrapped as entities if there is a matching format. See Integration with NamedArray.

Manual preparation:
- Entity.prepare_entity(value, field, options = {}) returns a value annotated with the entity for that field if a matching format is known:
  ```ruby
  Entity.prepare_entity("ENSG000001", "Ensembl Gene ID") # wraps into the entity registered for that format
  ```
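
To make the tolerant lookup concrete, here is a minimal plain-Ruby sketch of a registry that resolves a decorated field name to a registered format. It is not scout-gear's FormatIndex: `ToyFormatIndex`, `ToyGene`, and the "match the text inside trailing parentheses" rule are assumptions chosen only to reproduce the “Transcription Factor (Ensembl Gene ID)” example.

```ruby
# Toy stand-in for a tolerant format registry; NOT scout-gear's FormatIndex.
class ToyFormatIndex
  def initialize
    @formats = {} # format name => entity module
  end

  def []=(format, entity)
    @formats[format] = entity
  end

  # Try an exact match first, then the text inside trailing parentheses.
  def find(field)
    return field if @formats.key?(field)

    inner = field[/\(([^()]+)\)\s*\z/, 1]
    inner if inner && @formats.key?(inner)
  end

  # Return the entity module registered for the field, or nil.
  def [](field)
    format = find(field)
    format && @formats[format]
  end
end

module ToyGene; end # hypothetical entity module

index = ToyFormatIndex.new
index["Ensembl Gene ID"] = ToyGene

index.find("Transcription Factor (Ensembl Gene ID)") # => "Ensembl Gene ID"
index["Transcription Factor (Ensembl Gene ID)"]      # => ToyGene
index["Unknown column"]                              # => nil
```

A field with no registered format resolves to nil, which matches the documented behavior that values are only wrapped when a matching format is known.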

---

## Properties: types, array semantics and persistence

Define behaviors (methods) using the property DSL. A property can target:
- :single - defined for a single entity.
- :array - defined for an array of entities (takes the array as self).
- :multiple - batch property for arrays that computes all missing per-item results at once and returns a mapping/array; Entity handles filling per-item caches.
- :both - define a method directly that should work for both single and array (default).
- Interface adapters:
  - :single2array - defined for single values, but expose an array facade.
  - :array2single - defined for arrays, but expose single-return facade.

Examples:

```ruby
module ReversableString
  extend Entity

  # Operates on single entity
  property :reverse_text_single => :single do
    self.reverse
  end

  # Operates on an array and returns per-item values
  property :reverse_text_ary => :array do
    self.collect { |s| s.reverse }
  end

  # Both single and array supported by a single method
  property :reverse_both => :both do
    if Array === self
      self.collect(&:reverse)
    else
      self.reverse
    end
  end

  # Batch compute for arrays (multiple)
  property :multiple_annotation_list => :multiple do
    # Return either an Array aligned with input indices or a Hash {item => result}
    self.collect { |e| e.chars } # e.g., list of char arrays
  end
end
```

Array semantics and caching:
- When you call an array property from an element (item.reverse_text_ary), Entity uses the container’s cached result via an internal _ary_property_cache to avoid recomputing per element.
- For :multiple, Entity runs the computation once for the whole array, caches, and dispatches results to the items that requested it (even across partially overlapping arrays).
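
The batch behavior described for :multiple can be pictured with a small plain-Ruby sketch: results are cached per item, and only items missing from the cache are sent to the batch block. This is an illustration of the idea, not Entity's implementation; `ToyBatchProperty`, `values_for`, `value_for`, and `batch_calls` are hypothetical names.

```ruby
# Toy illustration of batch evaluation with a per-item cache;
# the class and method names are hypothetical, not scout-gear API.
class ToyBatchProperty
  attr_reader :batch_calls

  def initialize(&block)
    @block = block   # receives an array of items, returns an aligned array
    @cache = {}      # item => result
    @batch_calls = 0 # how many times the batch block actually ran
  end

  # Compute results only for items not seen before, in a single batch call.
  def values_for(items)
    missing = items.uniq.reject { |item| @cache.key?(item) }
    unless missing.empty?
      @batch_calls += 1
      results = @block.call(missing)
      missing.zip(results) { |item, result| @cache[item] = result }
    end
    items.map { |item| @cache[item] }
  end

  # Single-item access goes through the same cache.
  def value_for(item)
    values_for([item]).first
  end
end

chars = ToyBatchProperty.new { |items| items.map(&:chars) }

chars.values_for(%w[ab cd]) # one batch call for both items
chars.value_for("cd")       # served from the cache
chars.values_for(%w[cd ef]) # second batch call, only for "ef"
chars.batch_calls           # => 2
```

The third call shows the partially overlapping case: "cd" is reused from the earlier array and only "ef" is computed.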

Persistence for properties:
- Mark any property as persisted to cache its result across runs/filesystems:

```ruby
ReversableString.persist :reverse_text_single, :marshal
ReversableString.persist :reverse_text_ary, :array, dir: "/tmp/entity_cache"
ReversableString.persist :annotation_list, :annotation, annotation_repo: "/path/to/repo.tch"
```

- persist(name, type=:marshal, options={})
  - type can be any Persist serializer or special:
    - :annotation or :annotations - store annotation objects via Persist.annotation_repo_persist (Tokyo Cabinet repo), with option :annotation_repo pointing to the repo path.
    - :array, :marshal, etc.
  - options default to:
    - persist: true
    - dir: Entity.entity_property_cache[self.to_s][name] (default cache under var/entity_property/<Entity>/<property>)
- persisted?(name), unpersist(name) - manage persisted registration.
- Internally, Entity::Property.persist wraps property execution inside Persist.persist (or annotation_repo_persist) and keys it by entity id.

Notes:
- Entity ids are derived from Annotation ids (Annotation::AnnotatedObject#id).
- Persisted array returns are validated against the current array call sites to extract per-item results correctly.
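
As a rough picture of what "wrap property execution and key it by entity id" means, the sketch below caches a computed value in a file named after the property and a digest of the id. It only uses the Ruby standard library and does not reflect how Persist stores data; `toy_persist`, the file naming, and the use of MD5 are assumptions made for illustration.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Toy file-backed cache for a computed value; NOT how Persist works internally.
# The directory layout and the helper name are assumptions.
def toy_persist(dir, property, entity_id)
  FileUtils.mkdir_p(dir)
  file = File.join(dir, "#{property}_#{Digest::MD5.hexdigest(entity_id)}.marshal")

  # Reuse the stored value when present (only load Marshal data you trust).
  return Marshal.load(File.binread(file)) if File.exist?(file)

  value = yield
  File.binwrite(file, Marshal.dump(value))
  value
end

computations = 0
results = Dir.mktmpdir do |dir|
  Array.new(2) do
    toy_persist(dir, :reverse_text_single, "String1") do
      computations += 1
      "String1".reverse
    end
  end
end

results      # => ["1gnirtS", "1gnirtS"]
computations # => 1
```

The block runs once; the second call is answered from the file, which is the effect persisted properties aim for across runs.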

---

## Identifier translation (Entity::Identified)

For entities that can translate between identifier formats, include Entity::Identified and register identifier sources.

Register identifier files:
- add_identifiers(file_or_tsv, default_format=nil, name_format=nil, description_format=nil)
  - file can be a Path/filename (with optional NAMESPACE placeholders) or a TSV instance.
  - This sets:
    - identity formats on the entity (formats accepted),
    - default format (:default),
    - name format (:name),
    - description format (not used directly in core, but available).

Namespace placeholder:
- Use “NAMESPACE” in file paths to be replaced dynamically using the entity instance’s namespace annotation.
- If your files include NAMESPACE and the value is not provided on the entity, those files are skipped with a warning.
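
The placeholder rule above can be sketched in plain Ruby as follows. This is only an illustration of the described behavior: the helper `toy_resolve_identifier_files` and the constant `TOY_NAMESPACE_TAG` are hypothetical, and the literal "NAMESPACE" tag is taken from the text rather than from the library source.

```ruby
# Toy version of placeholder substitution; the helper name is hypothetical
# and the literal "NAMESPACE" tag is taken from the text above.
TOY_NAMESPACE_TAG = "NAMESPACE"

def toy_resolve_identifier_files(files, namespace)
  files.filter_map do |file|
    if file.include?(TOY_NAMESPACE_TAG)
      if namespace.nil?
        # No namespace on the entity: skip this file with a warning.
        warn "Skipping #{file}: no namespace set"
        next
      end
      file.sub(TOY_NAMESPACE_TAG, namespace.to_s)
    else
      file
    end
  end
end

files = ["/data/NAMESPACE/identifiers", "/data/common/identifiers"]

toy_resolve_identifier_files(files, :person)
# => ["/data/person/identifiers", "/data/common/identifiers"]

toy_resolve_identifier_files(files, nil)
# => ["/data/common/identifiers"] (and a warning for the first file)
```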

Translate between formats:
- to(target_format) property is auto-defined for Identified entities.
  - target_format can be a literal format name, :name (-> name_format), or :default.
  - Works on single entities or arrays; on arrays returns an array aligned with input order.
- Example:
  ```ruby
  module Person
    extend Entity
  end
  Person.add_identifiers("/data/#{Entity::Identified::NAMESPACE_TAG}/identifiers", "Name", "Alias")

  miguel = Person.setup("001", format: "ID", namespace: :person)
  miguel.to("Alias") # => "Miki"
  miguel.to(:name) # => "Miguel"
  ```

Identifier indexes:
- Entity builds and caches TSV.translation_index from identifier files via Persist.memory, keyed by [entity_type, source_format, target_format].
- Call identifier_index(target_format, source_format=nil) to get the TSV index.
  - source_format defaults to the entity’s current format; if not found, Entity retries without specifying source.
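
A toy version of the index-and-retry idea, using in-memory rows instead of identifier files, might look like this. It is not TSV.translation_index: the helpers, the cache keyed by [source, target], and the reading of "if not found" as "the source-specific lookup returned nothing" are all assumptions. The first row mirrors the Person example; the second row is made up for illustration.

```ruby
# Toy translation index built from in-memory rows; NOT TSV.translation_index.
TOY_FIELDS = ["ID", "Name", "Alias"].freeze
TOY_ROWS = [
  ["001", "Miguel", "Miki"], # mirrors the Person example
  ["002", "Ana", "Anita"],   # made-up row for illustration
].freeze

# Map values of the source field (or of every non-target field when
# source is nil) to the value in the target field.
def toy_translation_index(target, source = nil)
  target_pos = TOY_FIELDS.index(target) || raise(ArgumentError, "unknown format #{target}")
  source_positions =
    if source
      [TOY_FIELDS.index(source)].compact
    else
      (0...TOY_FIELDS.size).to_a - [target_pos]
    end

  TOY_ROWS.each_with_object({}) do |row, index|
    source_positions.each { |pos| index[row[pos]] ||= row[target_pos] }
  end
end

TOY_INDEX_CACHE = {} # [source, target] => index

# Look up with the given source first; retry without a source on a miss.
def toy_translate(value, target, source = nil)
  index = (TOY_INDEX_CACHE[[source, target]] ||= toy_translation_index(target, source))
  result = index[value]
  result = toy_translate(value, target) if result.nil? && source
  result
end

toy_translate("001", "Alias", "ID")     # => "Miki"
toy_translate("Miki", "Name", "ID")     # => "Miguel" (found on the retry)
toy_translate("001", "Name", "Unknown") # => "Miguel" (unknown source, retry)
```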

Introspection:
- Entity.identifier_files(field) - class method returning the list of TSVs involved in a format for entities that include Identified.

---

## Integration with NamedArray and TSV

Entity values are automatically prepared when accessing NamedArray fields:

- NamedArray#[](key) is overridden to call Entity.prepare_entity(v, key), so if a field name is a recognized format (or carries it in parentheses, e.g., “Gene Name (Ensembl Gene ID)”), the returned cell value is wrapped as an entity.

Example:
```ruby
module SomeEntity; extend Entity; self.format = "SomeEntity"; end

row = NamedArray.setup(["a", "b"], %w(SomeEntity Other))
row["SomeEntity"].respond_to?(:all_properties) # => true
```

This makes TSV rows entity-aware when you deserialize via TSV.open; NamedArray instances become rich objects with entity behaviors available per column.
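
The effect of wrapping on field access can be imitated in plain Ruby by extending the returned cell with a module looked up by field name. The sketch below is not the NamedArray override itself; `ToyRow`, `ToyEntity`, `TOY_FORMATS`, and the `shout` method are hypothetical names used only to show the mechanism.

```ruby
# Toy row whose [] wraps cell values; NOT scout-gear's NamedArray override.
module ToyEntity
  def shout
    upcase + "!"
  end
end

TOY_FORMATS = { "SomeEntity" => ToyEntity }.freeze # hypothetical registry

class ToyRow
  def initialize(values, fields)
    @values = values
    @fields = fields
  end

  def [](field)
    pos = @fields.index(field)
    return nil if pos.nil?

    value = @values[pos]
    entity = TOY_FORMATS[field]
    # dup before extending so the stored cell is left untouched
    entity ? value.dup.extend(entity) : value
  end
end

row = ToyRow.new(["a", "b"], %w[SomeEntity Other])

row["SomeEntity"].shout          # => "A!"
row["Other"].respond_to?(:shout) # => false
```

Only the column whose name maps to a registered module gains the extra behavior; other columns come back as plain strings.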

---

## Introspection helpers

Entity::Object adds convenience to every annotated entity:

- entity_classes - list of Entity modules applied (from Annotation).
- base_entity - the last Entity in annotation_types, i.e., the primary one.
- all_properties - list of property names available across entity modules.
- _ary_property_cache - internal cache used to memoize array property evaluations for items.

The Entity module itself exposes:
- Entity.formats - global FormatIndex of format name → entity module, with tolerant lookup (find handles strings with extra decorations).
- Entity.prepare_entity(value, field, options={}) - utility to wrap a value or array into an entity based on format mapping.

---

## Examples

Define a property-rich entity and use it on values and arrays:

```ruby
module ReversableString
  extend Entity

  property :reverse_text_single => :single do
    self.reverse
  end

  property :reverse_text_ary => :array do
    self.collect { |s| s.reverse }
  end

  # Persist selected properties
  persist :reverse_text_single, :marshal
  persist :reverse_text_ary, :array
end

# Single
s = ReversableString.setup("String1")
s.reverse_text_single # => "1gnirtS"

# Array
arr = ReversableString.setup(["String1", "String2"])
arr.reverse_text_ary # => ["1gnirtS", "2gnirtS"]
arr[1].reverse_text_ary # uses cached array result; returns "2gnirtS"
```

Translate identifiers:

```ruby
module Person
  extend Entity
end

# Identify formats and sources
Person.add_identifiers("/data/#{Entity::Identified::NAMESPACE_TAG}/identifiers.tsv",
                       "Name", "Alias")

Person.setup("001", format: "ID", namespace: :person).to("Alias") # => "Miki"
Person.setup("001", format: "ID", namespace: :person).to(:name) # => "Miguel"

list = Person.setup(["001"], format: "ID", namespace: :person)
list.to("Name") # => ["Miguel"]
```

Automatic entity wrapping from NamedArray/TSV:

```ruby
module Gene; extend Entity; self.format = "Ensembl Gene ID"; end

tsv = TSV.open <<~EOF
#: :sep=" " #:type=:list
#Id Ensembl Gene ID Other
row1 ENSG0001 X
EOF

row = tsv["row1"]
g = row["Ensembl Gene ID"] # => wrapped into Gene entity (if format registered)
g.all_properties # => property list for Gene
```

---

## Notes and edge cases

- Entity.prepare_entity duplicates input strings/arrays to avoid mutating caller state; array duplication can be forced per call via dup_array:true.
- For arrays, properties marked :array2single or :single2array adapt their interface between collection and element call sites.
- When using identifiers with NAMESPACE placeholders, ensure you set namespace on entities (Person.setup("001", namespace: :person)) or those files will be ignored.
- Persisted annotation properties (type :annotation) use a Tokyo Cabinet repo; you can supply a repo path via annotation_repo:, or let Persist.annotation_repo_persist create/use a repo by path.

Entity turns plain values into meaningful, behavior-rich objects tailored to your domain (genes, samples, users, etc.), with robust identifier translation and scalable property evaluation/persistence built-in.