scout-gear 10.8.3 → 10.9.0
- checksums.yaml +4 -4
- data/.vimproject +17 -0
- data/README.md +352 -0
- data/Rakefile +1 -0
- data/VERSION +1 -1
- data/doc/Association.md +288 -0
- data/doc/Entity.md +296 -0
- data/doc/KnowledgeBase.md +433 -0
- data/doc/Persist.md +356 -0
- data/doc/Semaphore.md +171 -0
- data/doc/TSV.md +449 -0
- data/doc/WorkQueue.md +359 -0
- data/doc/Workflow.md +586 -0
- data/lib/scout/association.rb +4 -2
- data/lib/scout/entity/identifiers.rb +1 -1
- data/lib/scout/entity/object.rb +1 -1
- data/lib/scout/entity/property.rb +5 -5
- data/lib/scout/entity.rb +1 -1
- data/lib/scout/knowledge_base/description.rb +1 -1
- data/lib/scout/knowledge_base/list.rb +7 -2
- data/lib/scout/knowledge_base/registry.rb +2 -2
- data/lib/scout/knowledge_base.rb +20 -2
- data/lib/scout/monitor.rb +300 -0
- data/lib/scout/persist/engine/packed_index.rb +2 -2
- data/lib/scout/persist/engine/sharder.rb +1 -1
- data/lib/scout/persist/tsv.rb +1 -0
- data/lib/scout/semaphore.rb +1 -1
- data/lib/scout/tsv/dumper.rb +3 -3
- data/lib/scout/tsv/open.rb +1 -0
- data/lib/scout/tsv/parser.rb +1 -1
- data/lib/scout/tsv/transformer.rb +1 -0
- data/lib/scout/tsv/util.rb +2 -2
- data/lib/scout/work_queue/socket.rb +1 -1
- data/lib/scout/work_queue/worker.rb +7 -5
- data/lib/scout/workflow/documentation.rb +1 -1
- data/lib/scout/workflow/entity.rb +22 -1
- data/lib/scout/workflow/step/config.rb +3 -3
- data/lib/scout/workflow/step/file.rb +4 -0
- data/lib/scout/workflow/step/info.rb +8 -2
- data/lib/scout/workflow/step.rb +10 -5
- data/lib/scout/workflow/task/inputs.rb +1 -1
- data/lib/scout/workflow/usage.rb +3 -2
- data/lib/scout/workflow/util.rb +22 -0
- data/scout-gear.gemspec +20 -6
- data/scout_commands/cat +86 -0
- data/scout_commands/doc +3 -1
- data/scout_commands/entity +151 -0
- data/scout_commands/system/clean +146 -0
- data/scout_commands/system/status +238 -0
- data/scout_commands/workflow/info +23 -10
- data/scout_commands/workflow/install +1 -1
- data/scout_commands/workflow/task +1 -1
- data/test/scout/entity/test_property.rb +1 -1
- data/test/scout/knowledge_base/test_registry.rb +19 -0
- data/test/scout/test_work_queue.rb +1 -1
- data/test/scout/work_queue/test_worker.rb +12 -10
- metadata +32 -5
- data/doc/lib/scout/path.md +0 -35
- data/doc/lib/scout/workflow/task.md +0 -13

data/doc/Association.md (ADDED)

# Association

Association provides a compact toolkit to open, normalize, and index pairwise relationships from TSV-like sources. With it you can:

- Parse declarative source/target field specifications (including format remapping).
- Open an “association database” (TSV) that standardizes keys/fields and optionally translates identifiers via Entity/TSV indices.
- Build a fast BDB-backed index over pair “edges” using “source~target” keys, optionally undirected.
- Work with association “items” (pairs) as Entities with useful properties and conversions.
- Produce incidence/adjacency matrices and perform filtering/subsetting over pairs.

It integrates with:

- TSV (parsing, reordering, indices)
- Entity (format registry and identifier translation)
- Persist (caching/DB backends)

Sections:

- Field specification syntax and normalization
- Opening association databases
- Building and using association indices
- AssociationItem: entity properties over pairs
- Matrix utilities
- Examples
- Notes and edge cases

---

## Field specification syntax and normalization

Association accepts flexible “field specs” that declare which columns are the source and the target, optionally including header aliases and format conversions.

Syntax patterns (strings):

- "FieldName"
  - Use the column named FieldName.
- "FieldName=~Header"
  - Use field FieldName but present it as Header in outputs.
- "=~Header"
  - No explicit field (inferred from the header or Entity format), but presented as Header.
- "FieldName=>TargetFormat"
  - Use FieldName and translate identifiers to TargetFormat (via TSV.translation_index / Entity identifiers).
- "FieldName=~Header=>TargetFormat"
  - Full form: pick the field, rename the header, and convert identifiers.

Parsing and normalization helpers:

- Association.parse_field_specification(spec) -> [field, header, final_format]
- Association.normalize_specs(spec, all_fields=nil) -> normalized [field, header, format]
  - If a field is not directly present but is a recognized Entity format, it tries to find a matching column within all_fields by that Entity.
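
To make the grammar concrete, here is a minimal standalone sketch of how a spec string could split into [field, header, final_format]. `parse_spec` is a hypothetical helper, not scout-gear's implementation; it assumes "=>" introduces the target format and "=~" the header alias, and is written to agree with the parse_field_specification examples in this document.

```ruby
# Hypothetical helper: a plain-Ruby sketch of the field-spec grammar,
# not scout-gear's Association.parse_field_specification.
def parse_spec(spec)
  return [nil, nil, nil] if spec.nil?

  # "Field=~Header=>Format": split off the target format first...
  rest, format = spec.split("=>", 2)
  # ...then separate the field name from the optional header alias.
  field, header = rest.to_s.split("=~", 2)
  field = nil if field.nil? || field.empty?
  [field, header, format]
end

p parse_spec("FieldName")                               # => ["FieldName", nil, nil]
p parse_spec("=~Associated Gene Name=>Ensembl Gene ID") # => [nil, "Associated Gene Name", "Ensembl Gene ID"]
p parse_spec("Wife (ID)=>Alias")                        # => ["Wife (ID)", nil, "Alias"]
```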

Extract source/target specs:

- specs = Association.extract_specs(all_fields, options)
  - options keys: :source, :target, :source_format, :target_format, :format (a hash of entity_type -> default_target_format)
  - Returns a Hash with:
    - :source => [field, header, final_format]
    - :target => [field, header, final_format]
  - Infers default source/target when not provided:
    - If both are nil: source := key_field; target := first data field.
    - If source is nil but target is the key field: source := first data field; and vice versa.

Resolve headers and positions:

- Association.headers(all_fields, info_fields=nil, options)
  - all_fields: [key_field, field1, ...]
  - info_fields: extra value columns to keep besides the target (defaults to all fields except source and target).
  - Returns:
    - [source_pos, field_pos, source_header, field_headers, source_format, target_format]
  - Handles :format hash defaults per entity type, and honors explicit source/target formats.

---

## Opening association databases

Association.open coerces a TSV (file/Path/TSV) into a normalized association database with optional identifier translation.

```ruby
db = Association.open(
  file_or_tsv,
  source: "Wife (ID)=>Alias",
  target: "Husband (ID)=>Name",
  namespace: "person", # optional; replaces NAMESPACE placeholders in paths
  type: :list          # optional TSV type; inferred when not set
)
```

Behavior:

- Reads the header and infers positions via headers(...).
- If target/source formats are specified:
  - Builds translation indices from:
    - TSV.identifier_files(file), Entity.identifier_files(format), and options[:identifiers].
  - Rewrites keys/values to the requested formats (e.g., “(ID)=>Name”).
- Produces a TSV with:
  - key_field: resolved source field name (with a “(format)” suffix if translated).
  - fields: [resolved target field (with “(format)” if translated), plus the remaining info_fields].
  - type: inherited/passed (:double, :list, :flat, :single).

Namespace placeholder:

- When opening from a path string containing “NAMESPACE”, passing namespace: substitutes it:
  - Example: ".../NAMESPACE/identifiers.tsv" -> ".../person/identifiers.tsv"

Persisted variant:

- Association.database(file, ...) wraps Association.open with Persist.tsv and a “BDB” engine:
  - Returns a persistence-backed TSV (keys/fields/type saved with TSVAdapter).
  - Options: any Association.open options plus :persist / persist_* (via IndiferentHash).

Examples:

- Simple open:
```ruby
db = Association.database(datadir.person.marriages,
                          source: "Wife", target: "Husband", persist: true)
db["Clei"]["Husband"] # => "Miguel"
db["Clei"]["Date"]    # => "2021"
```

- Partial field + format:
```ruby
db = Association.database(datadir.person.marriages,
                          source: "Wife=>Alias", target: "Husband=>Name")
```

- Flat TSV:
```ruby
flat = datadir.person.parents.tsv(type: :flat, fields: ["Parent"])
db = Association.database(flat)
db["Miki"] # => %w(Juan Mariluz)
```

---

## Building and using association indices

Association.index materializes a BDB index over pairwise relations with keys of the form “source~target”. The index entries store the “info fields” (everything but the two endpoints) as a :list TSV.

```ruby
idx = Association.index(file_or_tsv,
                        source: "=>Name",
                        target: "Parent=>Name",
                        undirected: false, # true writes both source~target and target~source keys
                        persist: true)
```

- Under the hood:
  - Opens/normalizes the database with Association.open (or uses the provided DB).
  - Builds keys “[source]~[target]” and writes the values (info fields) as a list.
  - If undirected is true (or source and target are the same column), writes both “[s]~[t]” and “[t]~[s]”.

- Return value:
  - A BDB TSV extended with Association::Index, annotated with:
    - source_field, target_field, undirected
  - The index sets key_field to “SourceField~TargetField[~undirected]”.

- Methods on Association::Index:
  - parse_key_field → sets source_field/target_field/undirected from key_field.
  - match(entity) → returns all “source~target” keys whose source starts with entity (prefix-based).
  - subset(source_list, target_spec)
    - source_list: list of source entities or :all.
    - target_spec: :all or a list to filter by the target side.
    - Returns matching keys, handling undirected symmetry.
  - reverse → returns a reversed index (keys swapped to “target~source”) persisted in a side file (.reverse).
  - filter(value_field=nil, target_value=nil, &block)
    - Without a block: keeps keys whose value_field is present (or equals target_value).
    - With a block: custom predicate over values (or key+values if value_field is nil).
  - to_matrix(value_field=nil) { |values| ... }
    - Produces an incidence matrix TSV (rows: sources, columns: targets):
      - If value_field is provided, uses that column (or the block mapping).
      - Otherwise, boolean incidence.
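
As a mental model of the key layout described above, the sketch below writes “source~target” keys into a plain Ruby Hash (standing in for the BDB store), stores each edge in both directions when undirected, and performs the prefix match on the source side. `build_pair_index` and `match_pairs` are hypothetical helpers, not Association.index or Association::Index#match.

```ruby
# Hypothetical sketch: a plain Hash stands in for the BDB-backed index.
# Each row is [source, target, *info_fields].
def build_pair_index(rows, undirected: false)
  index = {}
  rows.each do |source, target, *info|
    index["#{source}~#{target}"] = info
    # Undirected indices store the edge in both directions.
    index["#{target}~#{source}"] = info if undirected
  end
  index
end

# Prefix match: keys start with the source, so this selects by source prefix.
def match_pairs(index, entity)
  index.keys.select { |key| key.start_with?(entity) }
end

idx = build_pair_index([["Clei", "Guille"]], undirected: true)
p idx.keys                 # => ["Clei~Guille", "Guille~Clei"]
p match_pairs(idx, "Clei") # => ["Clei~Guille"]
```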

Note:

- reverse persists its own DB with a swapped key_field; it carries over annotations, the unnamed flag, and undirected.

Example:

```ruby
idx = Association.index(datadir.person.brothers, undirected: true)
idx.match("Clei")                    # => ["Clei~Guille"]
idx.reverse.match("Clei")            # => ["Clei~Guille"] (same when undirected)
idx.filter("Type", "mother")
idx.subset(["Miki", "Guille"], :all) # some “source~target” keys
```
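
The remaining Index operations reduce to simple key manipulations. The sketch below shows one way subset, reverse, and filter could behave on the same kind of plain Hash (keys “source~target”, values as info-field lists). The helper names, the explicit `fields` argument, and the sample data are hypothetical; undirected symmetry handling in subset is deliberately left out.

```ruby
# Hypothetical sketch over a Hash of "source~target" => [info values].
def subset_pairs(index, sources, targets)
  index.keys.select do |key|
    source, target = key.split("~", 2)
    (sources == :all || sources.include?(source)) &&
      (targets == :all || targets.include?(target))
  end
end

# Swap endpoints in every key, keeping the info values.
def reverse_pairs(index)
  index.each_with_object({}) do |(key, info), reversed|
    source, target = key.split("~", 2)
    reversed["#{target}~#{source}"] = info
  end
end

# Keep keys whose value_field is non-empty, or contains target_value when given.
def filter_pairs(index, fields, value_field, target_value = nil)
  pos = fields.index(value_field)
  index.select do |_key, info|
    values = Array(info[pos])
    target_value ? values.include?(target_value) : values.any? { |v| !v.to_s.empty? }
  end.keys
end

index = { "a~b" => ["father"], "a~c" => ["mother"] }
fields = ["Type of parent"]
p subset_pairs(index, ["a"], :all)                        # => ["a~b", "a~c"]
p reverse_pairs(index).keys                               # => ["b~a", "c~a"]
p filter_pairs(index, fields, "Type of parent", "mother") # => ["a~c"]
```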

---

## AssociationItem: entity properties over pairs

AssociationItem is an Entity module that represents pairs as annotated “source~target” strings. You typically obtain such lists from index.keys and then call properties on the annotated list.

Annotate:

- Association.index(file).keys returns raw strings; annotate them with AssociationItem.setup if needed, or use Index helpers that return annotated values where applicable.

Properties (selected):

- name (single): "source~target" (returns friendly names using the entity's .name where available).
- full_name: database-prefixed “db:source~target” when a database is set.
- invert: swap endpoints (works on single or array); toggles the reverse flag.
- namespace: forwarded from knowledge_base (if present).
- part (array2single): returns [source, "~", target] tuples for each pair.
- target / source (array2single): return just the target or source identifiers.
- target_type / source_type (both): resolve entity type names via the knowledge_base target/source (requires a KnowledgeBase integration providing #source/#target/#undirected/#get_index/#index_fields).
- target_entity / source_entity: wrap target/source into Entity-typed values according to the knowledge_base types.
- index(database=nil): resolve the underlying index (delegates to knowledge_base.get_index).
- value (array2single): fetch info values for each pair from the index; returns NamedArrays.
- info_fields / info: helpers for value lookups; info builds a Hash for each pair.
- tsv (array): emit a TSV for the pair list with columns: source_type, target_type, info_fields.
- filter(*args, &block): filter this pair list using the generated tsv.select.

Utilities:

- AssociationItem.incidence(pairs, key_field="Source") { |pair| optional_value }
  - Returns a TSV (list) with sources as rows and targets as columns; cells hold the block's value or booleans.
- AssociationItem.adjacency(pairs, key_field="Source") { |pair| value }
  - Returns a TSV (double) mapping source -> [Target, values].

Convenience:

- TSV.incidence(tsv, **kwargs) delegates to Association.index(...).keys -> AssociationItem.incidence

---

## Matrix utilities

Given an index:

- idx.to_matrix(value_field=nil) { |values| ... } → TSV list
  - value_field omitted and no block → boolean incidence.
  - With value_field → use that column (vector) as the cell value.
  - With a block → compute cell values programmatically.

Standalone:

- AssociationItem.incidence for an incidence matrix over a list of pairs, as above.
- AssociationItem.adjacency for an adjacency list.
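
To show what these two shapes contain, here is a plain-Ruby sketch that turns “source~target” strings into an incidence table (a Hash of Hashes with sources as rows and targets as columns) and an adjacency list (source => [targets, values]). The helpers are hypothetical stand-ins for AssociationItem.incidence/adjacency and return Hashes rather than TSVs. Unset incidence cells are false in boolean mode and nil when a block supplies the values.

```ruby
# Hypothetical sketch: incidence table as a Hash of Hashes (rows: sources, columns: targets).
def incidence_sketch(pairs)
  endpoints = pairs.map { |pair| pair.split("~", 2) }
  targets = endpoints.map(&:last).uniq
  matrix = {}
  endpoints.zip(pairs).each do |(source, target), pair|
    # Start each row with every target column unset.
    matrix[source] ||= targets.to_h { |t| [t, block_given? ? nil : false] }
    matrix[source][target] = block_given? ? yield(pair) : true
  end
  matrix
end

# Hypothetical sketch: adjacency list mapping source => [targets, values].
def adjacency_sketch(pairs)
  pairs.each_with_object({}) do |pair, adjacency|
    source, target = pair.split("~", 2)
    targets, values = (adjacency[source] ||= [[], []])
    targets << target
    values << (block_given? ? yield(pair) : nil)
  end
end

inc = incidence_sketch(["Clei~Guille", "Guille~Clei"])
p inc["Clei"]["Guille"] # => true
p inc["Clei"]["Clei"]   # => false
adj = adjacency_sketch(["a~b", "a~c"]) { |pair| pair.upcase }
p adj["a"]              # => [["b", "c"], ["A~B", "A~C"]]
```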

---

## Examples

Parse specs:

```ruby
Association.parse_field_specification("=~Associated Gene Name=>Ensembl Gene ID")
# => [nil, "Associated Gene Name", "Ensembl Gene ID"]

Association.normalize_specs("TG=~Associated Gene Name=>Ensembl Gene ID", %w(SG TG Effect))
# => ["TG", "Associated Gene Name", "Ensembl Gene ID"]

Association.extract_specs(%w(SG TG Effect), source: "SG", target: "TG")
# => { source: ["SG", nil, nil], target: ["TG", nil, nil] }
```

Open database (translate to human-readable names):

```ruby
db = Association.database(datadir.person.marriages,
                          source: "Wife (ID)=>Alias",
                          target: "Husband (ID)=>Name")
db["Clei"]["Husband"] # => "Miguel"
db["Clei"]["Date"]    # => "2021"
```

Index and match:

```ruby
idx = Association.index(datadir.person.brothers, undirected: true)
idx.match("Clei")                    # => ["Clei~Guille"]
idx.subset(["Clei"], :all)           # => ["Clei~Guille"]
idx.reverse.subset(["Guille"], :all) # => ["Guille~Clei"]
```

Filter:

```ruby
idx = Association.index(datadir.person.parents)
idx.filter('Type of parent', 'mother') # keys whose info field contains 'mother'
```

Incidence matrix:

```ruby
pairs = Association.index(datadir.person.brothers, undirected: true).keys
inc = AssociationItem.incidence(pairs)
inc["Clei"]["Guille"] # => true
```

List serializer handling:

```ruby
tsv = TSV.open <<~EOF
#: :sep=,#:type=:list
#lowcase,upcase,double,triple
a,A,aa,aaa
b,B,bb,bbb
EOF
i = Association.index(tsv)
i["a~A"] # => ['aa', 'aaa']
```

---

## Notes and edge cases

- undirected default: if source_field == target_field, undirected is assumed true; otherwise it is false unless set.
- When specifying formats, ensure identifier TSVs are reachable. You can pass :identifiers (TSV/Path) or rely on TSV.identifier_files(file) and Entity.identifier_files(format).
- Association.index returns a BDB-backed TSV; reverse indexing persists to a side .reverse database next to the main DB.
- Paths containing [NAMESPACE] or NAMESPACE are substituted with options[:namespace].

data/doc/Entity.md (ADDED)

# Entity

Entity is a lightweight system to turn plain Ruby values (strings, arrays, numerics) into annotated, behavior-rich “entities.” It layers on top of Annotation and provides:

- A module-level DSL to define “properties” (methods) for entities and arrays of entities.
- Format mapping and identifier translation between formats (via TSV indices).
- Automatic conversion of NamedArray field values into the appropriate entity type.
- Optional persistence for property results (including annotation lists) using Persist.
- Array-aware property execution with smart caching and support for multi-return computations.

Sections:

- Getting started and core concepts
- Formats and automatic conversion
- Properties: types, array semantics and persistence
- Identifier translation (Entity::Identified)
- Integration with NamedArray and TSV
- Introspection helpers
- Examples
- Notes and edge cases

---

## Getting started and core concepts

Define a new entity type by extending Entity in a module. The module becomes the “entity class” for values you annotate with it.

```ruby
module ReversableString
  extend Entity

  property :reverse_text => :single do
    self.reverse
  end
end

s = ReversableString.setup("String1")
s.reverse_text # => "1gnirtS"
```

Key facts:

- Extending Entity decorates the module with Annotation and Entity::Property capabilities.
- The module's setup (e.g. ReversableString.setup(value, format: ..., namespace: ...)) annotates the value with that entity module (and any extra metadata).
- Entities can also be arrays: pass an array to setup to make an AnnotatedArray; properties can be defined to act on the array or per item.

---

## Formats and automatic conversion

Entity supports “formats” to describe the logical identifier type of a value (e.g., “Ensembl Gene ID”, “Name”). Formats are globally mapped to entity modules using a tolerant index:

- Set the formats accepted by the entity:
```ruby
module Gene
  extend Entity
  self.format = ["Ensembl Gene ID", "Alias", "Name"]
end
```

- Global registry:
  - Entity.formats is a FormatIndex (case-aware, tolerant finder). It can match strings like “Transcription Factor (Ensembl Gene ID)” to “Ensembl Gene ID”.
  - Entity.formats[format_name] ⇒ entity module.
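
The tolerant lookup can be pictured as “try the exact name, then try a trailing parenthesized format”. The sketch below is a hypothetical simplification: a plain Hash registry with symbols standing in for entity modules. The real FormatIndex may apply further rules (for instance its case handling) that are not modeled here.

```ruby
# Hypothetical registry: format name => entity (symbols stand in for modules).
FORMAT_REGISTRY = { "Ensembl Gene ID" => :Gene, "Name" => :Person }.freeze

# Exact match first, then fall back to a trailing "(Format)" decoration.
def find_entity_for(field)
  return FORMAT_REGISTRY[field] if FORMAT_REGISTRY.key?(field)

  inner = field[/\(([^()]+)\)\s*\z/, 1]
  inner && FORMAT_REGISTRY[inner]
end

p find_entity_for("Ensembl Gene ID")                        # => :Gene
p find_entity_for("Transcription Factor (Ensembl Gene ID)") # => :Gene
p find_entity_for("Other")                                  # => nil
```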

Automatic conversion when reading from tables:

- NamedArray fields return values wrapped as entities if there is a matching format. See Integration with NamedArray and TSV.

Manual preparation:

- Entity.prepare_entity(value, field, options = {}) returns a value annotated with the entity for that field if a matching format is known:
```ruby
Entity.prepare_entity("ENSG000001", "Ensembl Gene ID") # wraps into the entity registered for that format
```

---

## Properties: types, array semantics and persistence

Define behaviors (methods) using the property DSL. A property can target:
- :single: defined for a single entity.
- :array: defined for an array of entities (takes the array as self).
- :multiple: batch property for arrays that computes all missing per-item results at once and returns a mapping/array; Entity handles filling the per-item caches.
- :both: define a method directly that should work for both single and array (default).
- Interface adapters:
  - :single2array: defined for single values, but exposes an array facade.
  - :array2single: defined for arrays, but exposes a single-return facade.
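
The :multiple contract (compute only what is missing, in one batch, and accept either an Array aligned with the input or a Hash keyed by item) can be sketched in plain Ruby as follows. `fill_multiple` and its explicit cache Hash are hypothetical; Entity manages its own per-item caches internally.

```ruby
# Hypothetical sketch of :multiple semantics with an explicit cache Hash.
def fill_multiple(items, cache)
  missing = items.reject { |item| cache.key?(item) }
  unless missing.empty?
    result = yield(missing) # one batch call for all missing items
    # Accept either a Hash {item => value} or an Array aligned with `missing`.
    pairs = result.is_a?(Hash) ? result : missing.zip(result)
    pairs.each { |item, value| cache[item] = value }
  end
  items.map { |item| cache[item] }
end

cache = {}
batches = []
first = fill_multiple(%w[ab cd], cache) { |m| batches << m; m.map(&:chars) }
second = fill_multiple(%w[cd ef], cache) { |m| batches << m; m.to_h { |x| [x, x.upcase] } }
p first   # => [["a", "b"], ["c", "d"]]
p second  # => [["c", "d"], "EF"]
p batches # => [["ab", "cd"], ["ef"]]
```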

Examples:

```ruby
module ReversableString
  extend Entity

  # Operates on a single entity
  property :reverse_text_single => :single do
    self.reverse
  end

  # Operates on an array and returns per-item values
  property :reverse_text_ary => :array do
    self.collect { |s| s.reverse }
  end

  # Both single and array supported by a single method
  property :reverse_both => :both do
    if Array === self
      self.collect(&:reverse)
    else
      self.reverse
    end
  end

  # Batch compute for arrays (multiple)
  property :multiple_annotation_list => :multiple do
    # Return either an Array aligned with input indices or a Hash {item => result}
    self.collect { |e| e.chars } # e.g., list of char arrays
  end
end
```

Array semantics and caching:

- When you call an array property from an element (item.reverse_text_ary), Entity uses the container's cached result via an internal _ary_property_cache to avoid recomputing per element.
- For :multiple, Entity runs the computation once for the whole array, caches it, and dispatches results to the items that requested them (even across partially overlapping arrays).
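
The idea behind the array-level cache is “compute once for the container, then index into the result per element”. Here is a minimal plain-Ruby sketch of that pattern; the class `ArrayPropertyCache` is hypothetical and is not Entity's _ary_property_cache.

```ruby
# Hypothetical sketch of array-level memoization shared by all elements.
class ArrayPropertyCache
  attr_reader :computations

  def initialize(items)
    @items = items
    @results = {}
    @computations = 0
  end

  # Compute an array property once for the whole container.
  def array_property(name)
    @results.fetch(name) do
      @computations += 1
      @results[name] = yield(@items)
    end
  end

  # Element access reuses the container's cached result.
  def item_property(index, name, &compute)
    array_property(name, &compute)[index]
  end
end

list = ArrayPropertyCache.new(%w[String1 String2])
reverse_all = ->(items) { items.map(&:reverse) }
p list.item_property(1, :reverse_text_ary, &reverse_all) # => "2gnirtS"
p list.item_property(0, :reverse_text_ary, &reverse_all) # => "1gnirtS"
p list.computations                                      # => 1
```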

Persistence for properties:

- Mark any property as persisted to cache its result across runs/filesystems:

```ruby
ReversableString.persist :reverse_text_single, :marshal
ReversableString.persist :reverse_text_ary, :array, dir: "/tmp/entity_cache"
ReversableString.persist :annotation_list, :annotation, annotation_repo: "/path/to/repo.tch"
```

- persist(name, type=:marshal, options={})
  - type can be any Persist serializer (e.g. :array, :marshal) or the special :annotation / :annotations type, which stores annotation objects via Persist.annotation_repo_persist (Tokyo Cabinet repo), with the option :annotation_repo pointing to the repo path.
  - options default to:
    - persist: true
    - dir: Entity.entity_property_cache[self.to_s][name] (default cache under var/entity_property/<Entity>/<property>)
- persisted?(name), unpersist(name): manage persisted registration.
- Internally, Entity::Property.persist wraps property execution inside Persist.persist (or annotation_repo_persist) and keys it by entity id.

Notes:

- Entity ids are derived from Annotation ids (Annotation::AnnotatedObject#id).
- Persisted array returns are validated against the current array call sites to extract per-item results correctly.
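
Conceptually, a persisted property is a memoized computation stored on disk under a per-property directory and keyed by the entity id. The sketch below illustrates that idea with Marshal files. `persisted_property`, the MD5-of-id file naming, and the directory layout are assumptions for illustration, not Persist's actual storage format.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical sketch: cache a property's value on disk, keyed by entity id.
def persisted_property(dir, property, entity_id)
  path = File.join(dir, property.to_s, Digest::MD5.hexdigest(entity_id.to_s))
  return Marshal.load(File.binread(path)) if File.exist?(path)

  value = yield # compute only on a cache miss
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, Marshal.dump(value))
  value
end

Dir.mktmpdir do |dir|
  calls = 0
  2.times { persisted_property(dir, :reverse_text_single, "String1") { calls += 1; "String1".reverse } }
  p calls # => 1
end
```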

---

## Identifier translation (Entity::Identified)

For entities that can translate between identifier formats, include Entity::Identified and register identifier sources.

Register identifier files:

- add_identifiers(file_or_tsv, default_format=nil, name_format=nil, description_format=nil)
  - file can be a Path/filename (with optional NAMESPACE placeholders) or a TSV instance.
  - This sets:
    - the identity formats on the entity (formats accepted),
    - the default format (:default),
    - the name format (:name),
    - the description format (not used directly in core, but available).

Namespace placeholder:

- Use “NAMESPACE” in file paths to have it replaced dynamically with the entity instance's namespace annotation.
- If your files include NAMESPACE and no namespace is set on the entity, those files are skipped with a warning.
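
The substitute-or-skip rule can be sketched as below. `resolve_identifier_files` is a hypothetical helper, and the placeholder string is passed in as `tag` because this document refers to it both as “NAMESPACE” and via Entity::Identified::NAMESPACE_TAG, whose exact value is not shown here. The example paths are made up.

```ruby
# Hypothetical sketch: substitute the namespace into identifier file paths,
# skipping (with a warning) any placeholder path when no namespace is set.
def resolve_identifier_files(files, namespace, tag: "NAMESPACE")
  files.filter_map do |file|
    next file unless file.include?(tag)

    if namespace.nil?
      warn "Skipping #{file}: no namespace set"
      next
    end
    file.gsub(tag, namespace.to_s)
  end
end

files = ["/data/NAMESPACE/identifiers", "/data/global/identifiers"]
p resolve_identifier_files(files, :person) # => ["/data/person/identifiers", "/data/global/identifiers"]
p resolve_identifier_files(files, nil)     # => ["/data/global/identifiers"] (warning printed to stderr)
```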

Translate between formats:

- The to(target_format) property is auto-defined for Identified entities.
- target_format can be a literal format name, :name (-> name_format), or :default.
- Works on single entities or arrays; on arrays it returns an array aligned with the input order.
- Example:
```ruby
module Person
  extend Entity
end
Person.add_identifiers("/data/#{Entity::Identified::NAMESPACE_TAG}/identifiers", "Name", "Alias")

miguel = Person.setup("001", format: "ID", namespace: :person)
miguel.to("Alias") # => "Miki"
miguel.to(:name)   # => "Miguel"
```

Identifier indexes:

- Entity builds and caches TSV.translation_index from identifier files via Persist.memory, keyed by [entity_type, source_format, target_format].
- Call identifier_index(target_format, source_format=nil) to get the TSV index.
- source_format defaults to the entity's current format; if no index is found, Entity retries without specifying the source.
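
As an illustration of what a translation index does, the sketch below builds a source-to-target lookup Hash from an in-memory identifiers table and applies it to a single value or an array, preserving order. The table contents mirror the Person example above but are illustrative only, and `translation_lookup` / `translate` are hypothetical helpers, not TSV.translation_index or Entity's to.

```ruby
# Hypothetical in-memory stand-in for an identifiers file (illustrative rows).
IDENTIFIERS = {
  fields: ["ID", "Name", "Alias"],
  rows: [["001", "Miguel", "Miki"]]
}.freeze

# Build a {source value => target value} lookup between two columns.
def translation_lookup(table, source_format, target_format)
  s = table[:fields].index(source_format)
  t = table[:fields].index(target_format)
  raise ArgumentError, "unknown format" if s.nil? || t.nil?

  table[:rows].to_h { |row| [row[s], row[t]] }
end

# Translate one value or an Array (results stay aligned with the input order).
def translate(values, source_format, target_format, table = IDENTIFIERS)
  lookup = translation_lookup(table, source_format, target_format)
  values.is_a?(Array) ? values.map { |v| lookup[v] } : lookup[values]
end

p translate("001", "ID", "Alias")         # => "Miki"
p translate(["001", "002"], "ID", "Name") # => ["Miguel", nil]
```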

Introspection:

- Entity.identifier_files(field): class method returning the list of TSVs involved in a format for entities that include Identified.

---

## Integration with NamedArray and TSV

Entity values are automatically prepared when accessing NamedArray fields:

- NamedArray#[](key) is overridden to call Entity.prepare_entity(v, key), so if a field name is a recognized format (or carries it in parentheses, e.g., “Gene Name (Ensembl Gene ID)”), the returned cell value is wrapped as an entity.

Example:

```ruby
module SomeEntity; extend Entity; self.format = "SomeEntity"; end

row = NamedArray.setup(["a", "b"], %w(SomeEntity Other))
row["SomeEntity"].respond_to?(:all_properties) # => true
```

This makes TSV rows entity-aware when you deserialize them via TSV.open; NamedArray instances become rich objects with entity behaviors available per column.
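
One way to picture the override is a module prepended to a row class that wraps each looked-up cell before returning it. Everything below (`Row`, `EntityAwareRow`, `WRAPPERS`, `GeneLike`) is hypothetical and uses plain `extend` with exact field names only; it is not NamedArray's or Entity's code.

```ruby
# Hypothetical entity-like module that adds behavior to wrapped values.
module GeneLike
  def gene?
    true
  end
end

# Hypothetical field-name => module registry (exact names only, for brevity).
WRAPPERS = { "Ensembl Gene ID" => GeneLike }.freeze

# Minimal row with named fields.
class Row
  def initialize(values, fields)
    @values = values
    @fields = fields
  end

  def [](key)
    @values[@fields.index(key)]
  end
end

# Prepended override: look up the value, then wrap it if the field has a module.
module EntityAwareRow
  def [](key)
    value = super
    wrapper = WRAPPERS[key]
    wrapper && value ? value.dup.extend(wrapper) : value
  end
end
Row.prepend(EntityAwareRow)

row = Row.new(["ENSG0001", "X"], ["Ensembl Gene ID", "Other"])
p row["Ensembl Gene ID"].respond_to?(:gene?) # => true
p row["Other"].respond_to?(:gene?)           # => false
```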

---

## Introspection helpers

Entity::Object adds convenience methods to every annotated entity:

- entity_classes: list of Entity modules applied (from Annotation).
- base_entity: the last Entity in annotation_types, i.e., the primary one.
- all_properties: list of property names available across entity modules.
- _ary_property_cache: internal cache used to memoize array property evaluations for items.

The Entity module itself exposes:
- Entity.formats: global FormatIndex of format name → entity module, with tolerant lookup (find handles strings with extra decorations).
- Entity.prepare_entity(value, field, options={}): utility to wrap a value or array into an entity based on the format mapping.

---

## Examples

Define a property-rich entity and use it on values and arrays:

```ruby
module ReversableString
  extend Entity

  property :reverse_text_single => :single do
    self.reverse
  end

  property :reverse_text_ary => :array do
    self.collect { |s| s.reverse }
  end

  # Persist selected properties
  persist :reverse_text_single, :marshal
  persist :reverse_text_ary, :array
end

# Single
s = ReversableString.setup("String1")
s.reverse_text_single # => "1gnirtS"

# Array
arr = ReversableString.setup(["String1", "String2"])
arr.reverse_text_ary    # => ["1gnirtS", "2gnirtS"]
arr[1].reverse_text_ary # uses the cached array result; returns "2gnirtS"
```

Translate identifiers:

```ruby
module Person
  extend Entity
end

# Identify formats and sources
Person.add_identifiers("/data/#{Entity::Identified::NAMESPACE_TAG}/identifiers.tsv",
                       "Name", "Alias")

Person.setup("001", format: "ID", namespace: :person).to("Alias") # => "Miki"
Person.setup("001", format: "ID", namespace: :person).to(:name)   # => "Miguel"

list = Person.setup(["001"], format: "ID", namespace: :person)
list.to("Name") # => ["Miguel"]
```

Automatic entity wrapping from NamedArray/TSV:

```ruby
module Gene; extend Entity; self.format = "Ensembl Gene ID"; end

# Comma separator, so the "Ensembl Gene ID" header is not split on its spaces
tsv = TSV.open <<~EOF
#: :sep=,#:type=:list
#Id,Ensembl Gene ID,Other
row1,ENSG0001,X
EOF

row = tsv["row1"]
g = row["Ensembl Gene ID"] # => wrapped into Gene entity (if format registered)
g.all_properties           # => property list for Gene
```

---

## Notes and edge cases

- Entity.prepare_entity duplicates input strings/arrays to avoid mutating caller state; array duplication can be forced per call via dup_array: true.
- For arrays, properties marked :array2single or :single2array adapt their interface between collection and element call sites.
- When using identifiers with NAMESPACE placeholders, make sure you set the namespace on entities (Person.setup("001", namespace: :person)) or those files will be ignored.
- Persisted annotation properties (type :annotation) use a Tokyo Cabinet repo; you can supply a repo path via annotation_repo:, or let Persist.annotation_repo_persist create/use a repo by path.

Entity turns plain values into meaningful, behavior-rich objects tailored to your domain (genes, samples, users, etc.), with robust identifier translation and scalable property evaluation and persistence built in.