lutaml-store 0.1.1
- checksums.yaml +7 -0
- data/.github/workflows/main.yml +27 -0
- data/.gitignore +12 -0
- data/.rspec +3 -0
- data/.rubocop.yml +10 -0
- data/.rubocop_todo.yml +450 -0
- data/CLAUDE.md +57 -0
- data/CODE_OF_CONDUCT.md +132 -0
- data/CORRECTED_HTTP_CACHE_IMPLEMENTATION.md +209 -0
- data/CORRECTED_HTTP_CACHE_PLAN.md +164 -0
- data/Gemfile +15 -0
- data/Gemfile.lock +220 -0
- data/README.adoc +1430 -0
- data/Rakefile +12 -0
- data/TODO.impl/0-lutaml-store-self-quality.md +112 -0
- data/TODO.impl/1-lutaml-hal-migration.md +60 -0
- data/TODO.impl/2-glossarist-migration.md +359 -0
- data/TODO.impl/3-lutaml-jsonschema-migration.md +273 -0
- data/bin/console +11 -0
- data/bin/setup +8 -0
- data/demo/Gemfile +15 -0
- data/demo/Gemfile.lock +61 -0
- data/demo/README.adoc +301 -0
- data/demo/data/vcards/co/contact_10_thompson.data +1 -0
- data/demo/data/vcards/co/contact_10_thompson.meta +1 -0
- data/demo/data/vcards/co/contact_1_doe.data +1 -0
- data/demo/data/vcards/co/contact_1_doe.meta +1 -0
- data/demo/data/vcards/co/contact_2_smith.data +1 -0
- data/demo/data/vcards/co/contact_2_smith.meta +1 -0
- data/demo/data/vcards/co/contact_3_johnson.data +1 -0
- data/demo/data/vcards/co/contact_3_johnson.meta +1 -0
- data/demo/data/vcards/co/contact_4_garcia.data +1 -0
- data/demo/data/vcards/co/contact_4_garcia.meta +1 -0
- data/demo/data/vcards/co/contact_5_wilson.data +1 -0
- data/demo/data/vcards/co/contact_5_wilson.meta +1 -0
- data/demo/data/vcards/co/contact_6_brown.data +1 -0
- data/demo/data/vcards/co/contact_6_brown.meta +1 -0
- data/demo/data/vcards/co/contact_7_davis.data +1 -0
- data/demo/data/vcards/co/contact_7_davis.meta +1 -0
- data/demo/data/vcards/co/contact_8_anderson.data +1 -0
- data/demo/data/vcards/co/contact_8_anderson.meta +1 -0
- data/demo/data/vcards/co/contact_9_taylor.data +1 -0
- data/demo/data/vcards/co/contact_9_taylor.meta +1 -0
- data/demo/data/vcards.db +0 -0
- data/demo/pottery_class_demo.rb +164 -0
- data/demo/vcard_models.rb +140 -0
- data/demo/vcard_store_demo.rb +526 -0
- data/lib/lutaml/store/adapter/base.rb +65 -0
- data/lib/lutaml/store/adapter/filesystem.rb +288 -0
- data/lib/lutaml/store/adapter/memory.rb +225 -0
- data/lib/lutaml/store/adapter/sqlite.rb +193 -0
- data/lib/lutaml/store/adapter.rb +12 -0
- data/lib/lutaml/store/attribute_updater.rb +198 -0
- data/lib/lutaml/store/basic_store.rb +190 -0
- data/lib/lutaml/store/cache.rb +108 -0
- data/lib/lutaml/store/cache_store.rb +282 -0
- data/lib/lutaml/store/composite_model_handler.rb +169 -0
- data/lib/lutaml/store/compression.rb +137 -0
- data/lib/lutaml/store/config.rb +178 -0
- data/lib/lutaml/store/database_store.rb +425 -0
- data/lib/lutaml/store/events.rb +92 -0
- data/lib/lutaml/store/format/base.rb +33 -0
- data/lib/lutaml/store/format/json.rb +25 -0
- data/lib/lutaml/store/format/jsonl.rb +37 -0
- data/lib/lutaml/store/format/marshal_format.rb +37 -0
- data/lib/lutaml/store/format/yaml.rb +29 -0
- data/lib/lutaml/store/format/yamls.rb +35 -0
- data/lib/lutaml/store/format.rb +33 -0
- data/lib/lutaml/store/http_cache.rb +279 -0
- data/lib/lutaml/store/http_cache_config.rb +53 -0
- data/lib/lutaml/store/http_cache_entry.rb +69 -0
- data/lib/lutaml/store/http_header_processor.rb +175 -0
- data/lib/lutaml/store/integrity.rb +102 -0
- data/lib/lutaml/store/model_registration.rb +75 -0
- data/lib/lutaml/store/model_registry.rb +123 -0
- data/lib/lutaml/store/model_serializer.rb +69 -0
- data/lib/lutaml/store/monitor.rb +192 -0
- data/lib/lutaml/store/storage_key.rb +40 -0
- data/lib/lutaml/store/version.rb +7 -0
- data/lib/lutaml/store.rb +41 -0
- data/lutaml-store.gemspec +35 -0
- data/plan.adoc +606 -0
- data/sig/lutaml/store.rbs +6 -0
- data/spec/lutaml/store/adapter_interface_spec.rb +89 -0
- data/spec/lutaml/store/anti_pattern_guard_spec.rb +35 -0
- data/spec/lutaml/store/anti_pattern_spec.rb +78 -0
- data/spec/lutaml/store/autoload_spec.rb +34 -0
- data/spec/lutaml/store/cache_store_spec.rb +271 -0
- data/spec/lutaml/store/compression_spec.rb +78 -0
- data/spec/lutaml/store/config_enhanced_spec.rb +158 -0
- data/spec/lutaml/store/corrected_http_cache_integration_spec.rb +336 -0
- data/spec/lutaml/store/custom_serializer_spec.rb +108 -0
- data/spec/lutaml/store/database_store_spec.rb +279 -0
- data/spec/lutaml/store/file_io_spec.rb +219 -0
- data/spec/lutaml/store/format_round_trip_spec.rb +110 -0
- data/spec/lutaml/store/format_spec.rb +70 -0
- data/spec/lutaml/store/http_cache_entry_spec.rb +203 -0
- data/spec/lutaml/store/http_cache_hal_integration_spec.rb +404 -0
- data/spec/lutaml/store/http_cache_spec.rb +422 -0
- data/spec/lutaml/store/http_header_processor_spec.rb +290 -0
- data/spec/lutaml/store/import_spec.rb +90 -0
- data/spec/lutaml/store/integrity_spec.rb +157 -0
- data/spec/lutaml/store/key_collision_serializer_spec.rb +98 -0
- data/spec/lutaml/store/load_save_spec.rb +107 -0
- data/spec/lutaml/store/lutaml_model_integration_spec.rb +291 -0
- data/spec/lutaml/store/model_serializer_spec.rb +140 -0
- data/spec/lutaml/store/store_spec.rb +182 -0
- data/spec/lutaml/store_spec.rb +21 -0
- data/spec/spec_helper.rb +16 -0
- metadata +166 -0
data/TODO.impl/0-lutaml-store-self-quality.md
ADDED
|
@@ -0,0 +1,112 @@
|
|
|
1
|
+
# TODO.impl/0-lutaml-store-self-quality.md
|
|
2
|
+
|
|
3
|
+
# lutaml-store Internal Quality Fixes (prerequisite for migrations)
|
|
4
|
+
|
|
5
|
+
## Completed
|
|
6
|
+
|
|
7
|
+
All anti-patterns eliminated from `lib/`:
|
|
8
|
+
|
|
9
|
+
| Anti-pattern | Remaining in lib/ |
|
|
10
|
+
|---|---|
|
|
11
|
+
| `instance_variable_get/set` | 0 (only a comment reference) |
|
|
12
|
+
| `respond_to?` | 0 (only a comment reference) |
|
|
13
|
+
| `send` on private methods | 0 |
|
|
14
|
+
|
|
15
|
+
### Changes made
|
|
16
|
+
|
|
17
|
+
1. **Created `ModelSerializer`** — unified serialization/deserialization into one class, eliminating `respond_to?` chains and duplicated code across `DatabaseStore`, `CompositeModelHandler`, `Serializer`.
|
|
18
|
+
2. **Added `Store#emit_event`** public method — eliminated `instance_variable_get(:@events)`.
|
|
19
|
+
3. **Fixed `AttributeUpdater`** — uses proper constructors instead of `instance_variable_set/get`.
|
|
20
|
+
4. **Fixed `CacheStore`** — uses proper factory methods.
|
|
21
|
+
5. **Removed `respond_to?` from adapters** — calls methods directly on `Adapter::Base` interface.
|
|
22
|
+
6. **Removed `CacheInspector`** — unused code.
|
|
23
|
+
7. **Added custom serializer support** — `ModelRegistration` accepts `serializer:` option for models with non-standard serialization (e.g., glossarist's `key_value` DSL).
|
|
24
|
+
|
|
25
|
+
## Anti-pattern audit (current state)
|
|
26
|
+
|
|
27
|
+
### 1. `instance_variable_get` / `instance_variable_set` — breaks encapsulation
|
|
28
|
+
|
|
29
|
+
| File | Line | Usage |
|
|
30
|
+
|---|---|---|
|
|
31
|
+
| `database_store.rb` | 375 | `@store.instance_variable_get(:@events)&.emit(event, data)` |
|
|
32
|
+
| `model_store.rb` | 230 | Same pattern — reaching into `@store`'s internals |
|
|
33
|
+
| `attribute_updater.rb` | 228 | `model.instance_variable_set(var, upgraded_model.instance_variable_get(var))` — polymorphic upgrade hack |
|
|
34
|
+
| `cache_store.rb` | 45 | `entry.instance_variable_set(:@created_at, ...)` — bypassing constructor |
|
|
35
|
+
|
|
36
|
+
**Fix:** Expose proper public APIs on the objects being accessed:
|
|
37
|
+
- `Store` should expose `emit_event(event, data)` as a public method.
|
|
38
|
+
- `AttributeUpdater#try_polymorphic_upgrade` must use proper constructors or
|
|
39
|
+
`Lutaml::Model::Serializable`'s own API — never copy instance variables.
|
|
40
|
+
- `CacheStore` should use factory methods or proper attribute setters.
|
|
41
|
+
|
|
42
|
+
### 2. `respond_to?` — poor typing / duck-typing smell
|
|
43
|
+
|
|
44
|
+
**24 occurrences across the codebase.** The worst offenders:
|
|
45
|
+
|
|
46
|
+
| File | Pattern | Fix |
|
|
47
|
+
|---|---|---|
|
|
48
|
+
| `composite_model_handler.rb` L102-133 | `respond_to?(:to_hash)`, `respond_to?(:from_hash)` | All models are `Lutaml::Model::Serializable` — use `is_a?` checks against a serialization protocol, or better, use `to_hash` directly via the type system |
|
|
49
|
+
| `database_store.rb` L288-327 | Same serialization dispatch via `respond_to?` | Extract a `SerializationAdapter` that knows how to (de)serialize `Lutaml::Model::Serializable` |
|
|
50
|
+
| `attribute_updater.rb` L87-129 | `respond_to?(setter_method)` | Use the model's attribute metadata from `Lutaml::Model::Serializable.attributes` |
|
|
51
|
+
| `serializer.rb` | 14 occurrences | Full rewrite needed — see below |
|
|
52
|
+
| `http_cache.rb` L112-142 | `@adapter.respond_to?(:clear)` | All adapters inherit from `Adapter::Base` which defines `#clear` — just call it |
|
|
53
|
+
|
|
54
|
+
**Fix strategy:**
|
|
55
|
+
- Create a `Serializable` protocol module (`Lutaml::Store::Serializable`) that
|
|
56
|
+
formalizes the serialization contract (`to_store_hash`, `from_store_hash`).
|
|
57
|
+
- Replace all `respond_to?` with type-based dispatch or method calls on known base classes.
|
|
58
|
+
- For attribute validation, use `model.class.attributes` (from Lutaml::Model).
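As a rough illustration of the last bullet, the sketch below validates attribute names against a class-level `attributes` hash instead of probing setters with `respond_to?`. `Contact`, `UnknownAttributeError`, and `apply_attributes` are hypothetical stand-ins, and the shape of `attributes` (a hash keyed by attribute name) is an assumption about Lutaml::Model rather than a confirmed API.

```ruby
# Stand-in model exposing attribute metadata through a class-level `attributes` hash.
# Assumption: real Lutaml::Model classes expose something comparable.
class Contact
  ATTRIBUTES = { name: :string, email: :string }.freeze

  attr_accessor(*ATTRIBUTES.keys)

  def self.attributes
    ATTRIBUTES
  end
end

class UnknownAttributeError < StandardError; end

# Checks every name against the declared attributes before any setter is called,
# so a bad key never leaves the model half-updated.
def apply_attributes(model, updates)
  declared = model.class.attributes
  unknown = updates.keys.map(&:to_sym).reject { |name| declared.key?(name) }
  raise UnknownAttributeError, "#{model.class} has no attribute(s): #{unknown.join(', ')}" unless unknown.empty?

  updates.each { |name, value| model.public_send("#{name}=", value) }
  model
end
```

Validation happens against declared metadata, so a typo in an attribute name fails loudly instead of being silently skipped.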
|
|
59
|
+
|
|
60
|
+
### 3. Duplicated serialization logic
|
|
61
|
+
|
|
62
|
+
`DatabaseStore#serialize_model`, `DatabaseStore#deserialize_model`,
|
|
63
|
+
`CompositeModelHandler#serialize_model`, `CompositeModelHandler#deserialize_model`
|
|
64
|
+
all contain identical `respond_to?` chains for `(to_hash|to_h|to_s)` and
|
|
65
|
+
`(from_hash|from_h|new)`.
|
|
66
|
+
|
|
67
|
+
**Fix:** Extract a single `Lutaml::Store::ModelSerializer` class:
|
|
68
|
+
|
|
69
|
+
```ruby
|
|
70
|
+
class ModelSerializer
|
|
71
|
+
def serialize(model)
|
|
72
|
+
model.to_hash.merge("_class" => model.class.name)
|
|
73
|
+
end
|
|
74
|
+
|
|
75
|
+
def deserialize(data, expected_class)
|
|
76
|
+
klass = data.key?("_class") ? Object.const_get(data["_class"]) : expected_class
|
|
77
|
+
klass.from_hash(data.except("_class", "_composite_models"))
|
|
78
|
+
end
|
|
79
|
+
end
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
Since all registered models are `Lutaml::Model::Serializable`, they all have
|
|
83
|
+
`to_hash` and `from_hash`. No need for duck-typing fallback chains.
|
|
84
|
+
|
|
85
|
+
### 4. Event emission through encapsulation violation
|
|
86
|
+
|
|
87
|
+
Both `DatabaseStore` and `ModelStore` emit events via:
|
|
88
|
+
```ruby
|
|
89
|
+
@store.instance_variable_get(:@events)&.emit(event, data)
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
**Fix:** Add `Store#emit_event(event, data)` as a public method. The `events`
|
|
93
|
+
object is an internal implementation detail and should not be leaked.
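A minimal sketch of this fix, assuming a simple handler-based events object. `Events`, `Store`, and `DatabaseStore` here are simplified stand-ins written for illustration, not the library's actual classes.

```ruby
# Hypothetical stand-in for the internal events object.
class Events
  def initialize
    @handlers = Hash.new { |hash, name| hash[name] = [] }
  end

  def on(name, &handler)
    @handlers[name] << handler
  end

  def emit(name, data)
    @handlers[name].each { |handler| handler.call(data) }
  end
end

class Store
  def initialize(events: Events.new)
    @events = events
  end

  # Public entry point, so collaborators never reach into @events.
  def emit_event(event, data)
    @events&.emit(event, data)
  end

  def on(event, &handler)
    @events.on(event, &handler)
  end
end

class DatabaseStore
  def initialize(store)
    @store = store
  end

  def save(record)
    # Previously: @store.instance_variable_get(:@events)&.emit(:saved, record)
    @store.emit_event(:saved, record)
    record
  end
end
```

The safe-navigation call in `emit_event` keeps the nil-tolerant behaviour of the original one-liner while hiding the events object behind the store's public interface.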
|
|
94
|
+
|
|
95
|
+
### 5. `CacheInspector` — untested, unused?
|
|
96
|
+
|
|
97
|
+
Check if `cache_inspector.rb` is actually used anywhere. If not, remove it.
|
|
98
|
+
|
|
99
|
+
### 6. Open/closed principle violations
|
|
100
|
+
|
|
101
|
+
- `AttributeUpdater#try_polymorphic_upgrade` uses `instance_variable_set` and
|
|
102
|
+
`model.extend(registered_class)` — this is metaprogramming that breaks OCP.
|
|
103
|
+
Instead, create a new model instance of the subclass and replace the reference.
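One possible shape for that replacement, sketched with hypothetical `Designation` and `Abbreviation` classes and an `upgrade` helper that is not part of the library. It assumes the models round-trip through `to_hash` and `from_hash`.

```ruby
# Hypothetical stand-ins for a base model and a registered subclass.
class Designation
  attr_reader :text

  def initialize(text:)
    @text = text
  end

  def to_hash
    { "text" => text }
  end

  def self.from_hash(data)
    new(text: data["text"])
  end
end

class Abbreviation < Designation
  attr_reader :acronym

  def initialize(text:, acronym: false)
    super(text: text)
    @acronym = acronym
  end

  def to_hash
    super.merge("acronym" => acronym)
  end

  def self.from_hash(data)
    new(text: data["text"], acronym: data.fetch("acronym", false))
  end
end

# Builds a fresh instance of target_class from the model's public hash form.
# The original object is left untouched; the caller replaces its reference.
def upgrade(model, target_class, extra = {})
  return model if model.instance_of?(target_class)

  target_class.from_hash(model.to_hash.merge(extra))
end
```

Because the upgrade goes through the public hash form, no instance variables are copied and no object is extended at runtime.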
|
|
104
|
+
|
|
105
|
+
## Implementation order
|
|
106
|
+
|
|
107
|
+
1. **Add `Store#emit_event` public method** — eliminates `instance_variable_get(:@events)`.
|
|
108
|
+
2. **Create `ModelSerializer`** — unify all serialization/deserialization into one class. Eliminates `respond_to?` chains and duplicated code.
|
|
109
|
+
3. **Fix `AttributeUpdater`** — remove `try_polymorphic_upgrade`'s `instance_variable_set/get`. Use proper factory pattern.
|
|
110
|
+
4. **Fix `CacheStore`** — use proper constructor/factory instead of `instance_variable_set`.
|
|
111
|
+
5. **Remove `respond_to?` from adapters** — call methods directly on `Adapter::Base` interface.
|
|
112
|
+
6. **Audit specs** — add specs that verify no `send`, no `instance_variable_get/set`, no `respond_to?` regressions.
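The regression check in step 6 could be built on a small scanner like the one below. This is a sketch only: the pattern list, the `anti_pattern_offenses` helper, and the choice to skip comment lines (so a comment reference does not count) are assumptions, not the contents of the shipped specs.

```ruby
require "tmpdir"

# Patterns this sketch treats as forbidden in library code.
FORBIDDEN = {
  "instance_variable_get/set" => /\binstance_variable_(?:get|set)\b/,
  "respond_to?" => /\brespond_to\?/,
  "send" => /\.send\(/
}.freeze

# Returns "path:line: label" strings for every non-comment line that matches.
def anti_pattern_offenses(root)
  Dir.glob(File.join(root, "**", "*.rb")).sort.flat_map do |path|
    File.readlines(path).each_with_index.flat_map do |line, index|
      next [] if line.strip.start_with?("#")

      FORBIDDEN.select { |_label, pattern| line.match?(pattern) }
               .keys
               .map { |label| "#{path}:#{index + 1}: #{label}" }
    end
  end
end
```

A spec could then assert that `anti_pattern_offenses("lib")` is empty, which fails with a file and line number as soon as one of the patterns reappears.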
|
|
@@ -0,0 +1,60 @@
|
|
|
1
|
+
# TODO.impl/1-lutaml-hal-migration.md
|
|
2
|
+
|
|
3
|
+
# Migration Plan: lutaml-hal → lutaml-store
|
|
4
|
+
|
|
5
|
+
## Completed
|
|
6
|
+
|
|
7
|
+
### Anti-pattern elimination (all done)
|
|
8
|
+
|
|
9
|
+
All `instance_variable_set/get`, `send(:private_method)`, and `respond_to?` calls
|
|
10
|
+
have been eliminated from `lib/lutaml/hal/`.
|
|
11
|
+
|
|
12
|
+
| File | Changes |
|
|
13
|
+
|---|---|
|
|
14
|
+
| `resource.rb` | Added `embedded_data=` setter; `from_embedded` uses `public_send` instead of `instance_variable_set` |
|
|
15
|
+
| `page.rb` | `instance_variable_get` → `public_send(Hal::REGISTER_ID_ATTR_NAME)` |
|
|
16
|
+
| `model_register.rb` | All methods used by Link made public; `instance_variable_set` → `embedded_data=`; `instance_variable_set` for register ID → `public_send(setter)`; `send(key)` → `public_send(key)`; `respond_to?(:status)` → `faraday_response?(response) && response.status == 304` |
|
|
17
|
+
| `link.rb` | Removed all `register.send(:private_method, ...)` calls (4 instances); `instance_variable_get` → `public_send`; `respond_to?(:embedded_data)` → `is_a?(Resource)` type check |
|
|
18
|
+
| `global_register.rb` | Removed `respond_to?` checks from `clear_all_caches` and `cache_stats` |
|
|
19
|
+
| `cache/cache_metadata.rb` | Extracted `ResponseHeaders` and `ResponseStatus` modules replacing `respond_to?` proc lambdas |
|
|
20
|
+
| `cache/cache_manager.rb` | `respond_to?` → `rescue NoMethodError` for optional methods; `is_a?` type check for HttpCache; proper HttpCache API delegation (`get(:get, url)`, `set(:get, url, {}, response)`) |
|
|
21
|
+
| `rate_limiter.rb` | Simplified with `is_a?` type check |
|
|
22
|
+
|
|
23
|
+
### Cache architecture fixes
|
|
24
|
+
|
|
25
|
+
- `http_aware?` now requires explicit opt-in (`http_aware == true`) instead of defaulting to true when HttpCache is available
|
|
26
|
+
- `CacheManager` properly delegates to `HttpCache`'s method/url API (`get(:get, url)`, `set(:get, url, {}, response)`, `delete(:get, url)`)
|
|
27
|
+
- Non-http-aware path uses `SimpleCacheStore` (in-memory, no serialization) instead of `CacheStore` (which JSON-serializes and can't handle `CacheEntry` objects)
|
|
28
|
+
- Removed `create_basic_cache` method (no longer needed)
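The explicit opt-in for `http_aware?` listed above can be pictured roughly as follows. `CacheConfiguration` is a hypothetical name used for illustration; the real lutaml-hal configuration class may look different.

```ruby
# Hypothetical configuration object; only the opt-in rule is illustrated.
class CacheConfiguration
  attr_reader :http_aware

  def initialize(http_aware: nil)
    @http_aware = http_aware
  end

  # Only a literal `true` opts in; nil and other truthy values do not.
  def http_aware?
    http_aware == true
  end
end
```

Comparing against `true` rather than testing truthiness means an unset option never enables HTTP-aware caching by accident.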
|
|
29
|
+
|
|
30
|
+
### Spec fixes
|
|
31
|
+
|
|
32
|
+
- `cache_integration_spec.rb`: Fixed `REGISTER_ID_ATTR_NAME` (requires `lutaml/hal` instead of redefining constant); added `status: 200` to mock responses; fixed cache key in legacy test; removed `instance_variable_get`/`send` usage
|
|
33
|
+
- `cache_manager_spec.rb`: Updated stats tests to stub `stats` (not `cache_info`); updated `http_aware_cache?` test to use `is_a?` instead of `respond_to?`; fixed `get_from_http_cache` test signature
|
|
34
|
+
- `cache_configuration_spec.rb`: Fixed `http_aware?` nil default expectation; fixed adapter_config error type
|
|
35
|
+
- `cache_metadata_spec.rb`: Added `status: 200` to test doubles
|
|
36
|
+
|
|
37
|
+
### Test results
|
|
38
|
+
|
|
39
|
+
- **lutaml-hal**: 210 examples, 0 failures
|
|
40
|
+
- **lutaml-store**: 248 examples, 1 pre-existing failure (vary header spec) + 25 pre-existing failures in HTTP cache integration specs
|
|
41
|
+
- **Anti-patterns in lib code**: 0 `instance_variable_set/get`, 0 `send`, 0 `respond_to?`
|
|
42
|
+
|
|
43
|
+
## Remaining (future work)
|
|
44
|
+
|
|
45
|
+
### Phase 1: Unify caching through lutaml-store
|
|
46
|
+
|
|
47
|
+
Create `HalStore` as a thin wrapper around `Lutaml::Store` for HAL-specific
|
|
48
|
+
caching. Remove `SimpleCacheStore` (replaced by lutaml-store memory adapter).
|
|
49
|
+
Remove `Client`'s `@cache` Hash. Simplify `CacheManager`.
|
|
50
|
+
|
|
51
|
+
### Phase 2: Register ID tracking as proper attribute
|
|
52
|
+
|
|
53
|
+
Replace `_global_register_id` instance variable with a proper
|
|
54
|
+
`Lutaml::Model::Serializable` attribute on `Resource`. Make embedded data a
|
|
55
|
+
proper attribute. Eliminates `Hal::REGISTER_ID_ATTR_NAME` constant.
|
|
56
|
+
|
|
57
|
+
### Phase 3: Store HAL resources as registered models
|
|
58
|
+
|
|
59
|
+
Register HAL resource classes with lutaml-store's model registry for direct
|
|
60
|
+
CRUD operations instead of wrapping them in `CacheEntry`.
|
|
@@ -0,0 +1,359 @@
|
|
|
1
|
+
# TODO.impl/2-glossarist-migration.md
|
|
2
|
+
|
|
3
|
+
# Migration Plan: glossarist-ruby → lutaml-store
|
|
4
|
+
|
|
5
|
+
## Current state analysis
|
|
6
|
+
|
|
7
|
+
### How glossarist stores LutaML model objects
|
|
8
|
+
|
|
9
|
+
Glossarist is the most complex of the three repos. It manages glossary/terminology
|
|
10
|
+
concepts with multiple versions (v1, v2, v3) and several storage mechanisms:
|
|
11
|
+
|
|
12
|
+
1. **Filesystem YAML persistence** — `ConceptManager` reads/writes concept YAML
|
|
13
|
+
files from a directory structure. Concepts are stored as:
|
|
14
|
+
- `concept/{uuid}.yaml` — the managed concept
|
|
15
|
+
- `localized_concept/{uuid}.yaml` — per-language localizations
|
|
16
|
+
- Grouped files: `{uuid}.yaml` containing concept + all localizations
|
|
17
|
+
|
|
18
|
+
2. **ZIP package (GCR format)** — `GcrPackage` reads/writes concepts from ZIP
|
|
19
|
+
archives containing `metadata.yaml`, `concepts/*.yaml`, optional compiled
|
|
20
|
+
formats (TBX, JSON-LD, Turtle), and dataset assets.
|
|
21
|
+
|
|
22
|
+
3. **In-memory collections** — `ManagedConceptCollection`, `Collection`,
|
|
23
|
+
`Collections::Collection`, `Collections::TypedCollection` — all use plain
|
|
24
|
+
Arrays/Hashes to hold models in memory.
|
|
25
|
+
|
|
26
|
+
### Architecture (current)
|
|
27
|
+
|
|
28
|
+
```
|
|
29
|
+
Glossarist module
|
|
30
|
+
├── Collection (v1)
|
|
31
|
+
│ ├── @index (Hash id → Concept)
|
|
32
|
+
│ ├── @path (String — filesystem path)
|
|
33
|
+
│ ├── load_concepts → Dir.glob → Concept.from_yaml
|
|
34
|
+
│ └── save_concepts → File.write(Psych.dump)
|
|
35
|
+
│
|
|
36
|
+
├── ManagedConceptCollection (v2/v3)
|
|
37
|
+
│ ├── @managed_concepts (Array)
|
|
38
|
+
│ ├── @managed_concepts_ids (Hash id → uuid)
|
|
39
|
+
│ ├── load_from_files → ConceptManager
|
|
40
|
+
│ └── save_to_files → ConceptManager
|
|
41
|
+
│
|
|
42
|
+
├── ConceptManager
|
|
43
|
+
│ ├── path, localized_concepts_path
|
|
44
|
+
│ ├── load_from_files → Dir.glob → ConceptDocument.from_yamls → ManagedConcept
|
|
45
|
+
│ ├── save_to_files → File.write(to_yaml)
|
|
46
|
+
│ ├── save_grouped_concepts_to_files
|
|
47
|
+
│ └── Versioned concept document classes (v2, v3)
|
|
48
|
+
│
|
|
49
|
+
├── GcrPackage
|
|
50
|
+
│ ├── write → Zip::File → concept YAMLs + metadata
|
|
51
|
+
│ ├── read → Zip::File → parse → ManagedConcept instances
|
|
52
|
+
│ └── Compiled format generation (TBX, JSON-LD, Turtle)
|
|
53
|
+
│
|
|
54
|
+
├── ConceptCollector
|
|
55
|
+
│ └── Static methods for scanning directories, detecting schema versions
|
|
56
|
+
│
|
|
57
|
+
└── Model classes (all Lutaml::Model::Serializable):
|
|
58
|
+
├── ManagedConcept (key: identifier/uuid)
|
|
59
|
+
├── LocalizedConcept (inherits Concept)
|
|
60
|
+
├── ConceptData
|
|
61
|
+
├── Designation::Base (polymorphic)
|
|
62
|
+
├── DetailedDefinition
|
|
63
|
+
├── ConceptSource
|
|
64
|
+
└── ... many more
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
### Key observations
|
|
68
|
+
|
|
69
|
+
**Glossarist has no storage abstraction.** File I/O is scattered across:
|
|
70
|
+
- `Collection#load_concepts`, `Collection#save_concept_to_file`
|
|
71
|
+
- `ConceptManager#load_concept_from_file`, `ConceptManager#save_concept_to_file`
|
|
72
|
+
- `GcrPackage#write`, `GcrPackage#read`
|
|
73
|
+
- `ConceptCollector` (static methods doing Dir.glob)
|
|
74
|
+
|
|
75
|
+
**Every storage operation is ad-hoc:**
|
|
76
|
+
- YAML serialization uses `Lutaml::Model::Serializable#to_yaml` / `.from_yaml`
|
|
77
|
+
- File naming uses concept UUIDs
|
|
78
|
+
- Directory layout is version-dependent
|
|
79
|
+
- ZIP packaging duplicates the filesystem logic
|
|
80
|
+
|
|
81
|
+
### Problems identified
|
|
82
|
+
|
|
83
|
+
| Category | Issue | Location |
|
|
84
|
+
|---|---|---|
|
|
85
|
+
| **No storage abstraction** | File I/O is in `ConceptManager`, `Collection`, `GcrPackage`, `ConceptCollector` — no single persistence layer | Multiple |
|
|
86
|
+
| **MECE violation** | `ManagedConceptCollection` manages an Array + a separate id→uuid Hash — this is a database, but hand-rolled | `managed_concept_collection.rb` |
|
|
87
|
+
| **DRY violation** | YAML file I/O patterns repeated in `Collection`, `ConceptManager`, `GcrPackage` | Multiple |
|
|
88
|
+
| **Schema version branching** | Version detection (v1/v2/v3) is sprinkled through `ConceptCollector`, `ConceptManager`, `SchemaMigration` | Multiple |
|
|
89
|
+
| **Lazy loading missing** | `Collection` has a TODO: "Add support for lazy concept loading" — all concepts loaded eagerly into memory | `collection.rb:4` |
|
|
90
|
+
| **OCP violation** | Adding a new storage backend (e.g., database) requires modifying `ConceptManager`, `Collection`, and `GcrPackage` | Multiple |
|
|
91
|
+
|
|
92
|
+
## Migration strategy
|
|
93
|
+
|
|
94
|
+
### Phase 1: Define a `ConceptStore` interface backed by lutaml-store
|
|
95
|
+
|
|
96
|
+
The core insight: Glossarist's concept management is a CRUD store with:
|
|
97
|
+
- **Model:** `ManagedConcept` (key: `uuid`)
|
|
98
|
+
- **Composite models:** `LocalizedConcept` instances per language
|
|
99
|
+
- **Backends:** Filesystem YAML, ZIP archive, and (future) SQLite
|
|
100
|
+
|
|
101
|
+
```ruby
|
|
102
|
+
module Glossarist
|
|
103
|
+
class ConceptStore
|
|
104
|
+
def initialize(adapter:, schema_version: "3")
|
|
105
|
+
@store = Lutaml::Store.new(
|
|
106
|
+
adapter: adapter,
|
|
107
|
+
models: [
|
|
108
|
+
{
|
|
109
|
+
model: Glossarist::ManagedConcept,
|
|
110
|
+
key: :uuid,
|
|
111
|
+
polymorphic_class_key: nil
|
|
112
|
+
},
|
|
113
|
+
{
|
|
114
|
+
model: Glossarist::LocalizedConcept,
|
|
115
|
+
key: :uuid
|
|
116
|
+
}
|
|
117
|
+
]
|
|
118
|
+
)
|
|
119
|
+
@schema_version = schema_version
|
|
120
|
+
end
|
|
121
|
+
|
|
122
|
+
# CRUD operations
|
|
123
|
+
def save(managed_concept) = @store.save(managed_concept)
|
|
124
|
+
def fetch(uuid) = @store.fetch(model: ManagedConcept, uuid: uuid)
|
|
125
|
+
def fetch_by_id(id) = where(model: ManagedConcept, identifier: id).first
|
|
126
|
+
def update(uuid, **attrs) = @store.update(model: ManagedConcept, uuid: uuid, attributes: attrs)
|
|
127
|
+
def delete(uuid) = @store.destroy(model: ManagedConcept, uuid: uuid)
|
|
128
|
+
|
|
129
|
+
# Query
|
|
130
|
+
def all = @store.all(model: ManagedConcept)
|
|
131
|
+
def where(model:, **conditions) = @store.where(model: model, **conditions)
|
|
132
|
+
def count = @store.count(model: ManagedConcept)
|
|
133
|
+
|
|
134
|
+
# Composite model access
|
|
135
|
+
def fetch_localized(managed_concept_uuid, lang)
|
|
136
|
+
concept = fetch(managed_concept_uuid)
|
|
137
|
+
concept&.localization(lang)
|
|
138
|
+
end
|
|
139
|
+
end
|
|
140
|
+
end
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
### Phase 2: Implement filesystem adapter for Glossarist
|
|
144
|
+
|
|
145
|
+
Glossarist's filesystem layout is specific:
|
|
146
|
+
|
|
147
|
+
```
|
|
148
|
+
concepts/
|
|
149
|
+
concept/
|
|
150
|
+
{uuid}.yaml
|
|
151
|
+
localized_concept/
|
|
152
|
+
{uuid}.yaml
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
This maps to lutaml-store's FileSystem adapter with custom path resolution:
|
|
156
|
+
|
|
157
|
+
```ruby
|
|
158
|
+
store = Glossarist::ConceptStore.new(
|
|
159
|
+
adapter: {
|
|
160
|
+
type: :filesystem,
|
|
161
|
+
path: "/path/to/concepts",
|
|
162
|
+
extension: "yaml",
|
|
163
|
+
# Custom naming: use uuid as filename, organize by model type
|
|
164
|
+
naming_strategy: :uuid,
|
|
165
|
+
directory_layout: {
|
|
166
|
+
ManagedConcept => "concept",
|
|
167
|
+
LocalizedConcept => "localized_concept"
|
|
168
|
+
}
|
|
169
|
+
}
|
|
170
|
+
)
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
This requires extending lutaml-store's FileSystem adapter to support:
|
|
174
|
+
- **Custom directory layout** (model type → subdirectory)
|
|
175
|
+
- **Custom key-to-filename mapping** (UUID-based naming)
|
|
176
|
+
- **YAML serialization** (already supported by lutaml-model)
|
|
177
|
+
|
|
178
|
+
Alternatively, add a `Glossarist::Adapters::YamlFilesystem` adapter that
|
|
179
|
+
implements `Lutaml::Store::Adapter::Base`.
|
|
180
|
+
|
|
181
|
+
### Phase 3: Implement ZIP archive adapter
|
|
182
|
+
|
|
183
|
+
GCR packages are ZIP archives. Create a `Glossarist::Adapters::ZipArchive`
|
|
184
|
+
adapter:
|
|
185
|
+
|
|
186
|
+
```ruby
|
|
187
|
+
class ZipArchive < Lutaml::Store::Adapter::Base
|
|
188
|
+
def initialize(config)
|
|
189
|
+
@zip_path = config[:path]
|
|
190
|
+
# ...
|
|
191
|
+
end
|
|
192
|
+
|
|
193
|
+
def save(key, data, metadata = {})
|
|
194
|
+
Zip::File.open(@zip_path, create: true) do |zf|
|
|
195
|
+
zf.get_output_stream(key_to_entry_name(key)) do |f|
|
|
196
|
+
f.write(data.to_yaml)
|
|
197
|
+
end
|
|
198
|
+
end
|
|
199
|
+
end
|
|
200
|
+
|
|
201
|
+
def load(key)
|
|
202
|
+
Zip::File.open(@zip_path) do |zf|
|
|
203
|
+
entry = zf.find_entry(key_to_entry_name(key))
|
|
204
|
+
entry&.get_input_stream&.read
|
|
205
|
+
end
|
|
206
|
+
end
|
|
207
|
+
# ... implement remaining Adapter::Base methods
|
|
208
|
+
end
|
|
209
|
+
```
|
|
210
|
+
|
|
211
|
+
This makes `GcrPackage` a thin wrapper around `ConceptStore` with a ZIP adapter.
|
|
212
|
+
|
|
213
|
+
### Phase 4: Migrate `ManagedConceptCollection`
|
|
214
|
+
|
|
215
|
+
Replace the hand-rolled Array + Hash index with `ConceptStore`:
|
|
216
|
+
|
|
217
|
+
```ruby
|
|
218
|
+
class ManagedConceptCollection
|
|
219
|
+
def initialize(store:)
|
|
220
|
+
@store = store # Glossarist::ConceptStore
|
|
221
|
+
end
|
|
222
|
+
|
|
223
|
+
def fetch(uuid) = @store.fetch(uuid)
|
|
224
|
+
def store(concept) = @store.save(concept)
|
|
225
|
+
def each(&block) = @store.all.each(&block)
|
|
226
|
+
|
|
227
|
+
# Remove: @managed_concepts array, @managed_concepts_ids hash
|
|
228
|
+
# Remove: load_from_files, save_to_files (delegate to store)
|
|
229
|
+
end
|
|
230
|
+
```
|
|
231
|
+
|
|
232
|
+
### Phase 5: Migrate `Collection` (v1)
|
|
233
|
+
|
|
234
|
+
The v1 `Collection` follows the same pattern but is simpler:
|
|
235
|
+
|
|
236
|
+
```ruby
|
|
237
|
+
class Collection
|
|
238
|
+
def initialize(store:)
|
|
239
|
+
@store = store
|
|
240
|
+
end
|
|
241
|
+
|
|
242
|
+
def fetch(id) = @store.fetch_by_id(id)
|
|
243
|
+
def store(concept) = @store.save(concept)
|
|
244
|
+
# Remove: @index, @path, load_concepts, save_concepts
|
|
245
|
+
end
|
|
246
|
+
```
|
|
247
|
+
|
|
248
|
+
### Phase 6: Migrate `GcrPackage`
|
|
249
|
+
|
|
250
|
+
`GcrPackage` becomes a factory that creates a `ConceptStore` with a ZIP adapter:
|
|
251
|
+
|
|
252
|
+
```ruby
|
|
253
|
+
class GcrPackage
|
|
254
|
+
def self.load(zip_path)
|
|
255
|
+
store = ConceptStore.new(adapter: { type: :zip, path: zip_path })
|
|
256
|
+
metadata = load_metadata(zip_path)
|
|
257
|
+
concepts = store.all
|
|
258
|
+
new(zip_path, metadata, concepts)
|
|
259
|
+
end
|
|
260
|
+
|
|
261
|
+
def self.create(concepts:, metadata:, output_path:, **opts)
|
|
262
|
+
store = ConceptStore.new(adapter: { type: :zip, path: output_path })
|
|
263
|
+
concepts.each { |concept| store.save(concept) }
|
|
264
|
+
write_metadata(output_path, metadata)
|
|
265
|
+
# Compiled format generation stays here — it's a presentation concern
|
|
266
|
+
end
|
|
267
|
+
end
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
### Phase 7: Migrate `ConceptCollector`
|
|
271
|
+
|
|
272
|
+
`ConceptCollector` currently has 230 lines of directory-scanning logic.
|
|
273
|
+
With lutaml-store, most of this becomes:
|
|
274
|
+
|
|
275
|
+
```ruby
|
|
276
|
+
def collect(dir)
|
|
277
|
+
store = ConceptStore.new(adapter: { type: :filesystem, path: dir })
|
|
278
|
+
store.all
|
|
279
|
+
end
|
|
280
|
+
```
|
|
281
|
+
|
|
282
|
+
Version detection logic moves into the adapter layer.
|
|
283
|
+
|
|
284
|
+
## Completed
|
|
285
|
+
|
|
286
|
+
### Phase 1: ConceptStore with custom serializer (done)
|
|
287
|
+
|
|
288
|
+
Created `Glossarist::ConceptStore` backed by `Lutaml::Store` with full CRUD.
|
|
289
|
+
|
|
290
|
+
**Problem:** `ManagedConcept` uses `key_value` DSL that maps both `uuid` and
|
|
291
|
+
`identifier` to the same `"id"` hash key — lossy for storage.
|
|
292
|
+
|
|
293
|
+
**Solution:** Added pluggable serializer support to lutaml-store:
|
|
294
|
+
- `ModelRegistration` now accepts `serializer:` option
|
|
295
|
+
- `ModelSerializer` delegates to custom serializer when registered
|
|
296
|
+
- Created `Glossarist::ConceptSerializer` — attribute-based serialization that
|
|
297
|
+
uses attribute names as hash keys (bypasses `key_value` mappings entirely)
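To make the problem and the fix concrete, here is a small sketch under stated assumptions. `Concept`, `AttributeSerializer`, and `Registration` are simplified stand-ins, and this `ModelSerializer` only mirrors the delegation idea described above. It is not the library's implementation.

```ruby
# Stand-in for a model whose default hash mapping writes two attributes to one key.
class Concept
  attr_reader :uuid, :identifier

  def initialize(uuid:, identifier:)
    @uuid = uuid
    @identifier = identifier
  end

  # Mimics the lossy mapping: both attributes land on "id", so only one survives.
  def to_hash
    { "id" => uuid }.merge("id" => identifier)
  end
end

# Hypothetical attribute-based serializer: one hash key per attribute name.
class AttributeSerializer
  def serialize(model)
    { "uuid" => model.uuid, "identifier" => model.identifier }
  end

  def deserialize(data, klass)
    klass.new(uuid: data["uuid"], identifier: data["identifier"])
  end
end

# Hypothetical registration record carrying an optional serializer.
Registration = Struct.new(:model, :serializer, keyword_init: true)

class ModelSerializer
  # Uses the registered serializer when present, else the model's own to_hash.
  def serialize(model, registration)
    custom = registration.serializer
    return custom.serialize(model) if custom

    model.to_hash
  end

  # Mirrors serialize: custom serializer first, default from_hash otherwise.
  def deserialize(data, registration)
    custom = registration.serializer
    return custom.deserialize(data, registration.model) if custom

    registration.model.from_hash(data)
  end
end
```

With the default path, the stand-in model collapses `uuid` and `identifier` into a single `"id"` entry; with the custom serializer registered, both values survive a round trip.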
|
|
298
|
+
|
|
299
|
+
| File | Change |
|
|
300
|
+
|---|---|
|
|
301
|
+
| `lutaml-store/lib/lutaml/store/model_registration.rb` | Added `serializer` attr_reader and option |
|
|
302
|
+
| `lutaml-store/lib/lutaml/store/model_serializer.rb` | Accepts `registration` param, delegates to custom serializer |
|
|
303
|
+
| `lutaml-store/lib/lutaml/store/database_store.rb` | Passes registration to serialize/deserialize calls |
|
|
304
|
+
| `lutaml-store/spec/lutaml/store/custom_serializer_spec.rb` | 4 specs for custom serializer feature |
|
|
305
|
+
| `glossarist/lib/glossarist/concept_serializer.rb` | Attribute-based serializer for ManagedConcept |
|
|
306
|
+
| `glossarist/lib/glossarist/concept_store.rb` | CRUD store backed by lutaml-store with custom serializer |
|
|
307
|
+
| `glossarist/lib/glossarist.rb` | Autoloads for ConceptSerializer, ConceptStore |
|
|
308
|
+
| `glossarist/Gemfile` | Fixed lutaml-store path (`../../lutaml/lutaml-store`) |
|
|
309
|
+
| `glossarist/glossarist.gemspec` | Added `lutaml-store ~> 0.1.0` dependency |
|
|
310
|
+
| `glossarist/spec/unit/concept_serializer_spec.rb` | 4 specs for serializer round-trip |
|
|
311
|
+
| `glossarist/spec/unit/concept_store_spec.rb` | 12 specs for full CRUD |
|
|
312
|
+
|
|
313
|
+
### Test results
|
|
314
|
+
|
|
315
|
+
- **lutaml-store:** 19 core specs pass (database_store + custom_serializer)
|
|
316
|
+
- **glossarist:** 1154 examples, 0 failures (no regressions)
|
|
317
|
+
|
|
318
|
+
## Remaining (future work)
|
|
319
|
+
|
|
320
|
+
| File | Purpose |
|
|
321
|
+
|---|---|
|
|
322
|
+
| `lib/glossarist/concept_store.rb` | Glossarist-specific store backed by `Lutaml::Store` |
|
|
323
|
+
| `lib/glossarist/adapters/yaml_filesystem.rb` | Custom filesystem adapter for Glossarist's YAML layout |
|
|
324
|
+
| `lib/glossarist/adapters/zip_archive.rb` | ZIP archive adapter for GCR packages |
|
|
325
|
+
|
|
326
|
+
## Files to modify
|
|
327
|
+
|
|
328
|
+
| File | Change |
|
|
329
|
+
|---|---|
|
|
330
|
+
| `managed_concept_collection.rb` | Replace Array/Hash with `ConceptStore`; remove `load_from_files`/`save_to_files` |
|
|
331
|
+
| `collection.rb` | Replace `@index` with `ConceptStore`; remove `load_concepts`/`save_concepts` |
|
|
332
|
+
| `gcr_package.rb` | Use `ConceptStore` with ZIP adapter; keep compiled format generation |
|
|
333
|
+
| `concept_manager.rb` | Simplify to delegate to `ConceptStore`; remove raw File I/O |
|
|
334
|
+
| `concept_collector.rb` | Replace directory scanning with `ConceptStore.new(...).all` |
|
|
335
|
+
| `glossarist.rb` | Add autoloads for new classes |
|
|
336
|
+
|
|
337
|
+
## Spec coverage needed
|
|
338
|
+
|
|
339
|
+
1. **ConceptStore CRUD** — save, fetch, update, delete for ManagedConcept
|
|
340
|
+
2. **Composite model storage** — LocalizedConcept stored independently
|
|
341
|
+
3. **Filesystem adapter** — reads/writes Glossarist's directory layout
|
|
342
|
+
4. **ZIP adapter** — round-trip through GCR packages
|
|
343
|
+
5. **Schema version handling** — v1, v2, v3 concepts load correctly
|
|
344
|
+
6. **Lazy loading** — concepts loaded on demand, not all at once
|
|
345
|
+
7. **Migration backward compatibility** — existing YAML files still readable
|
|
346
|
+
|
|
347
|
+
## Risks
|
|
348
|
+
|
|
349
|
+
- **High complexity:** Glossarist has v1/v2/v3 schema migration paths. The
|
|
350
|
+
storage layer must handle all versions transparently.
|
|
351
|
+
- **Medium risk:** ZIP adapter must handle streaming mode for large glossaries.
|
|
352
|
+
- **Low risk:** Replacing `ManagedConceptCollection`'s Array — straightforward
|
|
353
|
+
delegation.
|
|
354
|
+
|
|
355
|
+
## Dependencies
|
|
356
|
+
|
|
357
|
+
- **lutaml-store** must first support custom filesystem layouts and YAML
|
|
358
|
+
serialization natively. See `0-lutaml-store-self-quality.md`.
|
|
359
|
+
- **lutaml-store adapter interface** may need extension for ZIP archive support.
|