lutaml-store 0.1.1

Files changed (110)
  1. checksums.yaml +7 -0
  2. data/.github/workflows/main.yml +27 -0
  3. data/.gitignore +12 -0
  4. data/.rspec +3 -0
  5. data/.rubocop.yml +10 -0
  6. data/.rubocop_todo.yml +450 -0
  7. data/CLAUDE.md +57 -0
  8. data/CODE_OF_CONDUCT.md +132 -0
  9. data/CORRECTED_HTTP_CACHE_IMPLEMENTATION.md +209 -0
  10. data/CORRECTED_HTTP_CACHE_PLAN.md +164 -0
  11. data/Gemfile +15 -0
  12. data/Gemfile.lock +220 -0
  13. data/README.adoc +1430 -0
  14. data/Rakefile +12 -0
  15. data/TODO.impl/0-lutaml-store-self-quality.md +112 -0
  16. data/TODO.impl/1-lutaml-hal-migration.md +60 -0
  17. data/TODO.impl/2-glossarist-migration.md +359 -0
  18. data/TODO.impl/3-lutaml-jsonschema-migration.md +273 -0
  19. data/bin/console +11 -0
  20. data/bin/setup +8 -0
  21. data/demo/Gemfile +15 -0
  22. data/demo/Gemfile.lock +61 -0
  23. data/demo/README.adoc +301 -0
  24. data/demo/data/vcards/co/contact_10_thompson.data +1 -0
  25. data/demo/data/vcards/co/contact_10_thompson.meta +1 -0
  26. data/demo/data/vcards/co/contact_1_doe.data +1 -0
  27. data/demo/data/vcards/co/contact_1_doe.meta +1 -0
  28. data/demo/data/vcards/co/contact_2_smith.data +1 -0
  29. data/demo/data/vcards/co/contact_2_smith.meta +1 -0
  30. data/demo/data/vcards/co/contact_3_johnson.data +1 -0
  31. data/demo/data/vcards/co/contact_3_johnson.meta +1 -0
  32. data/demo/data/vcards/co/contact_4_garcia.data +1 -0
  33. data/demo/data/vcards/co/contact_4_garcia.meta +1 -0
  34. data/demo/data/vcards/co/contact_5_wilson.data +1 -0
  35. data/demo/data/vcards/co/contact_5_wilson.meta +1 -0
  36. data/demo/data/vcards/co/contact_6_brown.data +1 -0
  37. data/demo/data/vcards/co/contact_6_brown.meta +1 -0
  38. data/demo/data/vcards/co/contact_7_davis.data +1 -0
  39. data/demo/data/vcards/co/contact_7_davis.meta +1 -0
  40. data/demo/data/vcards/co/contact_8_anderson.data +1 -0
  41. data/demo/data/vcards/co/contact_8_anderson.meta +1 -0
  42. data/demo/data/vcards/co/contact_9_taylor.data +1 -0
  43. data/demo/data/vcards/co/contact_9_taylor.meta +1 -0
  44. data/demo/data/vcards.db +0 -0
  45. data/demo/pottery_class_demo.rb +164 -0
  46. data/demo/vcard_models.rb +140 -0
  47. data/demo/vcard_store_demo.rb +526 -0
  48. data/lib/lutaml/store/adapter/base.rb +65 -0
  49. data/lib/lutaml/store/adapter/filesystem.rb +288 -0
  50. data/lib/lutaml/store/adapter/memory.rb +225 -0
  51. data/lib/lutaml/store/adapter/sqlite.rb +193 -0
  52. data/lib/lutaml/store/adapter.rb +12 -0
  53. data/lib/lutaml/store/attribute_updater.rb +198 -0
  54. data/lib/lutaml/store/basic_store.rb +190 -0
  55. data/lib/lutaml/store/cache.rb +108 -0
  56. data/lib/lutaml/store/cache_store.rb +282 -0
  57. data/lib/lutaml/store/composite_model_handler.rb +169 -0
  58. data/lib/lutaml/store/compression.rb +137 -0
  59. data/lib/lutaml/store/config.rb +178 -0
  60. data/lib/lutaml/store/database_store.rb +425 -0
  61. data/lib/lutaml/store/events.rb +92 -0
  62. data/lib/lutaml/store/format/base.rb +33 -0
  63. data/lib/lutaml/store/format/json.rb +25 -0
  64. data/lib/lutaml/store/format/jsonl.rb +37 -0
  65. data/lib/lutaml/store/format/marshal_format.rb +37 -0
  66. data/lib/lutaml/store/format/yaml.rb +29 -0
  67. data/lib/lutaml/store/format/yamls.rb +35 -0
  68. data/lib/lutaml/store/format.rb +33 -0
  69. data/lib/lutaml/store/http_cache.rb +279 -0
  70. data/lib/lutaml/store/http_cache_config.rb +53 -0
  71. data/lib/lutaml/store/http_cache_entry.rb +69 -0
  72. data/lib/lutaml/store/http_header_processor.rb +175 -0
  73. data/lib/lutaml/store/integrity.rb +102 -0
  74. data/lib/lutaml/store/model_registration.rb +75 -0
  75. data/lib/lutaml/store/model_registry.rb +123 -0
  76. data/lib/lutaml/store/model_serializer.rb +69 -0
  77. data/lib/lutaml/store/monitor.rb +192 -0
  78. data/lib/lutaml/store/storage_key.rb +40 -0
  79. data/lib/lutaml/store/version.rb +7 -0
  80. data/lib/lutaml/store.rb +41 -0
  81. data/lutaml-store.gemspec +35 -0
  82. data/plan.adoc +606 -0
  83. data/sig/lutaml/store.rbs +6 -0
  84. data/spec/lutaml/store/adapter_interface_spec.rb +89 -0
  85. data/spec/lutaml/store/anti_pattern_guard_spec.rb +35 -0
  86. data/spec/lutaml/store/anti_pattern_spec.rb +78 -0
  87. data/spec/lutaml/store/autoload_spec.rb +34 -0
  88. data/spec/lutaml/store/cache_store_spec.rb +271 -0
  89. data/spec/lutaml/store/compression_spec.rb +78 -0
  90. data/spec/lutaml/store/config_enhanced_spec.rb +158 -0
  91. data/spec/lutaml/store/corrected_http_cache_integration_spec.rb +336 -0
  92. data/spec/lutaml/store/custom_serializer_spec.rb +108 -0
  93. data/spec/lutaml/store/database_store_spec.rb +279 -0
  94. data/spec/lutaml/store/file_io_spec.rb +219 -0
  95. data/spec/lutaml/store/format_round_trip_spec.rb +110 -0
  96. data/spec/lutaml/store/format_spec.rb +70 -0
  97. data/spec/lutaml/store/http_cache_entry_spec.rb +203 -0
  98. data/spec/lutaml/store/http_cache_hal_integration_spec.rb +404 -0
  99. data/spec/lutaml/store/http_cache_spec.rb +422 -0
  100. data/spec/lutaml/store/http_header_processor_spec.rb +290 -0
  101. data/spec/lutaml/store/import_spec.rb +90 -0
  102. data/spec/lutaml/store/integrity_spec.rb +157 -0
  103. data/spec/lutaml/store/key_collision_serializer_spec.rb +98 -0
  104. data/spec/lutaml/store/load_save_spec.rb +107 -0
  105. data/spec/lutaml/store/lutaml_model_integration_spec.rb +291 -0
  106. data/spec/lutaml/store/model_serializer_spec.rb +140 -0
  107. data/spec/lutaml/store/store_spec.rb +182 -0
  108. data/spec/lutaml/store_spec.rb +21 -0
  109. data/spec/spec_helper.rb +16 -0
  110. metadata +166 -0
data/Rakefile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ require "rubocop/rake_task"
9
+
10
+ RuboCop::RakeTask.new
11
+
12
+ task default: %i[spec rubocop]
data/TODO.impl/0-lutaml-store-self-quality.md ADDED
@@ -0,0 +1,112 @@
1
+ # TODO.impl/0-lutaml-store-self-quality.md
2
+
3
+ # lutaml-store Internal Quality Fixes (prerequisite for migrations)
4
+
5
+ ## Completed
6
+
7
+ All anti-patterns eliminated from `lib/`:
8
+
9
+ | Anti-pattern | Remaining in lib/ |
10
+ |---|---|
11
+ | `instance_variable_get/set` | 0 (only a comment reference) |
12
+ | `respond_to?` | 0 (only a comment reference) |
13
+ | `send` on private methods | 0 |
14
+
15
+ ### Changes made
16
+
17
+ 1. **Created `ModelSerializer`** — unified serialization/deserialization into one class, eliminating `respond_to?` chains and duplicated code across `DatabaseStore`, `CompositeModelHandler`, `Serializer`.
18
+ 2. **Added `Store#emit_event`** public method — eliminated `instance_variable_get(:@events)`.
19
+ 3. **Fixed `AttributeUpdater`** — uses proper constructors instead of `instance_variable_set/get`.
20
+ 4. **Fixed `CacheStore`** — uses proper factory methods.
21
+ 5. **Removed `respond_to?` from adapters** — calls methods directly on `Adapter::Base` interface.
22
+ 6. **Removed `CacheInspector`** — unused code.
23
+ 7. **Added custom serializer support** — `ModelRegistration` accepts `serializer:` option for models with non-standard serialization (e.g., glossarist's `key_value` DSL).
24
+
25
+ ## Anti-pattern audit (state before the fixes above)
26
+
27
+ ### 1. `instance_variable_get` / `instance_variable_set` — breaks encapsulation
28
+
29
+ | File | Line | Usage |
30
+ |---|---|---|
31
+ | `database_store.rb` | 375 | `@store.instance_variable_get(:@events)&.emit(event, data)` |
32
+ | `model_store.rb` | 230 | Same pattern — reaching into `@store`'s internals |
33
+ | `attribute_updater.rb` | 228 | `model.instance_variable_set(var, upgraded_model.instance_variable_get(var))` — polymorphic upgrade hack |
34
+ | `cache_store.rb` | 45 | `entry.instance_variable_set(:@created_at, ...)` — bypassing constructor |
35
+
36
+ **Fix:** Expose proper public APIs on the objects being accessed:
37
+ - `Store` should expose `emit_event(event, data)` as a public method.
38
+ - `AttributeUpdater#try_polymorphic_upgrade` must use proper constructors or
39
+ `Lutaml::Model::Serializable`'s own API — never copy instance variables.
40
+ - `CacheStore` should use factory methods or proper attribute setters.
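The factory-method fix for `CacheStore` could look roughly like the sketch below. It is illustrative only: `CacheEntry`, `from_stored`, `to_stored`, and the stored hash layout are hypothetical names chosen for this example, not the gem's actual API. The point is that a restored entry receives its original timestamp through the constructor instead of through `instance_variable_set`.

```ruby
# Hypothetical cache entry; the class and method names are assumptions.
class CacheEntry
  attr_reader :value, :created_at, :ttl

  def initialize(value:, ttl: nil, created_at: Time.now)
    @value = value
    @ttl = ttl
    @created_at = created_at
  end

  # Factory for entries read back from storage: the original timestamp is
  # passed through the constructor, so no instance variable is patched later.
  def self.from_stored(hash)
    new(
      value: hash.fetch("value"),
      ttl: hash["ttl"],
      created_at: Time.at(hash.fetch("created_at"))
    )
  end

  # Plain-hash form suitable for any adapter.
  def to_stored
    { "value" => value, "ttl" => ttl, "created_at" => created_at.to_f }
  end

  def expired?(now = Time.now)
    !ttl.nil? && now - created_at > ttl
  end
end

entry = CacheEntry.from_stored(
  { "value" => "body", "ttl" => 60, "created_at" => 1_700_000_000 }
)
```

Because `created_at` is an ordinary keyword argument, tests can also build entries with fixed timestamps without reaching into object internals.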
41
+
42
+ ### 2. `respond_to?` — poor typing / duck-typing smell
43
+
44
+ **24 occurrences across the codebase.** The worst offenders:
45
+
46
+ | File | Pattern | Fix |
47
+ |---|---|---|
48
+ | `composite_model_handler.rb` L102-133 | `respond_to?(:to_hash)`, `respond_to?(:from_hash)` | All models are `Lutaml::Model::Serializable` — use `is_a?` checks against a serialization protocol, or better, use `to_hash` directly via the type system |
49
+ | `database_store.rb` L288-327 | Same serialization dispatch via `respond_to?` | Extract a `SerializationAdapter` that knows how to (de)serialize `Lutaml::Model::Serializable` |
50
+ | `attribute_updater.rb` L87-129 | `respond_to?(setter_method)` | Use the model's attribute metadata from `Lutaml::Model::Serializable.attributes` |
51
+ | `serializer.rb` | 14 occurrences | Full rewrite needed — see below |
52
+ | `http_cache.rb` L112-142 | `@adapter.respond_to?(:clear)` | All adapters inherit from `Adapter::Base` which defines `#clear` — just call it |
53
+
54
+ **Fix strategy:**
55
+ - Create a `Serializable` protocol module (`Lutaml::Store::Serializable`) that
56
+ formalizes the serialization contract (`to_store_hash`, `from_store_hash`).
57
+ - Replace all `respond_to?` with type-based dispatch or method calls on known base classes.
58
+ - For attribute validation, use `model.class.attributes` (from Lutaml::Model).
59
+
60
+ ### 3. Duplicated serialization logic
61
+
62
+ `DatabaseStore#serialize_model`, `DatabaseStore#deserialize_model`,
63
+ `CompositeModelHandler#serialize_model`, `CompositeModelHandler#deserialize_model`
64
+ all contain identical `respond_to?` chains for `(to_hash|to_h|to_s)` and
65
+ `(from_hash|from_h|new)`.
66
+
67
+ **Fix:** Extract a single `Lutaml::Store::ModelSerializer` class:
68
+
69
+ ```ruby
70
+ class ModelSerializer
71
+   def serialize(model)
72
+     model.to_hash.merge("_class" => model.class.name)
73
+   end
74
+
75
+   def deserialize(data, expected_class)
76
+     klass = data.key?("_class") ? Object.const_get(data["_class"]) : expected_class
77
+     klass.from_hash(data.except("_class", "_composite_models"))
78
+   end
79
+ end
80
+ ```
81
+
82
+ Since all registered models are `Lutaml::Model::Serializable`, they all have
83
+ `to_hash` and `from_hash`. No need for duck-typing fallback chains.
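A small round trip shows how such a serializer behaves. The `Contact` class is a hypothetical stand-in for a `Lutaml::Model::Serializable` subclass and implements only `to_hash` and `from_hash`; falling back to `expected_class` when no `_class` key is stored is likewise an assumption of this sketch.

```ruby
# Stand-in for a serializable model (assumption: to_hash/from_hash suffice).
class Contact
  attr_reader :name, :email

  def initialize(name:, email:)
    @name = name
    @email = email
  end

  def to_hash
    { "name" => name, "email" => email }
  end

  def self.from_hash(hash)
    new(name: hash["name"], email: hash["email"])
  end
end

class ModelSerializer
  # Stores the class name next to the model's own hash representation.
  def serialize(model)
    model.to_hash.merge("_class" => model.class.name)
  end

  # Resolves the stored class name, or uses expected_class when none is stored.
  def deserialize(data, expected_class)
    klass = data.key?("_class") ? Object.const_get(data["_class"]) : expected_class
    klass.from_hash(data.except("_class", "_composite_models"))
  end
end

serializer = ModelSerializer.new
stored = serializer.serialize(Contact.new(name: "Doe", email: "doe@example.com"))
restored = serializer.deserialize(stored, Contact)
```

Since `Object.const_get` resolves whatever name is stored, a production version would probably want to restrict lookups to registered model classes.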
84
+
85
+ ### 4. Event emission through encapsulation violation
86
+
87
+ Both `DatabaseStore` and `ModelStore` emit events via:
88
+ ```ruby
89
+ @store.instance_variable_get(:@events)&.emit(event, data)
90
+ ```
91
+
92
+ **Fix:** Add `Store#emit_event(event, data)` as a public method. The `events`
93
+ object is an internal implementation detail and should not be leaked.
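A minimal sketch of that public method follows. `EventBus`, `Store#on`, and the event payload are hypothetical names used for illustration; only `emit_event(event, data)` comes from the plan itself.

```ruby
# Hypothetical stand-in for the store's internal event object.
class EventBus
  def initialize
    @handlers = Hash.new { |hash, event| hash[event] = [] }
  end

  def subscribe(event, &handler)
    @handlers[event] << handler
  end

  def emit(event, data)
    @handlers[event].each { |handler| handler.call(data) }
  end
end

class Store
  def initialize(events: EventBus.new)
    @events = events
  end

  # Public entry point: collaborators call this instead of reading @events.
  def emit_event(event, data = {})
    @events&.emit(event, data)
  end

  # Subscription helper so callers never need the event object either.
  def on(event, &handler)
    @events.subscribe(event, &handler)
  end
end

store = Store.new
received = []
store.on(:saved) { |data| received << data }
store.emit_event(:saved, { key: "contact_1" })
```

With this in place, `DatabaseStore` and `ModelStore` would call `@store.emit_event(event, data)` and the event object could change shape without touching either class.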
94
+
95
+ ### 5. `CacheInspector` — untested, unused?
96
+
97
+ Check if `cache_inspector.rb` is actually used anywhere. If not, remove it.
98
+
99
+ ### 6. Open/closed principle violations
100
+
101
+ - `AttributeUpdater#try_polymorphic_upgrade` uses `instance_variable_set` and
102
+ `model.extend(registered_class)` — this is metaprogramming that breaks OCP.
103
+ Instead, create a new model instance of the subclass and replace the reference.
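One way to picture the replacement approach is sketched below. `Designation`, `Abbreviation`, and the `upgrade` helper are hypothetical stand-ins; the sketch assumes the subclass can be rebuilt from the base model's `to_hash` output plus any extra attributes.

```ruby
# Stand-in base model with the usual to_hash/from_hash pair.
class Designation
  attr_reader :text

  def initialize(text:)
    @text = text
  end

  def to_hash
    { "text" => text }
  end

  def self.from_hash(hash)
    new(text: hash["text"])
  end
end

# Stand-in subclass that adds one attribute.
class Abbreviation < Designation
  attr_reader :acronym

  def initialize(text:, acronym: false)
    super(text: text)
    @acronym = acronym
  end

  def to_hash
    super.merge("acronym" => acronym)
  end

  def self.from_hash(hash)
    new(text: hash["text"], acronym: hash.fetch("acronym", false))
  end
end

# Builds a new instance of target_class through its public API.
# The original model is left untouched; the caller swaps the reference.
def upgrade(model, target_class, extra = {})
  unless target_class <= model.class
    raise ArgumentError, "#{target_class} is not a subclass of #{model.class}"
  end

  target_class.from_hash(model.to_hash.merge(extra))
end

base = Designation.new(text: "ISO")
upgraded = upgrade(base, Abbreviation, { "acronym" => true })
```

The trade-off is that object identity changes, so whoever holds the old reference has to replace it with the returned instance.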
104
+
105
+ ## Implementation order
106
+
107
+ 1. **Add `Store#emit_event` public method** — eliminates `instance_variable_get(:@events)`.
108
+ 2. **Create `ModelSerializer`** — unify all serialization/deserialization into one class. Eliminates `respond_to?` chains and duplicated code.
109
+ 3. **Fix `AttributeUpdater`** — remove `try_polymorphic_upgrade`'s `instance_variable_set/get`. Use proper factory pattern.
110
+ 4. **Fix `CacheStore`** — use proper constructor/factory instead of `instance_variable_set`.
111
+ 5. **Remove `respond_to?` from adapters** — call methods directly on `Adapter::Base` interface.
112
+ 6. **Audit specs** — add specs that verify no `send`, no `instance_variable_get/set`, no `respond_to?` regressions.
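A regression guard of the kind described in step 6 could be built on a small source scanner like the one below. This is a sketch under stated assumptions: the pattern list, the `scan_source` helper, and the choice to skip comment-only lines are all illustrative and are not taken from the gem's `anti_pattern_guard_spec.rb`.

```ruby
# Patterns to reject; the labels and regexes are assumptions for this sketch.
FORBIDDEN_PATTERNS = {
  "instance_variable_get/set" => /\binstance_variable_(?:get|set)\b/,
  "respond_to?" => /\brespond_to\?/,
  "send" => /\.send\b/
}.freeze

# Returns [label, line_number] pairs for every non-comment line that matches.
def scan_source(source)
  source.each_line.with_index(1).flat_map do |line, number|
    next [] if line.strip.start_with?("#")

    FORBIDDEN_PATTERNS.select { |_label, pattern| line.match?(pattern) }
                      .keys
                      .map { |label| [label, number] }
  end
end

sample = <<~RUBY
  # respond_to? is mentioned only in a comment
  store.emit_event(:saved, data)
  @store.instance_variable_get(:@events)
  model.public_send(:name)
RUBY

violations = scan_source(sample)
```

A spec could run `scan_source(File.read(path))` over every file under `lib/` and expect an empty result. The scan is purely textual, so it will not detect matches hidden in strings or trailing comments.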
data/TODO.impl/1-lutaml-hal-migration.md ADDED
@@ -0,0 +1,60 @@
1
+ # TODO.impl/1-lutaml-hal-migration.md
2
+
3
+ # Migration Plan: lutaml-hal → lutaml-store
4
+
5
+ ## Completed
6
+
7
+ ### Anti-pattern elimination (all done)
8
+
9
+ All `instance_variable_set/get`, `send(:private_method)`, and `respond_to?` calls
10
+ have been eliminated from `lib/lutaml/hal/`.
11
+
12
+ | File | Changes |
13
+ |---|---|
14
+ | `resource.rb` | Added `embedded_data=` setter; `from_embedded` uses `public_send` instead of `instance_variable_set` |
15
+ | `page.rb` | `instance_variable_get` → `public_send(Hal::REGISTER_ID_ATTR_NAME)` |
16
+ | `model_register.rb` | All methods used by Link made public; `instance_variable_set` → `embedded_data=`; `instance_variable_set` for register ID → `public_send(setter)`; `send(key)` → `public_send(key)`; `respond_to?(:status)` → `faraday_response?(response) && response.status == 304` |
17
+ | `link.rb` | Removed all `register.send(:private_method, ...)` calls (4 instances); `instance_variable_get` → `public_send`; `respond_to?(:embedded_data)` → `is_a?(Resource)` type check |
18
+ | `global_register.rb` | Removed `respond_to?` checks from `clear_all_caches` and `cache_stats` |
19
+ | `cache/cache_metadata.rb` | Extracted `ResponseHeaders` and `ResponseStatus` modules replacing `respond_to?` proc lambdas |
20
+ | `cache/cache_manager.rb` | `respond_to?` → `rescue NoMethodError` for optional methods; `is_a?` type check for HttpCache; proper HttpCache API delegation (`get(:get, url)`, `set(:get, url, {}, response)`) |
21
+ | `rate_limiter.rb` | Simplified with `is_a?` type check |
22
+
23
+ ### Cache architecture fixes
24
+
25
+ - `http_aware?` now requires explicit opt-in (`http_aware == true`) instead of defaulting to true when HttpCache is available
26
+ - `CacheManager` properly delegates to `HttpCache`'s method/url API (`get(:get, url)`, `set(:get, url, {}, response)`, `delete(:get, url)`)
27
+ - Non-http-aware path uses `SimpleCacheStore` (in-memory, no serialization) instead of `CacheStore` (which JSON-serializes and can't handle `CacheEntry` objects)
28
+ - Removed `create_basic_cache` method (no longer needed)
29
+
30
+ ### Spec fixes
31
+
32
+ - `cache_integration_spec.rb`: Fixed `REGISTER_ID_ATTR_NAME` (requires `lutaml/hal` instead of redefining constant); added `status: 200` to mock responses; fixed cache key in legacy test; removed `instance_variable_get`/`send` usage
33
+ - `cache_manager_spec.rb`: Updated stats tests to stub `stats` (not `cache_info`); updated `http_aware_cache?` test to use `is_a?` instead of `respond_to?`; fixed `get_from_http_cache` test signature
34
+ - `cache_configuration_spec.rb`: Fixed `http_aware?` nil default expectation; fixed adapter_config error type
35
+ - `cache_metadata_spec.rb`: Added `status: 200` to test doubles
36
+
37
+ ### Test results
38
+
39
+ - **lutaml-hal**: 210 examples, 0 failures
40
+ - **lutaml-store**: 248 examples, 1 pre-existing failure (vary header spec) + 25 pre-existing failures in HTTP cache integration specs
41
+ - **Anti-patterns in lib code**: 0 `instance_variable_set/get`, 0 `send`, 0 `respond_to?`
42
+
43
+ ## Remaining (future work)
44
+
45
+ ### Phase 1: Unify caching through lutaml-store
46
+
47
+ Create `HalStore` as a thin wrapper around `Lutaml::Store` for HAL-specific
48
+ caching. Remove `SimpleCacheStore` (replaced by lutaml-store memory adapter).
49
+ Remove `Client`'s `@cache` Hash. Simplify `CacheManager`.
50
+
51
+ ### Phase 2: Register ID tracking as proper attribute
52
+
53
+ Replace `_global_register_id` instance variable with a proper
54
+ `Lutaml::Model::Serializable` attribute on `Resource`. Make embedded data a
55
+ proper attribute. Eliminates `Hal::REGISTER_ID_ATTR_NAME` constant.
56
+
57
+ ### Phase 3: Store HAL resources as registered models
58
+
59
+ Register HAL resource classes with lutaml-store's model registry for direct
60
+ CRUD operations instead of wrapping them in `CacheEntry`.
data/TODO.impl/2-glossarist-migration.md ADDED
@@ -0,0 +1,359 @@
1
+ # TODO.impl/2-glossarist-migration.md
2
+
3
+ # Migration Plan: glossarist-ruby → lutaml-store
4
+
5
+ ## Current state analysis
6
+
7
+ ### How glossarist stores LutaML model objects
8
+
9
+ Glossarist is the most complex of the three repos. It manages glossary/terminology
10
+ concepts with multiple versions (v1, v2, v3) and several storage mechanisms:
11
+
12
+ 1. **Filesystem YAML persistence** — `ConceptManager` reads/writes concept YAML
13
+ files from a directory structure. Concepts are stored as:
14
+ - `concept/{uuid}.yaml` — the managed concept
15
+ - `localized_concept/{uuid}.yaml` — per-language localizations
16
+ - Grouped files: `{uuid}.yaml` containing concept + all localizations
17
+
18
+ 2. **ZIP package (GCR format)** — `GcrPackage` reads/writes concepts from ZIP
19
+ archives containing `metadata.yaml`, `concepts/*.yaml`, optional compiled
20
+ formats (TBX, JSON-LD, Turtle), and dataset assets.
21
+
22
+ 3. **In-memory collections** — `ManagedConceptCollection`, `Collection`,
23
+ `Collections::Collection`, `Collections::TypedCollection` — all use plain
24
+ Arrays/Hashes to hold models in memory.
25
+
26
+ ### Architecture (current)
27
+
28
+ ```
29
+ Glossarist module
30
+ ├── Collection (v1)
31
+ │   ├── @index (Hash id → Concept)
32
+ │   ├── @path (String — filesystem path)
33
+ │   ├── load_concepts → Dir.glob → Concept.from_yaml
34
+ │   └── save_concepts → File.write(Psych.dump)
35
+
36
+ ├── ManagedConceptCollection (v2/v3)
37
+ │   ├── @managed_concepts (Array)
38
+ │   ├── @managed_concepts_ids (Hash id → uuid)
39
+ │   ├── load_from_files → ConceptManager
40
+ │   └── save_to_files → ConceptManager
41
+
42
+ ├── ConceptManager
43
+ │   ├── path, localized_concepts_path
44
+ │   ├── load_from_files → Dir.glob → ConceptDocument.from_yamls → ManagedConcept
45
+ │   ├── save_to_files → File.write(to_yaml)
46
+ │   ├── save_grouped_concepts_to_files
47
+ │   └── Versioned concept document classes (v2, v3)
48
+
49
+ ├── GcrPackage
50
+ │   ├── write → Zip::File → concept YAMLs + metadata
51
+ │   ├── read → Zip::File → parse → ManagedConcept instances
52
+ │   └── Compiled format generation (TBX, JSON-LD, Turtle)
53
+
54
+ ├── ConceptCollector
55
+ │   └── Static methods for scanning directories, detecting schema versions
56
+
57
+ └── Model classes (all Lutaml::Model::Serializable):
58
+     ├── ManagedConcept (key: identifier/uuid)
59
+     ├── LocalizedConcept (inherits Concept)
60
+     ├── ConceptData
61
+     ├── Designation::Base (polymorphic)
62
+     ├── DetailedDefinition
63
+     ├── ConceptSource
64
+     └── ... many more
65
+ ```
66
+
67
+ ### Key observations
68
+
69
+ **Glossarist has no storage abstraction.** File I/O is scattered across:
70
+ - `Collection#load_concepts`, `Collection#save_concept_to_file`
71
+ - `ConceptManager#load_concept_from_file`, `ConceptManager#save_concept_to_file`
72
+ - `GcrPackage#write`, `GcrPackage#read`
73
+ - `ConceptCollector` (static methods doing Dir.glob)
74
+
75
+ **Every storage operation is ad-hoc:**
76
+ - YAML serialization uses `Lutaml::Model::Serializable#to_yaml` / `.from_yaml`
77
+ - File naming uses concept UUIDs
78
+ - Directory layout is version-dependent
79
+ - ZIP packaging duplicates the filesystem logic
80
+
81
+ ### Problems identified
82
+
83
+ | Category | Issue | Location |
84
+ |---|---|---|
85
+ | **No storage abstraction** | File I/O is in `ConceptManager`, `Collection`, `GcrPackage`, `ConceptCollector` — no single persistence layer | Multiple |
86
+ | **MECE violation** | `ManagedConceptCollection` manages an Array + a separate id→uuid Hash — this is a database, but hand-rolled | `managed_concept_collection.rb` |
87
+ | **DRY violation** | YAML file I/O patterns repeated in `Collection`, `ConceptManager`, `GcrPackage` | Multiple |
88
+ | **Schema version branching** | Version detection (v1/v2/v3) is sprinkled through `ConceptCollector`, `ConceptManager`, `SchemaMigration` | Multiple |
89
+ | **Lazy loading missing** | `Collection` has a TODO: "Add support for lazy concept loading" — all concepts loaded eagerly into memory | `collection.rb:4` |
90
+ | **OCP violation** | Adding a new storage backend (e.g., database) requires modifying `ConceptManager`, `Collection`, and `GcrPackage` | Multiple |
91
+
92
+ ## Migration strategy
93
+
94
+ ### Phase 1: Define a `ConceptStore` interface backed by lutaml-store
95
+
96
+ The core insight: Glossarist's concept management is a CRUD store with:
97
+ - **Model:** `ManagedConcept` (key: `uuid`)
98
+ - **Composite models:** `LocalizedConcept` instances per language
99
+ - **Backends:** Filesystem YAML, ZIP archive, and (future) SQLite
100
+
101
+ ```ruby
102
+ module Glossarist
103
+   class ConceptStore
104
+     def initialize(adapter:, schema_version: "3")
105
+       @store = Lutaml::Store.new(
106
+         adapter: adapter,
107
+         models: [
108
+           {
109
+             model: Glossarist::ManagedConcept,
110
+             key: :uuid,
111
+             polymorphic_class_key: nil
112
+           },
113
+           {
114
+             model: Glossarist::LocalizedConcept,
115
+             key: :uuid
116
+           }
117
+         ]
118
+       )
119
+       @schema_version = schema_version
120
+     end
121
+
122
+     # CRUD operations
123
+     def save(managed_concept) = @store.save(managed_concept)
124
+     def fetch(uuid) = @store.fetch(model: ManagedConcept, uuid: uuid)
125
+     def fetch_by_id(id) = where(model: ManagedConcept, identifier: id).first
126
+     def update(uuid, **attrs) = @store.update(model: ManagedConcept, uuid: uuid, attributes: attrs)
127
+     def delete(uuid) = @store.destroy(model: ManagedConcept, uuid: uuid)
128
+
129
+     # Query
130
+     def all = @store.all(model: ManagedConcept)
131
+     def where(model:, **conditions) = @store.where(model: model, **conditions)
132
+     def count = @store.count(model: ManagedConcept)
133
+
134
+     # Composite model access
135
+     def fetch_localized(managed_concept_uuid, lang)
136
+       concept = fetch(managed_concept_uuid)
137
+       concept&.localization(lang)
138
+     end
139
+   end
140
+ end
141
+ ```
142
+
143
+ ### Phase 2: Implement filesystem adapter for Glossarist
144
+
145
+ Glossarist's filesystem layout is specific:
146
+
147
+ ```
148
+ concepts/
149
+   concept/
150
+     {uuid}.yaml
151
+   localized_concept/
152
+     {uuid}.yaml
153
+ ```
154
+
155
+ This maps to lutaml-store's FileSystem adapter with custom path resolution:
156
+
157
+ ```ruby
158
+ store = Glossarist::ConceptStore.new(
159
+   adapter: {
160
+     type: :filesystem,
161
+     path: "/path/to/concepts",
162
+     extension: "yaml",
163
+     # Custom naming: use uuid as filename, organize by model type
164
+     naming_strategy: :uuid,
165
+     directory_layout: {
166
+       ManagedConcept => "concept",
167
+       LocalizedConcept => "localized_concept"
168
+     }
169
+   }
170
+ )
171
+ ```
172
+
173
+ This requires extending lutaml-store's FileSystem adapter to support:
174
+ - **Custom directory layout** (model type → subdirectory)
175
+ - **Custom key-to-filename mapping** (UUID-based naming)
176
+ - **YAML serialization** (already supported by lutaml-model)
177
+
178
+ Alternatively, add a `Glossarist::Adapters::YamlFilesystem` adapter that
179
+ implements `Lutaml::Store::Adapter::Base`.
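Whichever option is chosen, the adapter needs a way to turn a model type and key into a file path. The sketch below shows one possible shape for that step. `LayoutResolver`, its constructor options, and the empty stand-in model classes are hypothetical; nothing here is an existing lutaml-store API.

```ruby
require "pathname"

# Empty stand-ins for the Glossarist model classes.
class ManagedConcept; end
class LocalizedConcept; end

# Maps (model class, key) to a file path using a configured directory layout.
class LayoutResolver
  def initialize(root:, layout:, extension: "yaml")
    @root = Pathname.new(root)
    @layout = layout
    @extension = extension
  end

  def path_for(model_class, key)
    # Fail loudly when a model type has no configured subdirectory.
    subdir = @layout.fetch(model_class) do
      raise KeyError, "no directory configured for #{model_class}"
    end
    @root.join(subdir, "#{key}.#{@extension}")
  end
end

resolver = LayoutResolver.new(
  root: "/path/to/concepts",
  layout: { ManagedConcept => "concept", LocalizedConcept => "localized_concept" }
)
concept_path = resolver.path_for(ManagedConcept, "abc-123")
```

Keeping path resolution in one object means the same mapping can serve both the filesystem adapter and ZIP entry names.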
180
+
181
+ ### Phase 3: Implement ZIP archive adapter
182
+
183
+ GCR packages are ZIP archives. Create a `Glossarist::Adapters::ZipArchive`
184
+ adapter:
185
+
186
+ ```ruby
187
+ class ZipArchive < Lutaml::Store::Adapter::Base
188
+   def initialize(config)
189
+     @zip_path = config[:path]
190
+     # ...
191
+   end
192
+
193
+   def save(key, data, metadata = {})
194
+     Zip::File.open(@zip_path, create: true) do |zf|
195
+       zf.get_output_stream(key_to_entry_name(key)) do |f|
196
+         f.write(data.to_yaml)
197
+       end
198
+     end
199
+   end
200
+
201
+   def load(key)
202
+     Zip::File.open(@zip_path) do |zf|
203
+       entry = zf.find_entry(key_to_entry_name(key))
204
+       entry&.get_input_stream&.read
205
+     end
206
+   end
207
+   # ... implement remaining Adapter::Base methods
208
+ end
209
+ ```
210
+
211
+ This makes `GcrPackage` a thin wrapper around `ConceptStore` with a ZIP adapter.
212
+
213
+ ### Phase 4: Migrate `ManagedConceptCollection`
214
+
215
+ Replace the hand-rolled Array + Hash index with `ConceptStore`:
216
+
217
+ ```ruby
218
+ class ManagedConceptCollection
219
+   def initialize(store:)
220
+     @store = store # Glossarist::ConceptStore
221
+   end
222
+
223
+   def fetch(uuid) = @store.fetch(uuid)
224
+   def store(concept) = @store.save(concept)
225
+   def each(&block) = @store.all.each(&block)
226
+
227
+   # Remove: @managed_concepts array, @managed_concepts_ids hash
228
+   # Remove: load_from_files, save_to_files (delegate to store)
229
+ end
230
+ ```
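The delegation can be exercised without any real backend. In the sketch below, `FakeConceptStore` and the `Concept` struct are hypothetical in-memory stand-ins for `Glossarist::ConceptStore` and `ManagedConcept`, and including `Enumerable` is an assumption about how callers would iterate the collection.

```ruby
# In-memory stand-in for Glossarist::ConceptStore (assumption).
class FakeConceptStore
  def initialize
    @records = {}
  end

  def save(concept)
    @records[concept.uuid] = concept
  end

  def fetch(uuid)
    @records[uuid]
  end

  def all
    @records.values
  end
end

# Minimal stand-in for ManagedConcept.
Concept = Struct.new(:uuid, :term, keyword_init: true)

class ManagedConceptCollection
  include Enumerable

  def initialize(store:)
    @store = store
  end

  def fetch(uuid) = @store.fetch(uuid)
  def store(concept) = @store.save(concept)

  # Returning an Enumerator when no block is given keeps Enumerable idioms working.
  def each(&block)
    return enum_for(:each) unless block

    @store.all.each(&block)
  end
end

collection = ManagedConceptCollection.new(store: FakeConceptStore.new)
collection.store(Concept.new(uuid: "u1", term: "alpha"))
collection.store(Concept.new(uuid: "u2", term: "beta"))
terms = collection.map(&:term)
```

The collection keeps no state of its own, so swapping the fake for a filesystem-backed or ZIP-backed store would not change this class.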
231
+
232
+ ### Phase 5: Migrate `Collection` (v1)
233
+
234
+ The v1 `Collection` follows the same pattern but is simpler:
235
+
236
+ ```ruby
237
+ class Collection
238
+   def initialize(store:)
239
+     @store = store
240
+   end
241
+
242
+   def fetch(id) = @store.fetch_by_id(id)
243
+   def store(concept) = @store.save(concept)
244
+   # Remove: @index, @path, load_concepts, save_concepts
245
+ end
246
+ ```
247
+
248
+ ### Phase 6: Migrate `GcrPackage`
249
+
250
+ `GcrPackage` becomes a factory that creates a `ConceptStore` with a ZIP adapter:
251
+
252
+ ```ruby
253
+ class GcrPackage
254
+   def self.load(zip_path)
255
+     store = ConceptStore.new(adapter: { type: :zip, path: zip_path })
256
+     metadata = load_metadata(zip_path)
257
+     concepts = store.all
258
+     new(zip_path, metadata, concepts)
259
+   end
260
+
261
+   def self.create(concepts:, metadata:, output_path:, **opts)
262
+     store = ConceptStore.new(adapter: { type: :zip, path: output_path })
263
+     concepts.each { |concept| store.save(concept) }
264
+     write_metadata(output_path, metadata)
265
+     # Compiled format generation stays here: it's a presentation concern
266
+   end
267
+ end
268
+ ```
269
+
270
+ ### Phase 7: Migrate `ConceptCollector`
271
+
272
+ `ConceptCollector` currently has 230 lines of directory-scanning logic.
273
+ With lutaml-store, most of this becomes:
274
+
275
+ ```ruby
276
+ def collect(dir)
277
+   store = ConceptStore.new(adapter: { type: :filesystem, path: dir })
278
+   store.all
279
+ end
280
+ ```
281
+
282
+ Version detection logic moves into the adapter layer.
283
+
284
+ ## Completed
285
+
286
+ ### Phase 1: ConceptStore with custom serializer (done)
287
+
288
+ Created `Glossarist::ConceptStore` backed by `Lutaml::Store` with full CRUD.
289
+
290
+ **Problem:** `ManagedConcept` uses `key_value` DSL that maps both `uuid` and
291
+ `identifier` to the same `"id"` hash key — lossy for storage.
292
+
293
+ **Solution:** Added pluggable serializer support to lutaml-store:
294
+ - `ModelRegistration` now accepts `serializer:` option
295
+ - `ModelSerializer` delegates to custom serializer when registered
296
+ - Created `Glossarist::ConceptSerializer` — attribute-based serialization that
297
+ uses attribute names as hash keys (bypasses `key_value` mappings entirely)
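The collision and the attribute-based workaround can be reproduced in a few lines. Everything in this sketch is a hypothetical stand-in: `ManagedConceptStub` only imitates a mapping where two attributes share the `"id"` key, and `AttributeSerializer` shows the general idea rather than the actual `Glossarist::ConceptSerializer` or the serializer interface lutaml-store expects.

```ruby
# Imitates a model whose hash mapping sends two attributes to the same key.
class ManagedConceptStub
  attr_reader :uuid, :identifier

  def initialize(uuid:, identifier:)
    @uuid = uuid
    @identifier = identifier
  end

  # Lossy: the second "id" entry overwrites the first.
  def to_hash
    { "id" => uuid }.merge("id" => identifier)
  end
end

# Serializes by attribute name, so every attribute gets its own hash key.
class AttributeSerializer
  def initialize(attribute_names)
    @attribute_names = attribute_names
  end

  def serialize(model)
    @attribute_names.to_h { |name| [name.to_s, model.public_send(name)] }
  end

  def deserialize(data, klass)
    klass.new(**data.transform_keys(&:to_sym))
  end
end

concept = ManagedConceptStub.new(uuid: "uuid-123", identifier: "3.1.1")
serializer = AttributeSerializer.new(%i[uuid identifier])
data = serializer.serialize(concept)
restored = serializer.deserialize(data, ManagedConceptStub)
```

The default mapping keeps only one of the two values, while the attribute-based form survives a round trip with both intact.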
298
+
299
+ | File | Change |
300
+ |---|---|
301
+ | `lutaml-store/lib/lutaml/store/model_registration.rb` | Added `serializer` attr_reader and option |
302
+ | `lutaml-store/lib/lutaml/store/model_serializer.rb` | Accepts `registration` param, delegates to custom serializer |
303
+ | `lutaml-store/lib/lutaml/store/database_store.rb` | Passes registration to serialize/deserialize calls |
304
+ | `lutaml-store/spec/lutaml/store/custom_serializer_spec.rb` | 4 specs for custom serializer feature |
305
+ | `glossarist/lib/glossarist/concept_serializer.rb` | Attribute-based serializer for ManagedConcept |
306
+ | `glossarist/lib/glossarist/concept_store.rb` | CRUD store backed by lutaml-store with custom serializer |
307
+ | `glossarist/lib/glossarist.rb` | Autoloads for ConceptSerializer, ConceptStore |
308
+ | `glossarist/Gemfile` | Fixed lutaml-store path (`../../lutaml/lutaml-store`) |
309
+ | `glossarist/glossarist.gemspec` | Added `lutaml-store ~> 0.1.0` dependency |
310
+ | `glossarist/spec/unit/concept_serializer_spec.rb` | 4 specs for serializer round-trip |
311
+ | `glossarist/spec/unit/concept_store_spec.rb` | 12 specs for full CRUD |
312
+
313
+ ### Test results
314
+
315
+ - **lutaml-store:** 19 core specs pass (database_store + custom_serializer)
316
+ - **glossarist:** 1154 examples, 0 failures (no regressions)
317
+
318
+ ## Remaining (future work)
319
+
320
+ | File | Purpose |
321
+ |---|---|
322
+ | `lib/glossarist/concept_store.rb` | Glossarist-specific store backed by `Lutaml::Store` |
323
+ | `lib/glossarist/adapters/yaml_filesystem.rb` | Custom filesystem adapter for Glossarist's YAML layout |
324
+ | `lib/glossarist/adapters/zip_archive.rb` | ZIP archive adapter for GCR packages |
325
+
326
+ ## Files to modify
327
+
328
+ | File | Change |
329
+ |---|---|
330
+ | `managed_concept_collection.rb` | Replace Array/Hash with `ConceptStore`; remove `load_from_files`/`save_to_files` |
331
+ | `collection.rb` | Replace `@index` with `ConceptStore`; remove `load_concepts`/`save_concepts` |
332
+ | `gcr_package.rb` | Use `ConceptStore` with ZIP adapter; keep compiled format generation |
333
+ | `concept_manager.rb` | Simplify to delegate to `ConceptStore`; remove raw File I/O |
334
+ | `concept_collector.rb` | Replace directory scanning with `ConceptStore.new(...).all` |
335
+ | `glossarist.rb` | Add autoloads for new classes |
336
+
337
+ ## Spec coverage needed
338
+
339
+ 1. **ConceptStore CRUD** — save, fetch, update, delete for ManagedConcept
340
+ 2. **Composite model storage** — LocalizedConcept stored independently
341
+ 3. **Filesystem adapter** — reads/writes Glossarist's directory layout
342
+ 4. **ZIP adapter** — round-trip through GCR packages
343
+ 5. **Schema version handling** — v1, v2, v3 concepts load correctly
344
+ 6. **Lazy loading** — concepts loaded on demand, not all at once
345
+ 7. **Migration backward compatibility** — existing YAML files still readable
346
+
347
+ ## Risks
348
+
349
+ - **High complexity:** Glossarist has v1/v2/v3 schema migration paths. The
350
+ storage layer must handle all versions transparently.
351
+ - **Medium risk:** ZIP adapter must handle streaming mode for large glossaries.
352
+ - **Low risk:** Replacing `ManagedConceptCollection`'s Array — straightforward
353
+ delegation.
354
+
355
+ ## Dependencies
356
+
357
+ - **lutaml-store** must first support custom filesystem layouts and YAML
358
+ serialization natively. See `0-lutaml-store-self-quality.md`.
359
+ - **lutaml-store adapter interface** may need extension for ZIP archive support.