glossarist 0.1.3 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -4,7 +4,7 @@
  [![npm version](https://img.shields.io/npm/v/glossarist.svg)](https://www.npmjs.com/package/glossarist)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
- JavaScript library for reading [Glossarist](https://github.com/glossarist) GCR packages (ZIP archives) and v2 glossarist concept data (YAML files). Works in Node.js and browsers.
+ JavaScript SDK for reading and writing [Glossarist](https://github.com/glossarist) GCR packages manages terminology concepts with rich domain models, bidirectional YAML serialization, validation, and cross-reference resolution.
 
  ## Install
 
@@ -12,105 +12,213 @@ JavaScript library for reading [Glossarist](https://github.com/glossarist) GCR p
  npm install glossarist
  ```
 
- Requires Node.js 18+.
+ Requires Node.js 20+.
 
  ## Usage
 
- ### Reading a GCR package
+ ### Read a GCR package
 
  ```js
  import { loadGcr } from 'glossarist';
- import fs from 'fs';
 
- const buf = fs.readFileSync('my-dataset.gcr');
- const pkg = await loadGcr(buf);
-
- // Metadata
+ const pkg = await loadGcr(fs.readFileSync('my-dataset.gcr'));
  const meta = await pkg.metadata();
- console.log(meta.shortname, meta.version, meta.concept_count);
-
- // List concept IDs
- const ids = await pkg.conceptIds();
-
- // Read a specific concept
- const concept = await pkg.concept('3.1.1.1');
- console.log(concept.termid);
- console.log(concept.localizations.eng.terms[0].designation);
 
- // Iterate all concepts (streaming)
+ // Stream concepts (memory-efficient for large datasets)
  await pkg.eachConcept((concept) => {
-   console.log(concept.termid);
+   console.log(concept.id, concept.primaryDesignation('eng'));
  });
  ```
 
- `loadGcr` accepts `Buffer`, `ArrayBuffer`, `Uint8Array`, `Blob`, or a base64-encoded string.
-
- ### Reading concept YAML files from a directory
+ ### Read from a directory
 
  ```js
- import { readConcepts, readConcept, listConceptIds } from 'glossarist';
+ import { readConcepts, readRegister } from 'glossarist';
 
- // Read all concepts
  const concepts = readConcepts('./geolexica-v2/');
- console.log(`Loaded ${concepts.length} concepts`);
+ const register = readRegister('./geolexica-v2/');
+ ```
+
+ ### Write a GCR package
+
+ ```js
+ import { createGcr, ManagedConceptCollection, conceptParser } from 'glossarist';
+
+ const concept = conceptParser.parse(`
+ termid: "3.1.1.1"
+ eng:
+   terms:
+     - type: expression
+       designation: entity
+   definition:
+     - content: A concrete or abstract thing.
+ `);
+
+ const buf = await createGcr([concept], { shortname: 'test' });
+ fs.writeFileSync('out.gcr', buf);
+ ```
+
+ ### Domain model
+
+ Every domain entity is a class instance with `toJSON()`, `fromJSON()`, `equals()`, and `clone()`:
+
+ ```js
+ import { Concept, LocalizedConcept, Expression, DetailedDefinition } from 'glossarist/models';
+
+ const lc = new LocalizedConcept({
+   language_code: 'eng',
+   terms: [{ type: 'expression', designation: 'entity', normative_status: 'preferred' }],
+   definition: [{ content: 'A concrete or abstract thing.' }],
+   entry_status: 'valid',
+ });
+
+ const concept = new Concept({
+   id: '3.1.1.1',
+   localizations: { eng: lc.toJSON() },
+ });
+
+ console.log(concept.primaryDesignation('eng')); // 'entity'
+ console.log(concept.definition('eng')); // 'A concrete or abstract thing.'
 
- // Read a single concept by ID
- const concept = readConcept('./geolexica-v2/', '3.1.1.1');
+ // Round-trip invariant
+ const restored = Concept.fromJSON(concept.toJSON());
+ console.log(concept.equals(restored)); // true
+ ```
+
+ ### Validation
+
+ ```js
+ import { validateConcept, validateRegister, createConceptValidator, ValidationRule } from 'glossarist';
+
+ // Built-in rules: language codes, designation types, entry status
+ const result = validateConcept(concept);
+ if (!result.valid) {
+   for (const err of result.errors) {
+     console.log(`[${err.severity}] ${err.path}: ${err.message}`);
+   }
+ }
+
+ // Custom rules
+ class NoDuplicateTermsRule extends ValidationRule {
+   constructor() { super('no-duplicate-terms', 'warning'); }
+   validate(value, path) {
+     // check for duplicate designations...
+   }
+ }
+
+ const validator = createConceptValidator().addRule(new NoDuplicateTermsRule());
+ validator.validate(concept);
 
- // List IDs with optional prefix filter
- const ids = listConceptIds('./geolexica-v2/', '3.1.');
+ // Register validation
+ validateRegister({ schema_version: '1', shortname: 'my-dataset' });
  ```
 
- ### Browser usage
+ ### UUID generation
 
- The GCR reader works in browsers via jszip. The concept directory reader requires Node.js `fs`.
+ Deterministic UUID v5 matching the Ruby glossarist gem:
 
- ```html
- <script type="module">
-   import { loadGcr } from 'glossarist/gcr';
+ ```js
+ import { conceptUuid, localizedConceptUuid } from 'glossarist';
 
-   const response = await fetch('/datasets/isotc204.gcr');
-   const buf = await response.arrayBuffer();
-   const pkg = await loadGcr(buf);
-   const meta = await pkg.metadata();
- </script>
+ conceptUuid('3.1.1.1'); // UUID v5 (stable across runs)
+ localizedConceptUuid('3.1.1.1', 'eng'); // different UUID v5
  ```
 
- ## Concept format
+ ### Reference resolution
 
- Glossarist-js normalizes both storage formats into a consistent structure:
+ Extract and resolve cross-references between concepts:
 
  ```js
- {
-   termid: '3.1.1.1', // concept identifier
-   term: 'entity', // primary term (canonical format only)
-   localizations: {
-     eng: {
-       terms: [{ type: 'expression', designation: 'entity', normative_status: 'preferred' }],
-       definition: [{ content: 'concrete or abstract thing...' }],
-       notes: [],
-       examples: [],
-       sources: [{ type: 'authoritative', origin: { ref: 'ISO/TS 14812:2022' } }],
-       entry_status: 'valid',
-     },
-     fra: { ... },
-   },
-   raw: { ... }, // original parsed YAML
+ import { referenceResolver } from 'glossarist';
+ import { ConceptCollection } from 'glossarist';
+
+ const collection = new ConceptCollection(allConcepts);
+
+ // Find all references in a concept
+ const refs = referenceResolver.extractReferences(concept);
+
+ // Resolve against a collection
+ const resolved = referenceResolver.resolveAll(concept, collection);
+ for (const [target, resolvedConcept] of resolved) {
+   if (!resolvedConcept) console.warn(`Broken reference: ${target}`);
  }
  ```
 
- Language codes are discovered dynamically from the YAML keys — any ISO 639-3 code works without code changes.
+ ### Managed collection lifecycle
+
+ ```js
+ import { ManagedConceptCollection, conceptParser } from 'glossarist';
+
+ const mcc = new ManagedConceptCollection();
 
- ### Supported formats
+ // Load from GCR
+ await mcc.loadFromGcr(fs.readFileSync('dataset.gcr'));
 
- | Format | Structure | Used by |
- |--------|-----------|---------|
- | **Canonical** | Single YAML document with `termid` and language keys (`eng:`, `fra:`) | IEV (iec-electropedia) |
- | **Managed concept** | Multi-document YAML: first doc has `data.identifier` + `data.localized_concepts`, subsequent docs have `data.language_code` | isotc204, isotc211, osgeo |
+ // Load from directory
+ mcc.loadFromDirectory('./concepts/');
+
+ // Add or replace a concept
+ mcc.add(newConcept);
+
+ // Save back
+ mcc.saveToDirectory('./out/');
+ const buf = await mcc.saveToGcr({ metadata: { shortname: 'test' } });
+ ```
+
+ ### V1 format migration
+
+ ```js
+ import { V1Reader, migrateV1ToV2 } from 'glossarist';
+
+ if (V1Reader.isV1Directory('./concepts-v1/')) {
+   const concepts = V1Reader.readAll('./concepts-v1/');
+   await migrateV1ToV2('./concepts-v1/', './concepts-v2/');
+ }
+ ```
+
+ ### Concept serialization
+
+ Serialize to canonical (single-doc) or managed (multi-doc) format:
+
+ ```js
+ import { conceptSerializer } from 'glossarist';
+
+ conceptSerializer.toCanonicalYaml(concept); // single YAML doc with termid + lang keys
+ conceptSerializer.toManagedYaml(concept); // multi-doc YAML with data.identifier
+ conceptSerializer.toYaml(concept); // auto-detect: uses term for canonical, id for managed
+ conceptSerializer.toRegisterYaml(register); // register.yaml format
+ ```
+
+ ## Sub-path exports
+
+ ```js
+ import 'glossarist'; // everything
+ import 'glossarist/gcr'; // browser-friendly GCR reader (no fs)
+ import 'glossarist/concept'; // Node.js filesystem reader
+ import 'glossarist/models'; // domain model classes
+ import 'glossarist/validators'; // validation framework
+ ```
+
+ ## Architecture
+
+ ```
+ Public API (index.js)
+ ├── Domain models → Concept, LocalizedConcept, Designation (Expression, Symbol, ...),
+ │ Citation, ConceptSource, RelatedConcept, DetailedDefinition, NonVerbRep
+ ├── Parsing → ConceptParser (canonical + managed format detection)
+ ├── Serialization → ConceptSerializer (canonical + managed YAML output)
+ ├── I/O → loadGcr, readConcepts, createGcr, writeConcepts
+ ├── Collections → ConceptCollection (Proxy-based, queryable), ManagedConceptCollection
+ ├── Validation → ConceptValidator, RegisterValidator, ValidationRule (pluggable)
+ ├── Utilities → conceptUuid, referenceResolver, V1Reader
+ └── Errors → GlossaristError, InvalidInputError, YamlParseError
+ ```
+
+ Models are pure — no I/O, serialization, or filesystem dependencies. Serialization formats are pluggable. Validation rules are pluggable.
 
  ## Error handling
 
- All public functions validate inputs and throw descriptive errors with context:
+ All public functions validate inputs and throw typed errors:
 
  ```js
  import { InvalidInputError, YamlParseError } from 'glossarist';
@@ -119,60 +227,26 @@ try {
    await pkg.concept('3.1.1.1');
  } catch (err) {
    if (err instanceof YamlParseError) {
-     // err.message: "Failed to parse YAML for 3.1.1.1: ..."
-     // err.cause: the original YAML parse error
+     // Malformed YAML err.cause chains the original error
+     // err.message includes the concept ID for easy location
    } else if (err instanceof InvalidInputError) {
-     // Invalid input (null, empty string, wrong type)
+     // Null, empty, or wrong-type arguments
    }
  }
  ```
 
- Errors include the concept ID or filename in their message, making it easy to locate failures in large datasets.
-
- - **`GlossaristError`** — base class for all library errors
- - **`InvalidInputError`** — null, undefined, empty, or wrong-type arguments
- - **`YamlParseError`** — malformed YAML with `cause` chaining the original error
-
  ## TypeScript
 
  TypeScript declarations are included. No `@types/` package needed.
 
  ```ts
- import { loadGcr, readConcepts, type Concept, type GcrMetadata } from 'glossarist';
+ import { loadGcr, type Concept, type GcrMetadata } from 'glossarist';
+ import { Concept, LocalizedConcept, Designation } from 'glossarist/models';
 
  const pkg = await loadGcr(buffer);
  const meta: GcrMetadata | null = await pkg.metadata();
  ```
 
- ## API
-
- ### GCR Package (`glossarist/gcr`)
-
- - `loadGcr(input)` — Load a GCR ZIP from Buffer/ArrayBuffer/Uint8Array/Blob/base64 string. Returns `GcrPackage`.
- - `GcrPackage#metadata()` — Parse `metadata.yaml`.
- - `GcrPackage#register()` — Parse optional `register.yaml`.
- - `GcrPackage#conceptIds()` — Array of concept IDs (natural-sorted).
- - `GcrPackage#concept(id)` — Read and normalize a single concept.
- - `GcrPackage#eachConcept(callback)` — Stream all concepts.
- - `GcrPackage#allConcepts()` — Load all concepts into an array.
- - `parseConceptYaml(raw, context?)` — Parse raw YAML string into normalized concept object. `context` is an optional concept ID or filename for error messages.
- - `naturalSort(a, b)` — Natural sort comparator for concept IDs.
-
- ### Concept Directory Reader (`glossarist/concept`)
-
- Node.js only (uses `fs`).
-
- - `readConcepts(dir)` — Read all concept YAML files from a directory.
- - `readConcept(dir, id)` — Read a single concept by ID.
- - `listConceptIds(dir, prefix?)` — List concept IDs, optionally filtered by prefix.
- - `readRegister(dir)` — Read `register.yaml` if present.
-
- ### Errors
-
- - `GlossaristError` — base error class
- - `InvalidInputError` — bad input arguments
- - `YamlParseError` — YAML parse failures (has `cause`, includes concept context)
-
  ## Development
 
  ```bash
@@ -182,8 +256,6 @@ npm run lint # lint src/ and test/
  npm run test:coverage # run with coverage report
  ```
 
- See [CONTRIBUTING.md](./CONTRIBUTING.md) for full guidelines.
-
  ## License
 
  [MIT](./LICENSE)
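The new README states that `conceptUuid` and `localizedConceptUuid` return deterministic UUID v5 values that match the Ruby glossarist gem, but it does not show how they are derived. As background, here is a minimal sketch of the generic RFC 4122 name-based (SHA-1) algorithm using Node's built-in `crypto` module. The `uuidV5` helper is hypothetical, and using the DNS namespace is an illustrative assumption: this diff does not show which namespace or name string glossarist or the Ruby gem actually hash.

```js
import { createHash } from 'node:crypto';

// RFC 4122 predefined DNS namespace. Used here only for illustration;
// the namespace glossarist uses is not shown in this diff.
const NAMESPACE_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

// Hypothetical helper: name-based UUID v5 (SHA-1) as described in RFC 4122.
function uuidV5(namespace, name) {
  // Hash the 16 raw namespace bytes followed by the UTF-8 bytes of the name.
  const nsBytes = Buffer.from(namespace.replace(/-/g, ''), 'hex');
  const hash = createHash('sha1')
    .update(nsBytes)
    .update(Buffer.from(name, 'utf8'))
    .digest();

  // Keep the first 16 of the 20 SHA-1 bytes, then stamp version and variant.
  const bytes = hash.subarray(0, 16);
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // version 5 in the high nibble
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant (binary 10xx)

  const hex = bytes.toString('hex');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join('-');
}

// Same namespace and name always give the same UUID.
const a = uuidV5(NAMESPACE_DNS, '3.1.1.1');
const b = uuidV5(NAMESPACE_DNS, '3.1.1.1');
console.log(a === b); // true
```

Because the output depends only on the namespace and the name, regenerating a dataset yields the same identifiers. Two implementations can therefore agree only if they hash identical inputs.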
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "glossarist",
-   "version": "0.1.3",
+   "version": "0.1.5",
    "description": "JavaScript SDK for Glossarist GCR packages — read, write, validate, and manage terminology concepts",
    "type": "module",
    "main": "src/index.js",
@@ -21,6 +21,10 @@ export class Citation extends GlossaristModel {
      }
    }
 
+   get isStructured() {
+     return typeof this.source === 'object' && this.source !== null;
+   }
+
    toString() {
      if (this.ref) return this.ref;
      if (typeof this.source === 'string') return this.source;
@@ -70,6 +70,7 @@ export class Citation extends GlossaristModel {
    readonly version: string | null;
    readonly clause: string | null;
    readonly link: string | null;
+   readonly isStructured: boolean;
    toString(): string;
  }
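Version 0.1.5 adds an `isStructured` getter to `Citation`, along with a matching `readonly isStructured: boolean` declaration. The getter distinguishes a plain-string `source` from an object one. The sketch below reproduces that getter and the visible start of `toString()` on a hypothetical stand-in class, `CitationSketch`, so the behaviour can be checked in isolation. Several parts are assumptions, because the real class extends `GlossaristModel` and most of its body is not in the diff: the constructor, the shape of the example source object, and the placeholder fallback at the end of `toString()`.

```js
// Hypothetical stand-in for the Citation model. Only the `ref` and `source`
// fields visible in the diff are modelled.
class CitationSketch {
  constructor({ ref = null, source = null } = {}) {
    this.ref = ref;
    this.source = source;
  }

  // Same test as the 0.1.5 getter. The explicit null check is needed
  // because `typeof null` is 'object' in JavaScript.
  get isStructured() {
    return typeof this.source === 'object' && this.source !== null;
  }

  toString() {
    if (this.ref) return this.ref;
    if (typeof this.source === 'string') return this.source;
    // The rest of the real toString() is cut off in the diff;
    // this empty string is only a placeholder.
    return '';
  }
}

const plain = new CitationSketch({ source: 'ISO/TS 14812:2022' });
// Illustrative object shape; the real structured-source fields are not shown.
const structured = new CitationSketch({ source: { ref: 'ISO/TS 14812:2022', clause: '3.1.1.1' } });
const empty = new CitationSketch();

console.log(plain.isStructured); // false
console.log(structured.isStructured); // true
console.log(empty.isStructured); // false
console.log(plain.toString()); // ISO/TS 14812:2022
```

The important detail is the explicit `!== null` comparison. Without it, a citation with no source would be reported as structured.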