relaton-index 0.2.19 → 0.2.20

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 0e0ad0db979d523bfd05a561b5f81ba0aab46f41f02d524c804c4767fa50e57b
4
- data.tar.gz: f9e2c9fc9bdc73f8ae55cc5badd54190fbb0d2bef812c16c42ff6a4ed126958f
3
+ metadata.gz: 84483e9b27f8e48c618e2b6b7d124e618deebba9fa2ffb34d234027c3738ad32
4
+ data.tar.gz: '084e50ecfa029fb812e9552189ef795a901d5683b2f79baf643d55da0af93a78'
5
5
  SHA512:
6
- metadata.gz: d53ca914ea49bc69a1c80833ad7996a6bde8b5c583dfd7b8ac4aa858e5a3bb5031050271adafbbb58a511f987769e4133b5bbac0479e5455a2c36b0a708031a2
7
- data.tar.gz: e01cdbea3d72c1c76d48a735adba13af8af34f9c3cdd994de8532a8c8eef815548396c9d254a7824a1d7a83ce3d38ab41ca61602f851479f6111c35b035bbacf
6
+ metadata.gz: 198b94038473955e9c03f9e21b2e5876e9d36e94f72e11aec80d78350d988be5332cbbaceacf3e4253281b107afc4be370b66b89000df9d42f5dbdb17e5d8016
7
+ data.tar.gz: 38690e3090d278e9204200afefceac74eef9a4b5b40509fa2bea89d5c24053256c0f97c9c678f7642ea398caeb88b9710b7d272b8e55afc1bc6c1ca9d2a23fa1
data/CLAUDE.md ADDED
@@ -0,0 +1,64 @@
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4
+
5
+ ## Project Overview
6
+
7
+ relaton-index is a Ruby gem that provides indexing and searching of Relaton document references. It maps document identifiers to file paths, supporting both local index creation (for publishing) and remote index consumption (downloading from URLs with 24-hour caching).
8
+
9
+ ## Commands
10
+
11
+ ```bash
12
+ # Run all tests (default rake task)
13
+ rake spec
14
+
15
+ # Run linting
16
+ rake rubocop
17
+
18
+ # Run specific test file
19
+ bundle exec rspec spec/relaton/type_spec.rb
20
+
21
+ # Run specific test by name
22
+ bundle exec rspec spec/relaton/file_io_spec.rb -e "fetch_and_save"
23
+
24
+ # Install dependencies
25
+ bin/setup
26
+
27
+ # Interactive console
28
+ bin/console
29
+ ```
30
+
31
+ ## Architecture
32
+
33
+ ### Core Classes (all under `Relaton::Index` module in `lib/relaton/index/`)
34
+
35
+ - **`Relaton::Index`** (module, `lib/relaton/index.rb`) — Static API entry point. Delegates to Pool and Config. Main methods: `find_or_create`, `close`, `configure`.
36
+
37
+ - **Pool** — Object pool that caches Type instances by document type (`:ISO`, `:IEC`, `:IHO`, etc.). Reuses existing indexes if parameters match, recreates if they change.
38
+
39
+ - **Type** — Represents one index for a document type. Holds an array of `{id:, file:}` hashes. Provides `add_or_update`, `search` (string substring match or block), and `save`.
40
+
41
+ - **FileIO** — Handles reading/writing/downloading index files. Three modes based on `@url`: string URL (download and cache to `~/.relaton/{type}/`), `true` (read local file from `~/.relaton/{type}/`), `nil` (read from current directory). Uses class-level Mutex for thread-safe downloads. Validates index format on load.
42
+
43
+ - **FileStorage** — Storage abstraction module with `ctime`, `read`, `write`, `remove`. Can be replaced via `Config.storage=` for custom backends (e.g., S3).
44
+
45
+ - **Config** — Global configuration: `storage`, `storage_dir`, `filename` (default: "index.yaml").
46
+
47
+ ### Data Flow
48
+
49
+ 1. `Relaton::Index.find_or_create(:TYPE, url:, file:, id_keys:, pubid_class:)` → Pool looks up or creates Type
50
+ 2. Type lazily loads index via FileIO on first access
51
+ 3. FileIO either reads local YAML or downloads ZIP from URL, extracts, validates format
52
+ 4. Search matches against `:id` field (string comparison via `include?` or custom block)
53
+ 5. `save` writes index as YAML to local file
54
+
55
+ ### Index Format
56
+
57
+ YAML array of hashes with `:id` (string or structured hash) and `:file` (path string). Supports backward compatibility with old string-based format and newer pubid object format.
58
+
59
+ ### Key Design Decisions
60
+
61
+ - Remote indexes cached for 24 hours at `~/.relaton/{type}/index.yaml`
62
+ - Thread safety via `@@mutex` in FileIO prevents concurrent downloads of the same file
63
+ - Pubid deserialization is optional — when `pubid_class` is provided, string IDs are converted to structured objects
64
+ - Index format validation checks for required `:id` and `:file` keys, with automatic recovery (re-download or removal) on corruption
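The index format and search behaviour described in CLAUDE.md can be sketched in plain Ruby. This is a minimal illustration, not the gem's code: the entries, file paths, and the standalone `search` helper are hypothetical, and only the documented shape (an array of `{id:, file:}` hashes, matched by substring or by block) comes from the text above.

```ruby
require "yaml"

# Hypothetical index entries in the documented shape: an array of
# hashes with :id and :file keys.
index = [
  { id: "ISO 1", file: "data/iso-1.yaml" },
  { id: "ISO 10", file: "data/iso-10.yaml" },
  { id: "IEC 60050", file: "data/iec-60050.yaml" },
]

# Illustrative helper (not the gem's method): select by block when one
# is given, otherwise by substring match on the :id field.
def search(index, id = nil, &block)
  return index.select(&block) if block

  index.select { |item| item[:id].to_s.include?(id.to_s) }
end

by_substring = search(index, "ISO 1")
by_block = search(index) { |item| item[:file].end_with?("60050.yaml") }

# Round-trip through YAML, roughly what a save followed by a later load does.
# Symbol keys require Symbol to be explicitly permitted by safe_load.
yaml = index.to_yaml
restored = YAML.safe_load(yaml, permitted_classes: [Symbol])
```

Because the default match is a substring test, searching for "ISO 1" in this sketch also returns the "ISO 10" entry; a block is the way to express a stricter condition.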
data/README.adoc CHANGED
@@ -24,7 +24,7 @@ If bundler is not being used to manage dependencies, install the gem by executin
24
24
 
25
25
  === Creating an index object
26
26
 
27
- The gem provides the `Relaton::Index.find_or_create {type}, url: {url}, file: {filename}, id_keys: {keys}` method to create an index object. The first argument is the type of dataset (ISO, IEC, IHO, etc.). The second argument is the URL to the zipped remote index file. The third argument is the filename of the local index file. The fourth argument is an array of ID's parts names. The URL, filename, and keys are optional.
27
+ The gem provides the `Relaton::Index.find_or_create {type}, url: {url}, file: {filename}, id_keys: {keys}, pubid_class: {class}` method to create an index object. The first argument is the type of dataset (ISO, IEC, IHO, etc.). The second argument is the URL to the zipped remote index file. The third argument is the filename of the local index file. The fourth argument is an array of ID's parts names. The fifth argument is a class that implements `Pubid::Core::Identifier` for deserializing ID hashes into structured identifier objects. The URL, filename, keys, and pubid_class are optional.
28
28
 
29
29
  If the URL is specified and the local file in a `/{home}/.relaton/{type}` dir doesn't exist or is outdated, the index file will be downloaded from the URL saved as a local file and an index object will be created from the file. If the file in the `/{home}/.relaton/{type}` exists and is actual, the index object will be created from the local file.
30
30
 
@@ -97,6 +97,30 @@ end
97
97
  # => [{ id: "B-4 2.19.0", file: "data/b-4_2_19_0.xml" }]
98
98
  ----
99
99
 
100
+ === Using pubid_class for structured identifiers
101
+
102
+ The `pubid_class` option allows index entries to be deserialized into structured identifier objects instead of plain strings or hashes. The class must include `Pubid::Core::Identifier` and provide a `.create(**hash)` factory method.
103
+
104
+ When `pubid_class` is specified, each `:id` hash in the index is converted into a pubid object, enabling structured access to identifier components (e.g., publisher, number, part, year).
105
+
106
+ [source,ruby]
107
+ ----
108
+ require 'relaton/index'
109
+ require 'pubid-iso'
110
+
111
+ # Create an index with pubid_class to deserialize IDs into Pubid::Iso::Identifier objects
112
+ index = Relaton::Index.find_or_create :ISO,
113
+ url: "https://raw.githubusercontent.com/relaton/relaton-data-iso/main/index-v2.zip",
114
+ pubid_class: Pubid::Iso::Identifier
115
+
116
+ # Search returns entries with structured pubid objects as IDs
117
+ results = index.search "ISO 1"
118
+ results.first[:id]
119
+ # => #<Pubid::Iso::Identifier: ISO 1>
120
+ results.first[:id].publisher
121
+ # => "ISO"
122
+ ----
123
+
100
124
  === Remove all index records
101
125
 
102
126
  This method removes all records from the index object. The index file is not removed.
@@ -7,6 +7,7 @@ module Relaton
7
7
  #
8
8
  class FileIO
9
9
  attr_reader :url, :pubid_class
10
+ attr_accessor :sorted
10
11
 
11
12
  @@file_locks = {}
12
13
  @@file_locks_mutex = Mutex.new
@@ -28,6 +29,7 @@ module Relaton
28
29
  @filename = filename
29
30
  @id_keys = id_keys || []
30
31
  @pubid_class = pubid_class
32
+ @sorted = false
31
33
  end
32
34
 
33
35
  #
@@ -117,7 +119,15 @@ module Relaton
117
119
  def deserialize_pubid(index)
118
120
  return index unless @pubid_class
119
121
 
120
- index.map { |r| { id: @pubid_class.create(**r[:id]), file: r[:file] } }
122
+ @sorted = true
123
+ prev_number = nil
124
+ index.map do |r|
125
+ id = @pubid_class.create(**r[:id])
126
+ num = get_id_number id
127
+ @sorted = false if prev_number && prev_number > num
128
+ prev_number = num
129
+ { id: id, file: r[:file] }
130
+ end
121
131
  end
122
132
 
123
133
  def warn_local_index_error(reason)
@@ -183,12 +193,24 @@ module Relaton
183
193
  # @return [void]
184
194
  #
185
195
  def save(index)
186
- yaml = index.map do |item|
196
+ yaml = sort_structured_index(index).map do |item|
187
197
  item.transform_values { |value| value.is_a?(Pubid::Core::Identifier::Base) ? value.to_h : value }
188
198
  end.to_yaml
189
199
  Index.config.storage.write file, yaml
190
200
  end
191
201
 
202
+ def sort_structured_index(index)
203
+ if @pubid_class && index.first&.dig(:id).is_a?(Pubid::Core::Identifier::Base)
204
+ index.sort_by { |item| get_id_number item[:id] }
205
+ else
206
+ index
207
+ end
208
+ end
209
+
210
+ def get_id_number(id)
211
+ id.respond_to?(:base) && id.base ? id.base.number.to_s : id.number.to_s
212
+ end
213
+
192
214
  #
193
215
  # Remove index file from storage
194
216
  #
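The sorted-order check in `deserialize_pubid` and the sort in `sort_structured_index` both key on `get_id_number`, which returns a String. The sketch below illustrates what that implies, assuming a hypothetical `FakeId` struct in place of a real pubid object and a simplified `id_number` helper that ignores the `base` case. Keys compare lexicographically, so an index in numeric order is not necessarily "sorted" by this definition.

```ruby
# Hypothetical stand-in for a pubid object; only `number` is modelled.
FakeId = Struct.new(:number)

# Simplified version of the diff's helper for IDs without a `base`:
# the sort key is the number converted to a String.
def id_number(id)
  id.number.to_s
end

# Single pass that reports whether the keys are in non-decreasing order,
# the same condition the diff uses to set the `sorted` flag.
def sorted_by_number?(items)
  prev = nil
  items.all? do |item|
    num = id_number(item[:id])
    ok = prev.nil? || prev <= num
    prev = num
    ok
  end
end

# Numeric order 9, 10, 100 is NOT sorted as strings ("9" > "10").
items = [9, 10, 100].map { |n| { id: FakeId.new(n), file: "#{n}.yaml" } }
numeric_order_is_sorted = sorted_by_number?(items)

# Sorting by the string key gives "10", "100", "9".
resorted = items.sort_by { |item| id_number(item[:id]) }
resorted_is_sorted = sorted_by_number?(resorted)
```

Since the save path sorts with the same string key that the load path checks, an index written this way should be detected as sorted when it is read back, under the assumptions stated above.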
@@ -10,9 +10,11 @@ module Relaton
10
10
  # @param [String, Symbol] type type of index (ISO, IEC, etc.)
11
11
  # @param [String, nil] url external URL to index, used to fetch index for searching files
12
12
  # @param [String, nil] file output file name
13
- # @param [Pubid::Core::Identifier::Base] pubid class for deserialization
13
+ # @param [Array<Symbol>] id_keys keys of identifier to be used for sorting index
14
+ # format of index file is checked if id_keys all is provided at least in one of the IDs
15
+ # @param [Pubid::Core::Identifier::Base, nil] pubid class for deserialization
14
16
  #
15
- def initialize(type, url = nil, file = nil, id_keys = nil, pubid_class = nil)
17
+ def initialize(type, url = nil, file = nil, id_keys = nil, pubid_class = nil) # rubocop:disable Metrics/ParameterLists
16
18
  @file = file
17
19
  filename = file || Index.config.filename
18
20
  @file_io = FileIO.new type.to_s.downcase, url, filename, id_keys, pubid_class
@@ -45,11 +47,15 @@ module Relaton
45
47
  # @return [void]
46
48
  #
47
49
  def add_or_update(id, file)
48
- item = index.find { |i| i[:id] == id }
50
+ key = id.to_s
51
+ item = id_lookup[key]
49
52
  if item
50
53
  item[:file] = file
51
54
  else
52
- index << { id: id, file: file }
55
+ new_item = { id: id, file: file }
56
+ index << new_item
57
+ id_lookup[key] = new_item
58
+ @file_io.sorted = false
53
59
  end
54
60
  end
55
61
 
@@ -60,18 +66,11 @@ module Relaton
60
66
  #
61
67
  # @return [Array<Hash>] search results
62
68
  #
63
- def search(id = nil)
64
- index.select do |i|
65
- if block_given?
66
- yield(i)
67
- else
68
- if i[:id].is_a?(String)
69
- id.is_a?(String) ? i[:id].include?(id) : i[:id].include?(id.to_s)
70
- else
71
- id.is_a?(String) ? i[:id].to_s.include?(id) : i[:id] == id
72
- end
73
- end
74
- end
69
+ def search(id = nil, &block)
70
+ items = search_candidates(id)
71
+ return items.select(&block) if block
72
+
73
+ items.select { |i| match_item(i, id) }
75
74
  end
76
75
 
77
76
  #
@@ -91,6 +90,7 @@ module Relaton
91
90
  def remove_file
92
91
  @file_io.remove
93
92
  @index = nil
93
+ @id_lookup = nil
94
94
  end
95
95
 
96
96
  #
@@ -100,6 +100,61 @@ module Relaton
100
100
  #
101
101
  def remove_all
102
102
  @index = []
103
+ @id_lookup = nil
104
+ @file_io.sorted = true
105
+ end
106
+
107
+ private
108
+
109
+ def id_lookup
110
+ @id_lookup ||= index.each_with_object({}) do |item, h|
111
+ h[item[:id].to_s] = item
112
+ end
113
+ end
114
+
115
+ def search_candidates(id)
116
+ # index needs to be created to check if sorted
117
+ idx = index
118
+ if @file_io.sorted && id && !id.is_a?(String)
119
+ candidates_by_number(id)
120
+ else
121
+ idx
122
+ end
123
+ end
124
+
125
+ def candidates_by_number(id)
126
+ target = get_id_number(id)
127
+ left = bsearch_left(target)
128
+ return [] unless left
129
+
130
+ right = bsearch_right(target)
131
+ index[left...right]
132
+ end
133
+
134
+ def get_id_number(id)
135
+ id.respond_to?(:base) && id.base ? id.base.number.to_s : id.number.to_s
136
+ end
137
+
138
+ def bsearch_left(target)
139
+ index.bsearch_index do |item|
140
+ get_id_number(item[:id]) >= target
141
+ end
142
+ end
143
+
144
+ def bsearch_right(target)
145
+ index.bsearch_index do |item|
146
+ get_id_number(item[:id]) > target
147
+ end || index.size
148
+ end
149
+
150
+ def match_item(item, id)
151
+ if item[:id].is_a?(String)
152
+ item[:id].include?(id.is_a?(String) ? id : id.to_s)
153
+ elsif id.is_a?(String)
154
+ item[:id].to_s.include?(id)
155
+ else
156
+ item[:id] == id
157
+ end
103
158
  end
104
159
  end
105
160
  end
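The `bsearch_left` / `bsearch_right` pair above narrows a sorted index to the run of entries sharing one number before the exact match is applied. A minimal sketch of that range lookup follows, using a plain array of string keys; the `equal_range` name and the sample data are hypothetical, while `Array#bsearch_index` in find-minimum mode is standard Ruby.

```ruby
# Hypothetical keys, already in String order to match the string
# comparison used in the diff.
keys = %w[1 10 10 10 2 3]

# Return the contiguous slice of entries equal to `target`.
def equal_range(sorted_keys, target)
  # First position whose key is >= target; nil when every key is smaller.
  left = sorted_keys.bsearch_index { |k| k >= target }
  return [] unless left

  # First position whose key is > target; the array end when none is larger.
  right = sorted_keys.bsearch_index { |k| k > target } || sorted_keys.size
  sorted_keys[left...right]
end

run_of_tens = equal_range(keys, "10")
last_key = equal_range(keys, "3")
past_the_end = equal_range(keys, "4")
absent_in_middle = equal_range(keys, "15")
```

Both probes take logarithmic time, so the subsequent linear `select` only visits the candidates with a matching number. The approach is only valid while the array stays ordered by the same key, which is presumably why the diff resets `sorted` to `false` whenever a new entry is appended.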
@@ -2,6 +2,6 @@
2
2
 
3
3
  module Relaton
4
4
  module Index
5
- VERSION = "0.2.19"
5
+ VERSION = "0.2.20"
6
6
  end
7
7
  end
@@ -31,7 +31,7 @@ Gem::Specification.new do |spec|
31
31
  spec.require_paths = ["lib"]
32
32
 
33
33
  spec.add_dependency "openssl", "~> 3.3.2"
34
- spec.add_dependency "pubid-core", "~> 1.15.0"
34
+ spec.add_dependency "pubid-core", "~> 1.15.6"
35
35
  spec.add_dependency "relaton-logger", "~> 0.2.0"
36
36
  spec.add_dependency "rubyzip", "~> 2.3.0"
37
37
 
metadata CHANGED
@@ -1,13 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: relaton-index
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.19
4
+ version: 0.2.20
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ribose Inc.
8
+ autorequire:
8
9
  bindir: exe
9
10
  cert_chain: []
10
- date: 1980-01-02 00:00:00.000000000 Z
11
+ date: 2026-02-09 00:00:00.000000000 Z
11
12
  dependencies:
12
13
  - !ruby/object:Gem::Dependency
13
14
  name: openssl
@@ -29,14 +30,14 @@ dependencies:
29
30
  requirements:
30
31
  - - "~>"
31
32
  - !ruby/object:Gem::Version
32
- version: 1.15.0
33
+ version: 1.15.6
33
34
  type: :runtime
34
35
  prerelease: false
35
36
  version_requirements: !ruby/object:Gem::Requirement
36
37
  requirements:
37
38
  - - "~>"
38
39
  - !ruby/object:Gem::Version
39
- version: 1.15.0
40
+ version: 1.15.6
40
41
  - !ruby/object:Gem::Dependency
41
42
  name: relaton-logger
42
43
  requirement: !ruby/object:Gem::Requirement
@@ -65,6 +66,7 @@ dependencies:
65
66
  - - "~>"
66
67
  - !ruby/object:Gem::Version
67
68
  version: 2.3.0
69
+ description:
68
70
  email:
69
71
  - open.source@ribose.com
70
72
  executables: []
@@ -73,6 +75,7 @@ extra_rdoc_files: []
73
75
  files:
74
76
  - ".rspec"
75
77
  - ".rubocop.yml"
78
+ - CLAUDE.md
76
79
  - Gemfile
77
80
  - LICENSE.txt
78
81
  - README.adoc
@@ -93,6 +96,7 @@ licenses:
93
96
  metadata:
94
97
  homepage_uri: https://github.com/relaton/relaton-index
95
98
  source_code_uri: https://github.com/relaton/relaton-index
99
+ post_install_message:
96
100
  rdoc_options: []
97
101
  require_paths:
98
102
  - lib
@@ -107,7 +111,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
107
111
  - !ruby/object:Gem::Version
108
112
  version: '0'
109
113
  requirements: []
110
- rubygems_version: 3.6.9
114
+ rubygems_version: 3.5.22
115
+ signing_key:
111
116
  specification_version: 4
112
117
  summary: Relaton Index is a library for indexing Relaton files.
113
118
  test_files: []