relaton-w3c 2.1.2 → 2.1.4
- checksums.yaml +4 -4
- data/CLAUDE.md +25 -7
- data/Gemfile +1 -0
- data/README.adoc +16 -0
- data/lib/relaton/w3c/data_fetcher.rb +112 -22
- data/lib/relaton/w3c/data_parser.rb +1 -1
- data/lib/relaton/w3c/safe_realize.rb +59 -0
- data/lib/relaton/w3c/version.rb +1 -1
- data/relaton-w3c.gemspec +1 -9
- metadata +5 -122
- data/grammars/basicdoc.rng +0 -2140
- data/grammars/biblio-standoc.rng +0 -268
- data/grammars/biblio.rng +0 -2125
- data/grammars/relaton-w3c-compile.rng +0 -11
- data/grammars/relaton-w3c.rng +0 -11
- data/lib/relaton/w3c/rate_limit_handler.rb +0 -62
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 7cdd6ed3f2403c63011b2f3d023afca6d1fde795b09e65d39e9044384ed52b2b
|
|
4
|
+
data.tar.gz: 4086a9ec931c36512084c11c7327be1cd5d21c85106bbe312e6c142dbe9639d4
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: d2a193173054fd17d1b1e30218887722b58f3a12192923e7abb36e167314c6378e4255a87632bb9fac21035f7bc33e6b4f032ff8dd180fef938ef1977ab6cbd6
|
|
7
|
+
data.tar.gz: c3b52a5693c876cee96728332e694824350a50a2d4945c04c941e84d7c85d874216a02472f9121c3ca4446420cf686a6d8b8b042397deafcb4dcc89870e5a8b7
|
data/CLAUDE.md
CHANGED
|
@@ -47,9 +47,9 @@ All classes live under `lib/relaton/w3c/` in the `Relaton::W3c` namespace:
|
|
|
47
47
|
- **`Processor`** (`processor.rb`) — extends `Relaton::Core::Processor`, registers the W3C flavor (prefix `W3C`, dataset `w3c-api`)
|
|
48
48
|
|
|
49
49
|
**Data fetching:**
|
|
50
|
-
- **`DataFetcher`** (`data_fetcher.rb`) — extends `Core::DataFetcher`, fetches all W3C specs via the W3C API
|
|
50
|
+
- **`DataFetcher`** (`data_fetcher.rb`) — extends `Core::DataFetcher`, fetches all W3C specs via the W3C API. Fetches the specification index with `embed: true` so each spec is realized from the page's embedded payload instead of a per-spec HTTP request, and paginates by page number (only the `fetch` path repopulates `_embedded`, unlike realizing the `next` link). Runs `fetch_spec` across a small thread pool. A SIGINT (Ctrl-C) is handled gracefully — the producer stops queuing and workers stop after their in-flight spec, then the index of everything fetched so far is saved (the prior INT handler is restored afterwards, so the trap doesn't leak into the host process). See **Crawler tuning** for the env-var knobs.
|
|
51
51
|
- **`DataParser`** (`data_parser.rb`) — converts W3C API spec objects into `Relaton::W3c::Item` instances
|
|
52
|
-
- **`
|
|
52
|
+
- **`SafeRealize`** (`safe_realize.rb`) — mixin that, on a terminal error, skips the resource (returns `nil`) so one bad link doesn't abort the crawl (see Rate limiting & retries). It does not retry or cache successes — those live upstream.
|
|
53
53
|
- **`PubId`** (`pubid.rb`) — parses and compares W3C document identifiers (stage, code, date parts)
|
|
54
54
|
|
|
55
55
|
**Utilities:**
|
|
@@ -57,13 +57,31 @@ All classes live under `lib/relaton/w3c/` in the `Relaton::W3c` namespace:
|
|
|
57
57
|
|
|
58
58
|
The entry module is defined in `lib/relaton/w3c.rb` and exposes `grammar_hash`.
|
|
59
59
|
|
|
60
|
+
### Crawler tuning
|
|
61
|
+
|
|
62
|
+
`DataFetcher` is tunable via environment variables (read by class methods, so they apply to the whole crawl):
|
|
63
|
+
|
|
64
|
+
- **`RELATON_W3C_FETCH_CONCURRENCY`** (default `8`) — number of `fetch_spec` worker threads. Lower it to lighten load on api.w3.org or for debugging.
|
|
65
|
+
- **`RELATON_W3C_FETCH_VERSIONS`** (default enabled) — set to `false`/`0`/`no`/`off` for a faster, shallower crawl that emits only the top-level specifications and skips each spec's version-history fan-out (version_history and predecessor/successor versions, which account for the bulk of the API requests). Leave it unset (the default) for a complete dataset.
|
|
66
|
+
|
|
67
|
+
`embed: true` (always on) inlines each specification into its index page, so the per-spec realize is served from memory rather than an HTTP request — the largest single reduction in request count.
|
|
68
|
+
|
|
69
|
+
### Rate limiting & retries
|
|
70
|
+
|
|
71
|
+
Transient-failure resilience is layered upstream, not in this gem:
|
|
72
|
+
- **w3c_api** builds its HAL client with `faraday-retry` to retry HTTP 403 (the W3C rate-limit signal) and connection/timeout errors.
|
|
73
|
+
- **lutaml-hal** (beneath w3c_api) retries 429 and 5xx with exponential backoff.
|
|
74
|
+
|
|
75
|
+
Successful objects are cached by **w3c_api** (lutaml-hal caches realized objects keyed by URL, thread-safely as of lutaml-hal 0.2.1), so `SafeRealize` doesn't cache them. `SafeRealize` **retries nothing**: it only remembers hrefs that failed terminally (in a `Concurrent::Map`) and returns `nil` for them, so one bad link doesn't abort the crawl and isn't re-fetched on every reference. Network errors are not remembered, so a later reference can try again.
|
|
76
|
+
|
|
60
77
|
### Key Dependencies
|
|
61
78
|
|
|
62
|
-
- **relaton-bib** (~> 2.
|
|
79
|
+
- **relaton-bib** (~> 2.1.0) — provides base `Bib::Item`, `Bib::Ext`, `Bib::Doctype` and serialization mixins (LutaML model layer)
|
|
63
80
|
- **relaton-core** — provides base `Core::Processor` and `Core::DataFetcher`
|
|
64
|
-
- **relaton-index** — index-based search for bibliographic references
|
|
65
|
-
- **w3c_api** — W3C API client used by `DataFetcher` to retrieve specifications
|
|
66
|
-
|
|
81
|
+
- **relaton-index** — index-based search for bibliographic references; also unpacks the index zip at runtime
|
|
82
|
+
- **w3c_api** (~> 0.3.2) — W3C API (HAL/REST) client used by `DataFetcher` to retrieve specifications; owns rate-limit and transient-error retries, and the (thread-safe) object cache
|
|
83
|
+
|
|
84
|
+
The W3C data is fetched entirely through `w3c_api`; the older RDF/SPARQL/scraping stack (linkeddata, rdf, sparql, shex, mechanize, …) has been removed.
|
|
67
85
|
|
|
68
86
|
### Schema Validation
|
|
69
87
|
|
|
@@ -82,7 +100,7 @@ Tests use RSpec with:
|
|
|
82
100
|
- **VCR** — recorded HTTP cassettes in `spec/vcr_cassettes/` (7-day re-record interval)
|
|
83
101
|
- **WebMock** — disables external HTTP in tests
|
|
84
102
|
|
|
85
|
-
Test fixtures live in `spec/fixtures/` (YAML
|
|
103
|
+
Test fixtures live in `spec/fixtures/` (YAML and XML files).
|
|
86
104
|
|
|
87
105
|
## Style
|
|
88
106
|
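A minimal sketch of how the two crawler knobs documented in the CLAUDE.md changes above are interpreted, assuming the semantics stated there (`false`/`0`/`no`/`off` disable the version crawl, and concurrency defaults to `8`). The helper names `versions_enabled?` and `worker_count` are hypothetical and exist only for this illustration. They take the raw value as an argument instead of reading `ENV`, so the sketch runs without the gem installed.

```ruby
# Illustration only: hypothetical helpers, not part of relaton-w3c.
DISABLING_VALUES = %w[0 false no off].freeze
DEFAULT_WORKERS = 8 # documented default for RELATON_W3C_FETCH_CONCURRENCY

# True unless the value is one of the documented "off" spellings.
# nil or an empty string (variable unset) keeps the default: enabled.
def versions_enabled?(raw)
  return true if raw.nil? || raw.empty?

  !DISABLING_VALUES.include?(raw.strip.downcase)
end

# Worker count: the raw value when given, otherwise the default.
def worker_count(raw)
  (raw || DEFAULT_WORKERS).to_i
end

puts versions_enabled?(nil)     # prints true  (unset: full crawl)
puts versions_enabled?("false") # prints false (shallow crawl)
puts versions_enabled?(" OFF ") # prints false (case and whitespace ignored)
puts versions_enabled?("yes")   # prints true  (anything else stays enabled)
puts worker_count(nil)          # prints 8
puts worker_count("4")          # prints 4
```

One thing to keep in mind with this style of parsing: `String#to_i` returns `0` for a non-numeric string, so a caller sizing a thread pool from it may want to validate the value first.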
|
data/Gemfile
CHANGED
data/README.adoc
CHANGED
|
@@ -118,6 +118,22 @@ require 'relaton/w3c/data_fetcher'
|
|
|
118
118
|
Relaton::W3c::DataFetcher.fetch
|
|
119
119
|
----
|
|
120
120
|
|
|
121
|
+
The crawl is tunable via environment variables:
|
|
122
|
+
|
|
123
|
+
- `RELATON_W3C_FETCH_CONCURRENCY` (default `8`) - number of parallel worker threads. Lower it to lighten load on `api.w3.org`.
|
|
124
|
+
- `RELATON_W3C_FETCH_VERSIONS` (default enabled) - set to `false` for a faster, shallower crawl that fetches only the top-level specifications and skips each spec's version history (the bulk of the API requests). Leave it unset for a complete dataset.
|
|
125
|
+
|
|
126
|
+
The fetcher requests the specifications index with embedded specification data, so each specification is read from the page already in memory instead of issuing a separate HTTP request.
|
|
127
|
+
|
|
128
|
+
A full crawl is long-running, so it handles `Ctrl-C` gracefully: it stops fetching, lets in-flight work finish, and saves the index of everything collected so far rather than losing the run.
|
|
129
|
+
|
|
130
|
+
[source,sh]
|
|
131
|
+
----
|
|
132
|
+
# Fast, shallow refresh: top-level specs only, 4 workers
|
|
133
|
+
RELATON_W3C_FETCH_VERSIONS=false RELATON_W3C_FETCH_CONCURRENCY=4 \
|
|
134
|
+
ruby -r relaton/w3c/data_fetcher -e 'Relaton::W3c::DataFetcher.fetch'
|
|
135
|
+
----
|
|
136
|
+
|
|
121
137
|
=== Logging
|
|
122
138
|
|
|
123
139
|
RelatonW3c uses the relaton-logger gem for logging. By default, it logs to STDOUT. To change the log levels and add other loggers, read the https://github.com/relaton/relaton-logger#usage[relaton-logger] documentation.
|
data/lib/relaton/w3c/data_fetcher.rb
CHANGED
|
@@ -1,14 +1,14 @@
|
|
|
1
1
|
require "relaton/core"
|
|
2
2
|
require "w3c_api"
|
|
3
3
|
require_relative "../w3c"
|
|
4
|
-
require_relative "
|
|
4
|
+
require_relative "safe_realize"
|
|
5
5
|
require_relative "data_parser"
|
|
6
6
|
require_relative "pubid"
|
|
7
7
|
|
|
8
8
|
module Relaton
|
|
9
9
|
module W3c
|
|
10
10
|
class DataFetcher < Core::DataFetcher
|
|
11
|
-
include Relaton::W3c::
|
|
11
|
+
include Relaton::W3c::SafeRealize
|
|
12
12
|
|
|
13
13
|
DEFAULT_CONCURRENCY = 8
|
|
14
14
|
|
|
@@ -19,9 +19,22 @@ module Relaton
|
|
|
19
19
|
(ENV["RELATON_W3C_FETCH_CONCURRENCY"] || DEFAULT_CONCURRENCY).to_i
|
|
20
20
|
end
|
|
21
21
|
|
|
22
|
+
# Whether to crawl each specification's version history (version_history,
|
|
23
|
+
# predecessor_versions, successor_versions). Enabled by default for a
|
|
24
|
+
# complete dataset. Set RELATON_W3C_FETCH_VERSIONS=false for a faster,
|
|
25
|
+
# shallower crawl that emits only the top-level specifications and skips
|
|
26
|
+
# the per-spec version fan-out (the bulk of the API requests).
|
|
27
|
+
def self.fetch_versions?
|
|
28
|
+
val = ENV["RELATON_W3C_FETCH_VERSIONS"]
|
|
29
|
+
return true if val.nil? || val.empty?
|
|
30
|
+
|
|
31
|
+
!%w[0 false no off].include?(val.strip.downcase)
|
|
32
|
+
end
|
|
33
|
+
|
|
22
34
|
def initialize(*args)
|
|
23
35
|
super
|
|
24
36
|
@mutex = Mutex.new
|
|
37
|
+
@interrupted = false
|
|
25
38
|
end
|
|
26
39
|
|
|
27
40
|
def index
|
|
@@ -39,36 +52,83 @@ module Relaton
|
|
|
39
52
|
#
|
|
40
53
|
# Parse documents in parallel. The crawler is heavily I/O-bound on
|
|
41
54
|
# api.w3.org round-trips (~30-50k requests per run), so a small thread
|
|
42
|
-
# pool gives a near-linear speedup. Pagination still happens serially
|
|
43
|
-
#
|
|
55
|
+
# pool gives a near-linear speedup. Pagination still happens serially:
|
|
56
|
+
# each page's `next?` flag gates whether the next page is requested.
|
|
57
|
+
#
|
|
58
|
+
# A SIGINT (Ctrl-C) is handled gracefully: the producer stops queuing and
|
|
59
|
+
# the workers stop processing after their in-flight spec, then the index
|
|
60
|
+
# of everything fetched so far is saved rather than the run being lost.
|
|
44
61
|
#
|
|
45
62
|
def fetch(_source = nil)
|
|
46
63
|
n_workers = self.class.concurrency
|
|
47
64
|
queue = SizedQueue.new(n_workers * 4)
|
|
48
65
|
workers = Array.new(n_workers) { spawn_worker(queue) }
|
|
49
66
|
|
|
50
|
-
|
|
67
|
+
with_interrupt_handler do
|
|
68
|
+
enqueue_specs(queue)
|
|
69
|
+
n_workers.times { queue << nil } # poison pills
|
|
70
|
+
workers.each(&:join)
|
|
71
|
+
Util.warn "Crawl interrupted — saving progress collected so far." if @interrupted
|
|
72
|
+
index.save
|
|
73
|
+
end
|
|
74
|
+
|
|
75
|
+
report_errors
|
|
76
|
+
end
|
|
77
|
+
|
|
78
|
+
#
|
|
79
|
+
# Page through the specifications index, feeding each spec (paired with
|
|
80
|
+
# its embedded page) to the worker queue. Returns early when interrupted.
|
|
81
|
+
#
|
|
82
|
+
# embed: true inlines each specification's full payload into the index
|
|
83
|
+
# page's `_embedded` block, so a spec link realizes from that page in
|
|
84
|
+
# memory instead of making its own HTTP request — one request per page
|
|
85
|
+
# rather than one per specification. The page is queued alongside each
|
|
86
|
+
# link so the worker can hand it back to realize as the parent_resource.
|
|
87
|
+
#
|
|
88
|
+
def enqueue_specs(queue)
|
|
89
|
+
specs = client.specifications(embed: true)
|
|
51
90
|
loop do
|
|
52
|
-
|
|
53
|
-
|
|
91
|
+
page = specs
|
|
92
|
+
page.links.specifications.each do |spec|
|
|
93
|
+
break if @interrupted
|
|
54
94
|
|
|
55
|
-
|
|
56
|
-
|
|
95
|
+
queue << [spec, page]
|
|
96
|
+
end
|
|
97
|
+
break if @interrupted || !page.next?
|
|
57
98
|
|
|
58
|
-
|
|
59
|
-
|
|
99
|
+
# Fetch the next page through the client's fetch path rather than
|
|
100
|
+
# realizing the `next` link: only fetch populates the page's
|
|
101
|
+
# embedded_data, so this keeps embed working past page 1. Realizing
|
|
102
|
+
# the `next` link drops `_embedded` and forces a per-spec HTTP
|
|
103
|
+
# request for every specification on every later page.
|
|
104
|
+
next_page = fetch_specifications_page(page.page + 1)
|
|
105
|
+
break unless next_page
|
|
60
106
|
|
|
61
|
-
|
|
62
|
-
|
|
107
|
+
specs = next_page
|
|
108
|
+
end
|
|
63
109
|
end
|
|
64
110
|
|
|
65
|
-
def fetch_spec(unrealized_spec)
|
|
66
|
-
|
|
111
|
+
def fetch_spec(unrealized_spec, page = nil)
|
|
112
|
+
# When `page` came from an embed:true fetch, realizing against it as the
|
|
113
|
+
# parent_resource serves the spec from embedded data (no HTTP request).
|
|
114
|
+
spec = realize(unrealized_spec, parent_resource: page)
|
|
67
115
|
return unless spec
|
|
68
116
|
|
|
69
117
|
local_errors = Hash.new(true)
|
|
70
118
|
save_doc DataParser.parse(spec, local_errors)
|
|
71
119
|
|
|
120
|
+
fetch_versions(spec) if self.class.fetch_versions?
|
|
121
|
+
|
|
122
|
+
@mutex.synchronize { local_errors.each { |k, v| @errors[k] &&= v } }
|
|
123
|
+
end
|
|
124
|
+
|
|
125
|
+
#
|
|
126
|
+
# Crawl a specification's version history: its dated editions plus the
|
|
127
|
+
# predecessor/successor version chains. Each entry is a separate HTTP
|
|
128
|
+
# request, so this is the bulk of a run and can be skipped via
|
|
129
|
+
# RELATON_W3C_FETCH_VERSIONS=false (see .fetch_versions?).
|
|
130
|
+
#
|
|
131
|
+
def fetch_versions(spec)
|
|
72
132
|
if spec.links.respond_to?(:version_history) && spec.links.version_history
|
|
73
133
|
version_history = realize spec.links.version_history
|
|
74
134
|
version_history&.links&.spec_versions&.each { |version| parse_and_save version }
|
|
@@ -79,12 +139,10 @@ module Relaton
|
|
|
79
139
|
predecessor_versions&.links&.predecessor_versions&.each { |version| parse_and_save version }
|
|
80
140
|
end
|
|
81
141
|
|
|
82
|
-
|
|
83
|
-
successor_versions = realize spec.links.successor_versions
|
|
84
|
-
successor_versions&.links&.successor_versions&.each { |version| parse_and_save version }
|
|
85
|
-
end
|
|
142
|
+
return unless spec.links.respond_to?(:successor_versions) && spec.links.successor_versions
|
|
86
143
|
|
|
87
|
-
|
|
144
|
+
successor_versions = realize spec.links.successor_versions
|
|
145
|
+
successor_versions&.links&.successor_versions&.each { |version| parse_and_save version }
|
|
88
146
|
end
|
|
89
147
|
|
|
90
148
|
#
|
|
@@ -134,11 +192,43 @@ module Relaton
|
|
|
134
192
|
|
|
135
193
|
private
|
|
136
194
|
|
|
195
|
+
# Install a SIGINT handler for the duration of the crawl so Ctrl-C sets
|
|
196
|
+
# the @interrupted flag (observed by the producer loop and the workers)
|
|
197
|
+
# instead of killing the process mid-write. The trap body is kept minimal
|
|
198
|
+
# (no I/O or locking) because trap context is restricted; the user-facing
|
|
199
|
+
# notice is printed from the main thread once the crawl winds down. The
|
|
200
|
+
# previous handler is restored on the way out so the trap doesn't leak
|
|
201
|
+
# into the host process.
|
|
202
|
+
def with_interrupt_handler
|
|
203
|
+
previous = Signal.trap("INT") { @interrupted = true }
|
|
204
|
+
yield
|
|
205
|
+
ensure
|
|
206
|
+
Signal.trap("INT", previous || "DEFAULT")
|
|
207
|
+
end
|
|
208
|
+
|
|
209
|
+
# Fetch one page of the specifications index with embed enabled. Goes
|
|
210
|
+
# through the client (the register's fetch path) so the page's
|
|
211
|
+
# embedded_data is populated. Transient 403/5xx/connection failures are
|
|
212
|
+
# already retried upstream (w3c_api/lutaml-hal); a terminal error here
|
|
213
|
+
# stops pagination gracefully rather than crashing the crawl.
|
|
214
|
+
def fetch_specifications_page(number)
|
|
215
|
+
client.specifications(embed: true, page: number)
|
|
216
|
+
rescue Lutaml::Hal::Error, Faraday::Error => e
|
|
217
|
+
log_error "Failed to fetch specifications page #{number}: " \
|
|
218
|
+
"#{e.class}: #{e.message}"
|
|
219
|
+
nil
|
|
220
|
+
end
|
|
221
|
+
|
|
137
222
|
def spawn_worker(queue)
|
|
138
223
|
Thread.new do
|
|
139
|
-
while (
|
|
224
|
+
while (item = queue.pop)
|
|
225
|
+
# Once interrupted, drain the queue without processing so the
|
|
226
|
+
# producer unblocks and the pool reaches its poison pills quickly.
|
|
227
|
+
next if @interrupted
|
|
228
|
+
|
|
229
|
+
spec, page = item
|
|
140
230
|
begin
|
|
141
|
-
fetch_spec spec
|
|
231
|
+
fetch_spec spec, page
|
|
142
232
|
rescue StandardError => e
|
|
143
233
|
log_error "fetch_spec failed: #{e.class}: #{e.message}\n" \
|
|
144
234
|
"#{e.backtrace.first(5).join("\n")}"
|
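The producer/worker shape used in the `data_fetcher.rb` changes above (a bounded queue, one poison pill per worker, and an interrupt flag that makes workers drain instead of process) can be sketched in isolation. This is a simplified illustration under stated assumptions, not the gem's code: `Flag`, `with_interrupt_flag`, and `run_pool` are hypothetical names, and error handling inside the workers is omitted.

```ruby
# Illustration only: hypothetical names, standard library only.
Flag = Struct.new(:set)

# Install a minimal INT trap for the duration of the block, then restore
# the handler that was in place before. The trap body only flips a flag,
# since little else is safe to do in trap context.
def with_interrupt_flag(flag)
  previous = Signal.trap("INT") { flag.set = true }
  yield
ensure
  Signal.trap("INT", previous || "DEFAULT")
end

# Run `work` over `items` on a small thread pool. Items must be truthy,
# because nil is reserved as the poison pill that ends a worker loop.
def run_pool(items, n_workers:, flag:, &work)
  queue = SizedQueue.new(n_workers * 4) # bounded: the producer blocks when full
  results = Queue.new

  workers = Array.new(n_workers) do
    Thread.new do
      while (item = queue.pop)
        next if flag.set # drain without processing once interrupted

        results << work.call(item)
      end
    end
  end

  items.each do |item|
    break if flag.set # stop queuing new work once interrupted

    queue << item
  end
  n_workers.times { queue << nil } # one poison pill per worker
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

flag = Flag.new(false)
doubled = with_interrupt_flag(flag) do
  run_pool([1, 2, 3, 4], n_workers: 2, flag: flag) { |n| n * 2 }
end
puts doubled.sort.inspect # prints [2, 4, 6, 8]
```

Draining rather than exiting on interrupt matters here: a worker that simply returned could leave the producer blocked on a full `SizedQueue`, so the pool would never reach its poison pills.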
data/lib/relaton/w3c/safe_realize.rb
ADDED
|
@@ -0,0 +1,59 @@
|
|
|
1
|
+
require "concurrent/map"
|
|
2
|
+
|
|
3
|
+
module Relaton
|
|
4
|
+
module W3c
|
|
5
|
+
# Thin wrapper over lutaml-hal's `realize`. Successful objects are cached by
|
|
6
|
+
# w3c_api (it caches realized objects keyed by URL), so this only remembers
|
|
7
|
+
# resources that failed terminally and returns nil for them — so one broken
|
|
8
|
+
# link doesn't abort the crawl and isn't re-fetched on every reference.
|
|
9
|
+
#
|
|
10
|
+
# Transient failures are retried upstream: w3c_api retries HTTP 403 (the
|
|
11
|
+
# W3C rate-limit signal) and connection/timeout errors, and lutaml-hal
|
|
12
|
+
# retries 429 and 5xx. By the time an error surfaces here it is terminal.
|
|
13
|
+
module SafeRealize
|
|
14
|
+
# Hrefs that failed terminally — one map shared by every includer
|
|
15
|
+
# (DataFetcher and DataParser) since a broken resource is broken for the
|
|
16
|
+
# whole crawl. Initialized eagerly (at load, single-threaded) so the
|
|
17
|
+
# parallel fetcher's first concurrent access can't race a lazy `||=`;
|
|
18
|
+
# Concurrent::Map then handles the concurrent reads/writes.
|
|
19
|
+
@skipped = Concurrent::Map.new
|
|
20
|
+
|
|
21
|
+
def self.skipped
|
|
22
|
+
@skipped
|
|
23
|
+
end
|
|
24
|
+
|
|
25
|
+
# @param parent_resource [Object, nil] the index/page the link came from.
|
|
26
|
+
# When the page was fetched with `embed: true`, its inlined `_embedded`
|
|
27
|
+
# payload lets the link realize from memory instead of issuing an HTTP
|
|
28
|
+
# request. nil (the default) preserves the plain remote-fetch behavior.
|
|
29
|
+
def realize(obj, parent_resource: nil)
|
|
30
|
+
href = resolve_href(obj)
|
|
31
|
+
return nil if SafeRealize.skipped.key?(href)
|
|
32
|
+
|
|
33
|
+
obj.realize(parent_resource: parent_resource)
|
|
34
|
+
rescue Lutaml::Hal::ConnectionError, Lutaml::Hal::TimeoutError, Faraday::Error, Net::OpenTimeout => e
|
|
35
|
+
# Network-level failure (already retried by w3c_api). The resource itself
|
|
36
|
+
# is fine, so don't skip it permanently — a later reference can try again.
|
|
37
|
+
Util.warn "Failed to realize object: #{href}, error: #{e.message}"
|
|
38
|
+
nil
|
|
39
|
+
rescue Lutaml::Hal::NotFoundError
|
|
40
|
+
Util.warn "Object not found: #{href}"
|
|
41
|
+
SafeRealize.skipped[href] = true
|
|
42
|
+
nil
|
|
43
|
+
rescue Lutaml::Hal::Error => e
|
|
44
|
+
# Definitive upstream error (403 rate-limit, 5xx, 429) already retried by
|
|
45
|
+
# w3c_api / lutaml-hal. Skip the broken/unavailable resource rather than
|
|
46
|
+
# re-hitting it for every link that references it.
|
|
47
|
+
Util.warn "Skipping #{href}, upstream error after retries: #{e.message}"
|
|
48
|
+
SafeRealize.skipped[href] = true
|
|
49
|
+
nil
|
|
50
|
+
end
|
|
51
|
+
|
|
52
|
+
private
|
|
53
|
+
|
|
54
|
+
def resolve_href(obj)
|
|
55
|
+
obj.href || obj.links.self.href
|
|
56
|
+
end
|
|
57
|
+
end
|
|
58
|
+
end
|
|
59
|
+
end
|
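The skip-on-terminal-error behavior of `safe_realize.rb` above can be illustrated without lutaml-hal. The sketch below is an assumption-laden stand-in: `TerminalError`, `TransientError`, `SKIPPED`, and `safe_call` are hypothetical names, and a `Hash` guarded by a `Mutex` replaces `Concurrent::Map` so the example needs only the standard library.

```ruby
# Illustration only: hypothetical stand-ins, not lutaml-hal or relaton-w3c names.
class TerminalError < StandardError; end  # e.g. a definitive "not found"
class TransientError < StandardError; end # e.g. a connection timeout

SKIPPED = {}             # keys that failed terminally
SKIPPED_LOCK = Mutex.new # plain Hash + Mutex in place of Concurrent::Map

# Run the block for `key`. Returns nil instead of raising, and remembers
# only terminal failures so they are never attempted again.
def safe_call(key)
  return nil if SKIPPED_LOCK.synchronize { SKIPPED.key?(key) }

  yield
rescue TransientError
  nil # not remembered: a later call for the same key may succeed
rescue TerminalError
  SKIPPED_LOCK.synchronize { SKIPPED[key] = true }
  nil
end

attempts = 0
safe_call("broken") { attempts += 1; raise TerminalError }
safe_call("broken") { attempts += 1; :never_runs }
puts attempts # prints 1 (the second call was skipped)

safe_call("flaky") { raise TransientError }
puts safe_call("flaky") { :ok }.inspect # prints :ok
```

Keeping transient failures out of the skip map is the design point: a network hiccup says nothing about the resource itself, whereas a terminal error would fail the same way on every later reference.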
data/lib/relaton/w3c/version.rb
CHANGED
data/relaton-w3c.gemspec
CHANGED
|
@@ -31,16 +31,8 @@ Gem::Specification.new do |spec|
|
|
|
31
31
|
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
|
|
32
32
|
spec.require_paths = ["lib"]
|
|
33
33
|
|
|
34
|
-
spec.add_dependency "linkeddata", "~> 3.2"
|
|
35
|
-
spec.add_dependency "mechanize", "~> 2.10"
|
|
36
|
-
spec.add_dependency "rdf", "~> 3.2"
|
|
37
|
-
spec.add_dependency "rdf-normalize", "~> 0.6"
|
|
38
34
|
spec.add_dependency "relaton-bib", "~> 2.1.0"
|
|
39
35
|
spec.add_dependency "relaton-core", "~> 0.0.13"
|
|
40
36
|
spec.add_dependency "relaton-index", "~> 0.2.8"
|
|
41
|
-
spec.add_dependency "
|
|
42
|
-
spec.add_dependency "shex", "~> 0.7"
|
|
43
|
-
spec.add_dependency "csv", "~> 3.0"
|
|
44
|
-
spec.add_dependency "sparql", "~> 3.2"
|
|
45
|
-
spec.add_dependency "w3c_api", "~> 0.1.3"
|
|
37
|
+
spec.add_dependency "w3c_api", "~> 0.3.2"
|
|
46
38
|
end
|
metadata
CHANGED
|
@@ -1,71 +1,15 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: relaton-w3c
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 2.1.
|
|
4
|
+
version: 2.1.4
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Ribose Inc.
|
|
8
8
|
autorequire:
|
|
9
9
|
bindir: exe
|
|
10
10
|
cert_chain: []
|
|
11
|
-
date: 2026-
|
|
11
|
+
date: 2026-06-04 00:00:00.000000000 Z
|
|
12
12
|
dependencies:
|
|
13
|
-
- !ruby/object:Gem::Dependency
|
|
14
|
-
name: linkeddata
|
|
15
|
-
requirement: !ruby/object:Gem::Requirement
|
|
16
|
-
requirements:
|
|
17
|
-
- - "~>"
|
|
18
|
-
- !ruby/object:Gem::Version
|
|
19
|
-
version: '3.2'
|
|
20
|
-
type: :runtime
|
|
21
|
-
prerelease: false
|
|
22
|
-
version_requirements: !ruby/object:Gem::Requirement
|
|
23
|
-
requirements:
|
|
24
|
-
- - "~>"
|
|
25
|
-
- !ruby/object:Gem::Version
|
|
26
|
-
version: '3.2'
|
|
27
|
-
- !ruby/object:Gem::Dependency
|
|
28
|
-
name: mechanize
|
|
29
|
-
requirement: !ruby/object:Gem::Requirement
|
|
30
|
-
requirements:
|
|
31
|
-
- - "~>"
|
|
32
|
-
- !ruby/object:Gem::Version
|
|
33
|
-
version: '2.10'
|
|
34
|
-
type: :runtime
|
|
35
|
-
prerelease: false
|
|
36
|
-
version_requirements: !ruby/object:Gem::Requirement
|
|
37
|
-
requirements:
|
|
38
|
-
- - "~>"
|
|
39
|
-
- !ruby/object:Gem::Version
|
|
40
|
-
version: '2.10'
|
|
41
|
-
- !ruby/object:Gem::Dependency
|
|
42
|
-
name: rdf
|
|
43
|
-
requirement: !ruby/object:Gem::Requirement
|
|
44
|
-
requirements:
|
|
45
|
-
- - "~>"
|
|
46
|
-
- !ruby/object:Gem::Version
|
|
47
|
-
version: '3.2'
|
|
48
|
-
type: :runtime
|
|
49
|
-
prerelease: false
|
|
50
|
-
version_requirements: !ruby/object:Gem::Requirement
|
|
51
|
-
requirements:
|
|
52
|
-
- - "~>"
|
|
53
|
-
- !ruby/object:Gem::Version
|
|
54
|
-
version: '3.2'
|
|
55
|
-
- !ruby/object:Gem::Dependency
|
|
56
|
-
name: rdf-normalize
|
|
57
|
-
requirement: !ruby/object:Gem::Requirement
|
|
58
|
-
requirements:
|
|
59
|
-
- - "~>"
|
|
60
|
-
- !ruby/object:Gem::Version
|
|
61
|
-
version: '0.6'
|
|
62
|
-
type: :runtime
|
|
63
|
-
prerelease: false
|
|
64
|
-
version_requirements: !ruby/object:Gem::Requirement
|
|
65
|
-
requirements:
|
|
66
|
-
- - "~>"
|
|
67
|
-
- !ruby/object:Gem::Version
|
|
68
|
-
version: '0.6'
|
|
69
13
|
- !ruby/object:Gem::Dependency
|
|
70
14
|
name: relaton-bib
|
|
71
15
|
requirement: !ruby/object:Gem::Requirement
|
|
@@ -108,76 +52,20 @@ dependencies:
|
|
|
108
52
|
- - "~>"
|
|
109
53
|
- !ruby/object:Gem::Version
|
|
110
54
|
version: 0.2.8
|
|
111
|
-
- !ruby/object:Gem::Dependency
|
|
112
|
-
name: rubyzip
|
|
113
|
-
requirement: !ruby/object:Gem::Requirement
|
|
114
|
-
requirements:
|
|
115
|
-
- - "~>"
|
|
116
|
-
- !ruby/object:Gem::Version
|
|
117
|
-
version: '2.3'
|
|
118
|
-
type: :runtime
|
|
119
|
-
prerelease: false
|
|
120
|
-
version_requirements: !ruby/object:Gem::Requirement
|
|
121
|
-
requirements:
|
|
122
|
-
- - "~>"
|
|
123
|
-
- !ruby/object:Gem::Version
|
|
124
|
-
version: '2.3'
|
|
125
|
-
- !ruby/object:Gem::Dependency
|
|
126
|
-
name: shex
|
|
127
|
-
requirement: !ruby/object:Gem::Requirement
|
|
128
|
-
requirements:
|
|
129
|
-
- - "~>"
|
|
130
|
-
- !ruby/object:Gem::Version
|
|
131
|
-
version: '0.7'
|
|
132
|
-
type: :runtime
|
|
133
|
-
prerelease: false
|
|
134
|
-
version_requirements: !ruby/object:Gem::Requirement
|
|
135
|
-
requirements:
|
|
136
|
-
- - "~>"
|
|
137
|
-
- !ruby/object:Gem::Version
|
|
138
|
-
version: '0.7'
|
|
139
|
-
- !ruby/object:Gem::Dependency
|
|
140
|
-
name: csv
|
|
141
|
-
requirement: !ruby/object:Gem::Requirement
|
|
142
|
-
requirements:
|
|
143
|
-
- - "~>"
|
|
144
|
-
- !ruby/object:Gem::Version
|
|
145
|
-
version: '3.0'
|
|
146
|
-
type: :runtime
|
|
147
|
-
prerelease: false
|
|
148
|
-
version_requirements: !ruby/object:Gem::Requirement
|
|
149
|
-
requirements:
|
|
150
|
-
- - "~>"
|
|
151
|
-
- !ruby/object:Gem::Version
|
|
152
|
-
version: '3.0'
|
|
153
|
-
- !ruby/object:Gem::Dependency
|
|
154
|
-
name: sparql
|
|
155
|
-
requirement: !ruby/object:Gem::Requirement
|
|
156
|
-
requirements:
|
|
157
|
-
- - "~>"
|
|
158
|
-
- !ruby/object:Gem::Version
|
|
159
|
-
version: '3.2'
|
|
160
|
-
type: :runtime
|
|
161
|
-
prerelease: false
|
|
162
|
-
version_requirements: !ruby/object:Gem::Requirement
|
|
163
|
-
requirements:
|
|
164
|
-
- - "~>"
|
|
165
|
-
- !ruby/object:Gem::Version
|
|
166
|
-
version: '3.2'
|
|
167
55
|
- !ruby/object:Gem::Dependency
|
|
168
56
|
name: w3c_api
|
|
169
57
|
requirement: !ruby/object:Gem::Requirement
|
|
170
58
|
requirements:
|
|
171
59
|
- - "~>"
|
|
172
60
|
- !ruby/object:Gem::Version
|
|
173
|
-
version: 0.
|
|
61
|
+
version: 0.3.2
|
|
174
62
|
type: :runtime
|
|
175
63
|
prerelease: false
|
|
176
64
|
version_requirements: !ruby/object:Gem::Requirement
|
|
177
65
|
requirements:
|
|
178
66
|
- - "~>"
|
|
179
67
|
- !ruby/object:Gem::Version
|
|
180
|
-
version: 0.
|
|
68
|
+
version: 0.3.2
|
|
181
69
|
description: 'Relaton::W3c: retrieve W3C Standards for bibliographic using the IsoBibliographicItem
|
|
182
70
|
model'
|
|
183
71
|
email:
|
|
@@ -200,11 +88,6 @@ files:
|
|
|
200
88
|
- bin/console
|
|
201
89
|
- bin/rspec
|
|
202
90
|
- bin/setup
|
|
203
|
-
- grammars/basicdoc.rng
|
|
204
|
-
- grammars/biblio-standoc.rng
|
|
205
|
-
- grammars/biblio.rng
|
|
206
|
-
- grammars/relaton-w3c-compile.rng
|
|
207
|
-
- grammars/relaton-w3c.rng
|
|
208
91
|
- lib/relaton/w3c.rb
|
|
209
92
|
- lib/relaton/w3c/bibdata.rb
|
|
210
93
|
- lib/relaton/w3c/bibitem.rb
|
|
@@ -217,7 +100,7 @@ files:
|
|
|
217
100
|
- lib/relaton/w3c/item_data.rb
|
|
218
101
|
- lib/relaton/w3c/processor.rb
|
|
219
102
|
- lib/relaton/w3c/pubid.rb
|
|
220
|
-
- lib/relaton/w3c/
|
|
103
|
+
- lib/relaton/w3c/safe_realize.rb
|
|
221
104
|
- lib/relaton/w3c/util.rb
|
|
222
105
|
- lib/relaton/w3c/version.rb
|
|
223
106
|
- relaton-w3c.gemspec
|