relaton-jis 1.20.0 → 2.0.0.pre.alpha.1
- checksums.yaml +4 -4
- data/.rubocop.yml +1 -1
- data/CLAUDE.md +57 -0
- data/README.adoc +64 -69
- data/grammars/basicdoc.rng +1559 -671
- data/grammars/biblio-standoc.rng +107 -46
- data/grammars/biblio.rng +1010 -375
- data/grammars/relaton-jis.rng +5 -54
- data/lib/relaton/jis/bibdata.rb +7 -0
- data/lib/relaton/jis/bibitem.rb +7 -0
- data/lib/relaton/jis/bibliography.rb +70 -0
- data/lib/relaton/jis/data_fetcher.rb +161 -0
- data/lib/relaton/jis/docidentifier.rb +21 -0
- data/lib/relaton/jis/doctype.rb +9 -0
- data/lib/relaton/jis/ext.rb +16 -0
- data/lib/relaton/jis/hit.rb +76 -0
- data/lib/relaton/jis/hit_collection.rb +111 -0
- data/lib/relaton/jis/item.rb +26 -0
- data/lib/relaton/jis/item_base.rb +11 -0
- data/lib/relaton/jis/item_data.rb +8 -0
- data/lib/relaton/jis/processor.rb +55 -0
- data/lib/relaton/jis/relation.rb +11 -0
- data/lib/relaton/jis/scraper.rb +203 -0
- data/lib/relaton/jis/structured_identifier.rb +13 -0
- data/lib/relaton/jis/util.rb +8 -0
- data/lib/relaton/jis/version.rb +7 -0
- data/lib/relaton/jis.rb +30 -0
- data/relaton_jis.gemspec +6 -6
- metadata +29 -25
- data/lib/relaton_jis/bibliographic_item.rb +0 -12
- data/lib/relaton_jis/bibliography.rb +0 -65
- data/lib/relaton_jis/data_fetcher.rb +0 -158
- data/lib/relaton_jis/document_type.rb +0 -16
- data/lib/relaton_jis/hash_converter.rb +0 -16
- data/lib/relaton_jis/hit.rb +0 -57
- data/lib/relaton_jis/hit_collection.rb +0 -100
- data/lib/relaton_jis/processor.rb +0 -61
- data/lib/relaton_jis/scraper.rb +0 -157
- data/lib/relaton_jis/util.rb +0 -6
- data/lib/relaton_jis/version.rb +0 -5
- data/lib/relaton_jis/xml_parser.rb +0 -17
- data/lib/relaton_jis.rb +0 -29
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: eb4b54d49f314da1369dec1e4446b108ddd68b92555c88c8951233fd7e6b31dc
+data.tar.gz: 2660c24bf8a0916a28010eec9b434953233076bb69775c9852473780f2317373
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: aa98951e0c115564f41f81a2de3968617a2699559f4e8a38cc0eaebd6bfeb2feeee16222d01875db30dbcde676a2220671ccdd2709ab4043559874a11470ac9b
+data.tar.gz: 19c4b2f8e43fb87d8f58ff759d3af106b8b8ca3f5730dbffcfd8a089b1f2d40690a22862df507401313a3e1bd4c3c46941e2aa37ff32005b398268a4f976d5ed
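The new SHA256 value recorded for `data.tar.gz` can be compared against a locally unpacked copy of the gem. The following is a minimal sketch, assuming the `.gem` archive has already been extracted so that `data.tar.gz` exists as a local file; `checksum_matches?` and `EXPECTED_DATA_SHA256` are hypothetical names, not part of RubyGems or relaton-jis.

```ruby
require "digest"

# SHA256 digest listed for data.tar.gz in the new checksums.yaml.
EXPECTED_DATA_SHA256 =
  "2660c24bf8a0916a28010eec9b434953233076bb69775c9852473780f2317373"

# Hypothetical helper: returns true when the file at +path+ hashes to
# +expected_hex+ (a SHA256 hex digest, compared case-insensitively).
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end
```

A call such as `checksum_matches?("data.tar.gz", EXPECTED_DATA_SHA256)` would then report whether the local file corresponds to the published version.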
data/.rubocop.yml
CHANGED
data/CLAUDE.md
ADDED
@@ -0,0 +1,57 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+relaton-jis is a Ruby gem that retrieves Japanese Industrial Standards (JIS) metadata from webdesk.jsa.or.jp and models them as Relaton bibliographic items with XML/YAML serialization and RelaxNG schema validation.
+
+## Commands
+
+- `bundle exec rake spec` — run all tests
+- `bundle exec rspec spec/relaton/jis/item_spec.rb` — run a single test file
+- `bundle exec rspec spec/relaton/jis/item_spec.rb:15` — run a single test by line number
+- `bundle exec rake rubocop` — lint
+- `bundle exec rake` — default task (runs specs)
+
+## Architecture
+
+The gem is currently on the `lutaml-integration` branch, a major refactoring from the old `RelatonJis::` namespace to `Relaton::Jis::` using lutaml-model for data modeling.
+
+### Class Hierarchy
+
+All classes live under `lib/relaton/jis/`:
+
+- **`Item`** (extends `Iso::Item`) — core bibliographic item; defines the `Ext` attribute for JIS extensions; uses model `Bib::ItemData`
+- **`Bibitem`** (extends `Item`, includes `Bib::BibitemShared`) — XML bibitem serialization
+- **`Bibdata`** (extends `Item`, includes `Bib::BibdataShared`) — XML bibdata wrapper
+- **`Ext`** (extends `Iso::Ext`) — JIS-specific extension data (schema_version, doctype)
+- **`Doctype`** (extends `Bib::Doctype`) — allowed types: `japanese-industrial-standard`, `technical-report`, `technical-specification`, `amendment`
+- **`Scraper`** — scrapes individual JIS document pages from webdesk.jsa.or.jp; returns `Bib::ItemData`; editorial group is modeled as a contributor with `Bib::Subdivision` (not `EditorialGroup`)
+- **`DataFetcher`** (extends `Core::DataFetcher`) — bulk-fetches all JIS documents via threaded scraping; implements `to_yaml`/`to_xml`/`to_bibxml` for serialization dispatch; loaded on-demand via `require "relaton/jis/data_fetcher"`
+- **`Processor`** (extends `Core::Processor`) — Relaton processor registration; provides `get`, `fetch_data`, `from_xml`, `from_yaml`, `grammar_hash`; prefix `JIS`, defaultprefix `^(JIS|TR)\s`
+- **`Util`** (includes `Relaton::Bib::Util`) — logging with PROGNAME "relaton-jis"
+
+### Key Dependencies
+
+- **relaton-iso / relaton-bib** — parent bibliographic item models this gem extends
+- **lutaml-model** — DSL for `attribute` definitions and serialization
+- **mechanize** — web scraping from JSA website
+
+### Serialization & Validation
+
+- YAML and XML round-trip serialization via lutaml-model
+- RelaxNG grammars in `grammars/` validate XML output; `relaton-jis.rng` defines JIS-specific elements (DocumentType, structuredidentifier, stagename)
+- Tests validate XML against these grammars using ruby-jing
+
+### Test Patterns
+
+- **Round-trip tests**: YAML → Object → YAML and XML → Object → XML (fixtures in `spec/fixtures/`)
+- **Schema validation**: Jing validates generated XML against RelaxNG grammars
+- **HTTP mocking**: VCR cassettes (`spec/vcr_cassettes/`) record external HTTP interactions; WebMock disables real network calls in tests
+
+## Code Conventions
+
+- All files use `# frozen_string_literal: true`
+- Linting follows [Ribose OSS style](https://github.com/riboseinc/oss-guides); target Ruby 3.1
+- Ruby >= 3.1.0 required
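The `defaultprefix` pattern listed for `Processor` in the added CLAUDE.md describes which reference strings are treated as JIS references by default. The following is a minimal sketch of that matching, using the pattern exactly as quoted in the file; the constant and method names are hypothetical, and this is not the gem's own code.

```ruby
# Pattern copied from the CLAUDE.md description of Processor (defaultprefix).
DEFAULT_PREFIX = /^(JIS|TR)\s/

# Hypothetical helper: true when the reference starts with "JIS" or "TR"
# followed by a whitespace character.
def default_prefix_match?(reference)
  DEFAULT_PREFIX.match?(reference)
end

default_prefix_match?("JIS X 0208:1997") # starts with "JIS ", so it matches
default_prefix_match?("ISO 9001")        # different prefix, so it does not
```

Note that the pattern requires whitespace after the prefix, so a string such as `"JISX0208"` would not match under this sketch.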
data/README.adoc
CHANGED
@@ -1,4 +1,4 @@
-=
+= Relaton::Jis: retrieve JIS Standards for bibliographic use using the BibliographicItem model

 image:https://img.shields.io/gem/v/relaton-jis.svg["Gem Version", link="https://rubygems.org/gems/relaton-jis"]
 image:https://github.com/relaton/relaton-jis/workflows/macos/badge.svg["Build Status, link="https://github.com/relaton/relaton-jis/actions?workflow=rake"]
@@ -8,7 +8,7 @@ image:https://img.shields.io/github/commits-since/relaton/relaton-jis/latest.svg

 RelatonJis is a Ruby gem that implements the https://github.com/metanorma/metanorma-model-iso#iso-bibliographic-item[IsoBibliographicItem model].

-You can use it to retrieve metadata of JIS Standards from https://webdesk.jsa.or.jp, and access such metadata through the `
+You can use it to retrieve metadata of JIS Standards from https://webdesk.jsa.or.jp, and access such metadata through the `Relaton::Jis::ItemData` object.

 == Installation

@@ -26,46 +26,43 @@ If bundler is not being used to manage dependencies, install the gem by executin

 [source,ruby]
 ----
-require '
+require 'relaton/jis'
 => true

-hit_collection =
-=> <
+hit_collection = Relaton::Jis::Bibliography.search("JIS X 0208")
+=> <Relaton::Jis::HitCollection:0x000000000014a0 @ref=JIS X 0208 @fetched=false>

 hit_collection.first
-=> <
+=> <Relaton::Jis::Hit:0x00000000001a20 @reference="JIS X 0208" @fetched="false" @docidentifier="">

-item = hit_collection[
-=> #<
+item = hit_collection[1].item
+=> #<Relaton::Jis::ItemData:0x000000012a880790
 ...

-item.docidentifier
-=>
-
-
-
-@scope=nil,
-@script=nil,
-@type="JIS">]
+item.docidentifier[0].type
+=> "JIS"
+
+item.docidentifier[0].content
+=> "JIS X 0208:1997/AMENDMENT 1:2012"
 ----

 === Fetch document by reference and year

 [source,ruby]
 ----
-item =
-[relaton-jis] (JIS X 0208:1997) Fetching from webdesk.jsa.or.jp ...
-[relaton-jis] (JIS X 0208:1997) Found: `JIS X 0208:1997`
-=> #<
+item = Relaton::Jis::Bibliography.get "JIS X 0208:1997"
+[relaton-jis] INFO: (JIS X 0208:1997) Fetching from webdesk.jsa.or.jp ...
+[relaton-jis] INFO: (JIS X 0208:1997) Found: `JIS X 0208:1997`
+=> #<Relaton::Jis::ItemData:0x000000012c0f2440
 ...

-item =
-[relaton-jis] (JIS X 0208) Fetching
-[relaton-jis] (JIS X 0208) Found: `JIS X 0208:1997`
-=> #<
+item = Relaton::Jis::Bibliography.get "JIS X 0208", "1997"
+[relaton-jis] INFO: (JIS X 0208) Fetching from webdesk.jsa.or.jp ...
+[relaton-jis] INFO: (JIS X 0208) Found: `JIS X 0208:1997`
+=> #<Relaton::Jis::ItemData:0x000000012c1d7810
 ...

-item.docidentifier[0].
+item.docidentifier[0].content
 => "JIS X 0208:1997"
 ----

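The two `get` calls in the README pass the year either inside the reference (`"JIS X 0208:1997"`) or as a second argument (`"JIS X 0208", "1997"`), and both log the same `Found:` identifier. How the gem reconciles the two forms is not shown in this diff. The following is only an illustrative sketch of one way to split a trailing year from a reference; `split_reference` is a hypothetical helper, not part of relaton-jis.

```ruby
# Hypothetical helper, not part of relaton-jis: splits a trailing ":YYYY"
# off a reference string. Returns [reference, year]; year is nil when the
# string does not end in a colon followed by four digits.
def split_reference(reference)
  match = reference.match(/\A(?<ref>.+):(?<year>\d{4})\z/)
  match ? [match[:ref], match[:year]] : [reference, nil]
end

ref, year = split_reference("JIS X 0208:1997")
# ref and year could then be passed as the two arguments shown in the README.
```

The sketch deliberately handles only the simple `reference:year` shape; identifiers with amendments or `(all parts)` suffixes would need additional rules.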
@@ -73,23 +70,23 @@ item.docidentifier[0].id

 [source,ruby]
 ----
-item =
-[relaton-jis] (JIS B 0060 (all parts)) Fetching from webdesk.jsa.or.jp ...
-[relaton-jis] (JIS B 0060 (all parts)) Found: `JIS B 0060 (all parts)`
-=> #<
+item = Relaton::Jis::Bibliography.get "JIS B 0060 (all parts)"
+[relaton-jis] INFO: (JIS B 0060 (all parts)) Fetching from webdesk.jsa.or.jp ...
+[relaton-jis] INFO: (JIS B 0060 (all parts)) Found: `JIS B 0060 (all parts)`
+=> #<Relaton::Jis::ItemData:0x00000001273c8840
 ...

-item.docidentifier
-=>
+item.docidentifier[0].content
+=> "JIS B 0060 (all parts)"

-item =
-[relaton-jis] (JIS B 0060 (規格群)) Fetching from webdesk.jsa.or.jp ...
-[relaton-jis] (JIS B 0060 (規格群)) Found: `JIS B 0060 (all parts)`
-=> #<
+item = Relaton::Jis::Bibliography.get "JIS B 0060 (規格群)"
+[relaton-jis] INFO: (JIS B 0060 (規格群)) Fetching from webdesk.jsa.or.jp ...
+[relaton-jis] INFO: (JIS B 0060 (規格群)) Found: `JIS B 0060 (all parts)`
+=> #<Relaton::Jis::ItemData:0x000000012cb367d0
 ...

-item.docidentifier
-=>
+item.docidentifier[0].content
+=> "JIS B 0060 (all parts)"
 ----

 === XML serialization
@@ -101,59 +98,55 @@ Possible options:
 [source,ruby]
 ----
 item.to_xml
-=> "<bibitem id="
-<fetched>
-<title
-<title
+=> "<bibitem id="JISB006012015" type="standard" schema-version="v1.4.1">
+<fetched>2026-02-25</fetched>
+<title language="ja" script="Jpan">デジタル製品技術文書情報―第1部:総則</title>
+<title language="en" script="Latn">Digital technical product documentation (DTPD) -- Part 1: General code of practices</title>
+<uri type="src">https://webdesk.jsa.or.jp/books/W11M0090/index/?bunsyo_id=JIS+B+0060-1%3A2015</uri>
+<docidentifier type="JIS" primary="true">JIS B 0060 (all parts)</docidentifier>
 ...
 </bibitem>"

 item.to_xml bibdata: true
-=> "<bibdata type="standard" schema-version="v1.
-<fetched>
-<title
-<title
+=> "<bibdata type="standard" schema-version="v1.4.1">
+<fetched>2026-02-25</fetched>
+<title language="ja" script="Jpan">デジタル製品技術文書情報―第1部:総則</title>
+<title language="en" script="Latn">Digital technical product documentation (DTPD) -- Part 1: General code of practices</title>
+<uri type="src">https://webdesk.jsa.or.jp/books/W11M0090/index/?bunsyo_id=JIS+B+0060-1%3A2015</uri>
+<docidentifier type="JIS" primary="true">JIS B 0060 (all parts)</docidentifier>
 ...
 <ext schema-version="v0.0.1">
-<doctype>standard</doctype>
-<
-<technical-committee>一般財団法人 日本規格協会</technical-committee>
-</editorialgroup>
+<doctype>japanese-industrial-standard</doctype>
+<flavor>jis</flavor>
 <ics>
-<code>
-<text>
+<code>01.110</code>
+<text>Technical product documentation</text>
 </ics>
 <structuredidentifier type="JIS">
-<project-number>
+<project-number>B0060 (all parts)</project-number>
 </structuredidentifier>
 </ext>
 </bibdata>"
 ----

-=== Typed links
+=== Typed source links

-Each JIS document has `src` type link and optional `pdf`.
+Each JIS document has `src` type source link and optional `pdf`.

 [source,ruby]
 ----
-item.
-=>
-
-
-
-@type="src">,
-#<RelatonBib::TypedUri:0x00007fe436a60ed8
-@content=#<Addressable::URI:0xc634 URI:https://webdesk.jsa.or.jp/preview/pre_jis_x_00208_000_000_1997_j_pr11_i4.pdf>,
-@language=nil,
-@script=nil,
-@type="pdf">]
+item.source[0].type
+=> "src"
+
+item.source[0].content
+=> "https://webdesk.jsa.or.jp/books/W11M0090/index/?bunsyo_id=JIS+B+0060-1%3A2015"
 ----

 === Fetch data

 This gem scrapes the https://webdesk.jsa.or.jp/books/W11M0270 pages to fetch the JIS Standards metadata. By default the data is saved in the `./data` folder in YAML format.

-The method `
+The method `Relaton::Jis::DataFetcher.fetch(output: "data", format: "yaml")` fetches all the documents from the dataset and saves them to the `./data` folder in YAML format.
 Arguments:

 - `output` - folder to save documents (default './data').
@@ -161,9 +154,11 @@ Arguments:

 [source,ruby]
 ----
-
-
-
+require "relaton/jis/data_fetcher"
+
+Relaton::Jis::DataFetcher.fetch
+Started at: 2024-09-27 17:49:40 -0400
+Done in: 3031 sec.
 => nil
 ----

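Across this diff the library moves from `lib/relaton_jis/` to `lib/relaton/jis/`, and the README examples now call `Relaton::Jis::...` after `require 'relaton/jis'`. The following is a rough sketch of a text rewrite that calling code might apply when upgrading. `migrate_source` is a hypothetical helper, not shipped with the gem, and the old `require "relaton_jis"` path is inferred from the removed `lib/relaton_jis.rb` file rather than quoted from the README.

```ruby
# Hypothetical helper, not shipped with relaton-jis: rewrites the old
# namespace and require path inside a string of Ruby source code.
def migrate_source(source)
  source
    .gsub(/\bRelatonJis\b/, "Relaton::Jis")
    .gsub(%r{(require\s+["'])relaton_jis(["'/])}, '\1relaton/jis\2')
end

old_code = %(require "relaton_jis"\nRelatonJis::Bibliography.get "JIS X 0208")
new_code = migrate_source(old_code)
```

This only covers the namespace and require path. Changes in the returned objects, such as `item.docidentifier[0].content` and `item.source` in the updated README, would still need manual review.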