relaton-jis 1.20.0 → 2.0.0.pre.alpha.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +1 -1
  3. data/CLAUDE.md +57 -0
  4. data/README.adoc +64 -69
  5. data/grammars/basicdoc.rng +1559 -671
  6. data/grammars/biblio-standoc.rng +107 -46
  7. data/grammars/biblio.rng +1010 -375
  8. data/grammars/relaton-jis.rng +5 -54
  9. data/lib/relaton/jis/bibdata.rb +7 -0
  10. data/lib/relaton/jis/bibitem.rb +7 -0
  11. data/lib/relaton/jis/bibliography.rb +70 -0
  12. data/lib/relaton/jis/data_fetcher.rb +161 -0
  13. data/lib/relaton/jis/docidentifier.rb +21 -0
  14. data/lib/relaton/jis/doctype.rb +9 -0
  15. data/lib/relaton/jis/ext.rb +16 -0
  16. data/lib/relaton/jis/hit.rb +76 -0
  17. data/lib/relaton/jis/hit_collection.rb +111 -0
  18. data/lib/relaton/jis/item.rb +26 -0
  19. data/lib/relaton/jis/item_base.rb +11 -0
  20. data/lib/relaton/jis/item_data.rb +8 -0
  21. data/lib/relaton/jis/processor.rb +55 -0
  22. data/lib/relaton/jis/relation.rb +11 -0
  23. data/lib/relaton/jis/scraper.rb +203 -0
  24. data/lib/relaton/jis/structured_identifier.rb +13 -0
  25. data/lib/relaton/jis/util.rb +8 -0
  26. data/lib/relaton/jis/version.rb +7 -0
  27. data/lib/relaton/jis.rb +30 -0
  28. data/relaton_jis.gemspec +6 -6
  29. metadata +29 -25
  30. data/lib/relaton_jis/bibliographic_item.rb +0 -12
  31. data/lib/relaton_jis/bibliography.rb +0 -65
  32. data/lib/relaton_jis/data_fetcher.rb +0 -158
  33. data/lib/relaton_jis/document_type.rb +0 -16
  34. data/lib/relaton_jis/hash_converter.rb +0 -16
  35. data/lib/relaton_jis/hit.rb +0 -57
  36. data/lib/relaton_jis/hit_collection.rb +0 -100
  37. data/lib/relaton_jis/processor.rb +0 -61
  38. data/lib/relaton_jis/scraper.rb +0 -157
  39. data/lib/relaton_jis/util.rb +0 -6
  40. data/lib/relaton_jis/version.rb +0 -5
  41. data/lib/relaton_jis/xml_parser.rb +0 -17
  42. data/lib/relaton_jis.rb +0 -29
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 89b060e7e705053054c3f0e3ff3116e054b14137c71c6a42068e947f31f9ee79
- data.tar.gz: 83395144a3d8f3c3a50968fe0e6eee48a22ee61d8f9d5bccdab13d9579b8eed1
+ metadata.gz: eb4b54d49f314da1369dec1e4446b108ddd68b92555c88c8951233fd7e6b31dc
+ data.tar.gz: 2660c24bf8a0916a28010eec9b434953233076bb69775c9852473780f2317373
  SHA512:
- metadata.gz: 66ca9ef9f044226d3dac22455e15a90b96c75adac7a926a9a204ae7ecb261cfd2122910712cd62f1b83a7b13853092050f2cc36db4041664945ce7dcebfc3af5
- data.tar.gz: 49262a08e8dbafc224d66634910134f2b604307819a8d1676885fa21dcaa17debab24464d367efee4a7bfeb6e085dcfdcdd23f1e30ef32b4511031f75c79656f
+ metadata.gz: aa98951e0c115564f41f81a2de3968617a2699559f4e8a38cc0eaebd6bfeb2feeee16222d01875db30dbcde676a2220671ccdd2709ab4043559874a11470ac9b
+ data.tar.gz: 19c4b2f8e43fb87d8f58ff759d3af106b8b8ca3f5730dbffcfd8a089b1f2d40690a22862df507401313a3e1bd4c3c46941e2aa37ff32005b398268a4f976d5ed
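Note: the digests in checksums.yaml can be compared against locally extracted archive members. The following is a minimal sketch, assuming the `metadata.gz` or `data.tar.gz` file is already on disk. The helper name `digest_matches?` and the throwaway temp file are illustrative only and are not part of relaton-jis or RubyGems.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: compare a file's hex digest with an expected value.
# Pass Digest::SHA512 as the algorithm to check the SHA512 entries instead.
def digest_matches?(path, expected_hex, algorithm: Digest::SHA256)
  algorithm.file(path).hexdigest == expected_hex.downcase
end

# Demonstration with a temporary file standing in for a real data.tar.gz.
Tempfile.create("sample") do |file|
  file.write("relaton-jis")
  file.flush
  expected = Digest::SHA256.hexdigest("relaton-jis")
  puts digest_matches?(file.path, expected)   # digest of the same content
  puts digest_matches?(file.path, "0" * 64)   # deliberately wrong digest
end
```

For a real check, the expected value would be the `+` line for the matching file and algorithm in the hunk above.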
data/.rubocop.yml CHANGED
@@ -7,6 +7,6 @@ require: rubocop-rails
  inherit_from:
  - https://raw.githubusercontent.com/riboseinc/oss-guides/master/ci/rubocop.yml
  AllCops:
- TargetRubyVersion: 2.7
+ TargetRubyVersion: 3.2
  Rails:
  Enabled: false
data/CLAUDE.md ADDED
@@ -0,0 +1,57 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ relaton-jis is a Ruby gem that retrieves Japanese Industrial Standards (JIS) metadata from webdesk.jsa.or.jp and models them as Relaton bibliographic items with XML/YAML serialization and RelaxNG schema validation.
+
+ ## Commands
+
+ - `bundle exec rake spec` — run all tests
+ - `bundle exec rspec spec/relaton/jis/item_spec.rb` — run a single test file
+ - `bundle exec rspec spec/relaton/jis/item_spec.rb:15` — run a single test by line number
+ - `bundle exec rake rubocop` — lint
+ - `bundle exec rake` — default task (runs specs)
+
+ ## Architecture
+
+ The gem is currently on the `lutaml-integration` branch, a major refactoring from the old `RelatonJis::` namespace to `Relaton::Jis::` using lutaml-model for data modeling.
+
+ ### Class Hierarchy
+
+ All classes live under `lib/relaton/jis/`:
+
+ - **`Item`** (extends `Iso::Item`) — core bibliographic item; defines the `Ext` attribute for JIS extensions; uses model `Bib::ItemData`
+ - **`Bibitem`** (extends `Item`, includes `Bib::BibitemShared`) — XML bibitem serialization
+ - **`Bibdata`** (extends `Item`, includes `Bib::BibdataShared`) — XML bibdata wrapper
+ - **`Ext`** (extends `Iso::Ext`) — JIS-specific extension data (schema_version, doctype)
+ - **`Doctype`** (extends `Bib::Doctype`) — allowed types: `japanese-industrial-standard`, `technical-report`, `technical-specification`, `amendment`
+ - **`Scraper`** — scrapes individual JIS document pages from webdesk.jsa.or.jp; returns `Bib::ItemData`; editorial group is modeled as a contributor with `Bib::Subdivision` (not `EditorialGroup`)
+ - **`DataFetcher`** (extends `Core::DataFetcher`) — bulk-fetches all JIS documents via threaded scraping; implements `to_yaml`/`to_xml`/`to_bibxml` for serialization dispatch; loaded on-demand via `require "relaton/jis/data_fetcher"`
+ - **`Processor`** (extends `Core::Processor`) — Relaton processor registration; provides `get`, `fetch_data`, `from_xml`, `from_yaml`, `grammar_hash`; prefix `JIS`, defaultprefix `^(JIS|TR)\s`
+ - **`Util`** (includes `Relaton::Bib::Util`) — logging with PROGNAME "relaton-jis"
+
+ ### Key Dependencies
+
+ - **relaton-iso / relaton-bib** — parent bibliographic item models this gem extends
+ - **lutaml-model** — DSL for `attribute` definitions and serialization
+ - **mechanize** — web scraping from JSA website
+
+ ### Serialization & Validation
+
+ - YAML and XML round-trip serialization via lutaml-model
+ - RelaxNG grammars in `grammars/` validate XML output; `relaton-jis.rng` defines JIS-specific elements (DocumentType, structuredidentifier, stagename)
+ - Tests validate XML against these grammars using ruby-jing
+
+ ### Test Patterns
+
+ - **Round-trip tests**: YAML → Object → YAML and XML → Object → XML (fixtures in `spec/fixtures/`)
+ - **Schema validation**: Jing validates generated XML against RelaxNG grammars
+ - **HTTP mocking**: VCR cassettes (`spec/vcr_cassettes/`) record external HTTP interactions; WebMock disables real network calls in tests
+
+ ## Code Conventions
+
+ - All files use `# frozen_string_literal: true`
+ - Linting follows [Ribose OSS style](https://github.com/riboseinc/oss-guides); target Ruby 3.1
+ - Ruby >= 3.1.0 required
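Note: the `defaultprefix` pattern listed for `Processor` above decides which reference strings are routed to this flavor. The sketch below exercises that pattern in isolation. The constant name `DEFAULT_PREFIX`, the helper `jis_reference?`, and the "TR Z 0001" input are illustrative assumptions, not identifiers or data taken from the gem.

```ruby
# Pattern copied from the Processor description above; the constant name is hypothetical.
DEFAULT_PREFIX = /^(JIS|TR)\s/

# Hypothetical helper: true when a reference starts with "JIS " or "TR ".
def jis_reference?(ref)
  DEFAULT_PREFIX.match?(ref)
end

# Illustrative inputs only; "TR Z 0001" is a made-up string, not a looked-up document.
["JIS X 0208:1997", "TR Z 0001", "ISO 9001", "JISX0208"].each do |ref|
  puts "#{ref.inspect} => #{jis_reference?(ref)}"
end
```

The trailing `\s` means a prefix without following whitespace (such as "JISX0208") is not matched. In Ruby `^` anchors at the start of any line, so a multi-line input could match on a later line.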
data/README.adoc CHANGED
@@ -1,4 +1,4 @@
- = RelatonJis: retrieve JIS Standards for bibliographic use using the BibliographicItem model
+ = Relaton::Jis: retrieve JIS Standards for bibliographic use using the BibliographicItem model
 
  image:https://img.shields.io/gem/v/relaton-jis.svg["Gem Version", link="https://rubygems.org/gems/relaton-jis"]
  image:https://github.com/relaton/relaton-jis/workflows/macos/badge.svg["Build Status, link="https://github.com/relaton/relaton-jis/actions?workflow=rake"]
@@ -8,7 +8,7 @@ image:https://img.shields.io/github/commits-since/relaton/relaton-jis/latest.svg
 
  RelatonJis is a Ruby gem that implements the https://github.com/metanorma/metanorma-model-iso#iso-bibliographic-item[IsoBibliographicItem model].
 
- You can use it to retrieve metadata of JIS Standards from https://webdesk.jsa.or.jp, and access such metadata through the `RelatonJis::BibliographicItem` object.
+ You can use it to retrieve metadata of JIS Standards from https://webdesk.jsa.or.jp, and access such metadata through the `Relaton::Jis::ItemData` object.
 
  == Installation
 
@@ -26,46 +26,43 @@ If bundler is not being used to manage dependencies, install the gem by executin
 
  [source,ruby]
  ----
- require 'relaton_jis'
+ require 'relaton/jis'
  => true
 
- hit_collection = RelatonJis::Bibliography.search("JIS X 0208")
- => <RelatonJis::HitCollection:0x00000000018858 @ref=JIS X 0208 @fetched=false>
+ hit_collection = Relaton::Jis::Bibliography.search("JIS X 0208")
+ => <Relaton::Jis::HitCollection:0x000000000014a0 @ref=JIS X 0208 @fetched=false>
 
  hit_collection.first
- => <RelatonJis::Hit:0x00000000018880 @text="JIS X 0208" @fetched="false" @fullIdentifier="" @title="">
+ => <Relaton::Jis::Hit:0x00000000001a20 @reference="JIS X 0208" @fetched="false" @docidentifier="">
 
- item = hit_collection[2].fetch
- => #<RelatonJis::BibliographicItem:0x00007fe4564a7580
+ item = hit_collection[1].item
+ => #<Relaton::Jis::ItemData:0x000000012a880790
  ...
 
- item.docidentifier
- => [#<RelatonBib::DocumentIdentifier:0x00007fe46625c518
- @id="JIS X 0208:1997/AMENDMENT 1:2012",
- @language=nil,
- @primary=true,
- @scope=nil,
- @script=nil,
- @type="JIS">]
+ item.docidentifier[0].type
+ => "JIS"
+
+ item.docidentifier[0].content
+ => "JIS X 0208:1997/AMENDMENT 1:2012"
  ----
 
  === Fetch document by reference and year
 
  [source,ruby]
  ----
- item = RelatonJis::Bibliography.get "JIS X 0208:1997"
- [relaton-jis] (JIS X 0208:1997) Fetching from webdesk.jsa.or.jp ...
- [relaton-jis] (JIS X 0208:1997) Found: `JIS X 0208:1997`
- => #<RelatonJis::BibliographicItem:0x00007fe4478ecc08
+ item = Relaton::Jis::Bibliography.get "JIS X 0208:1997"
+ [relaton-jis] INFO: (JIS X 0208:1997) Fetching from webdesk.jsa.or.jp ...
+ [relaton-jis] INFO: (JIS X 0208:1997) Found: `JIS X 0208:1997`
+ => #<Relaton::Jis::ItemData:0x000000012c0f2440
  ...
 
- item = RelatonJis::Bibliography.get "JIS X 0208", "1997"
- [relaton-jis] (JIS X 0208) Fetching from webdesk.jsa.or.jp ...
- [relaton-jis] (JIS X 0208) Found: `JIS X 0208:1997`
- => #<RelatonJis::BibliographicItem:0x00007fe436b49d90
+ item = Relaton::Jis::Bibliography.get "JIS X 0208", "1997"
+ [relaton-jis] INFO: (JIS X 0208) Fetching from webdesk.jsa.or.jp ...
+ [relaton-jis] INFO: (JIS X 0208) Found: `JIS X 0208:1997`
+ => #<Relaton::Jis::ItemData:0x000000012c1d7810
  ...
 
- item.docidentifier[0].id
+ item.docidentifier[0].content
  => "JIS X 0208:1997"
  ----
 
@@ -73,23 +70,23 @@ item.docidentifier[0].id
 
  [source,ruby]
  ----
- item = RelatonJis::Bibliography.get "JIS B 0060 (all parts)"
- [relaton-jis] (JIS B 0060 (all parts)) Fetching from webdesk.jsa.or.jp ...
- [relaton-jis] (JIS B 0060 (all parts)) Found: `JIS B 0060 (all parts)`
- => #<RelatonJis::BibliographicItem:0x000000010c3e2300
+ item = Relaton::Jis::Bibliography.get "JIS B 0060 (all parts)"
+ [relaton-jis] INFO: (JIS B 0060 (all parts)) Fetching from webdesk.jsa.or.jp ...
+ [relaton-jis] INFO: (JIS B 0060 (all parts)) Found: `JIS B 0060 (all parts)`
+ => #<Relaton::Jis::ItemData:0x00000001273c8840
  ...
 
- item.docidentifier
- => [#<RelatonBib::DocumentIdentifier:0x000000010c5905f8 @id="JIS B 0060 (all parts)", @language=nil, @primary=true, @scope=nil, @script=nil, @type="JIS">]
+ item.docidentifier[0].content
+ => "JIS B 0060 (all parts)"
 
- item = RelatonJis::Bibliography.get "JIS B 0060 (規格群)"
- [relaton-jis] (JIS B 0060 (規格群)) Fetching from webdesk.jsa.or.jp ...
- [relaton-jis] (JIS B 0060 (規格群)) Found: `JIS B 0060 (all parts)`
- => #<RelatonJis::BibliographicItem:0x000000010c3ceb20
+ item = Relaton::Jis::Bibliography.get "JIS B 0060 (規格群)"
+ [relaton-jis] INFO: (JIS B 0060 (規格群)) Fetching from webdesk.jsa.or.jp ...
+ [relaton-jis] INFO: (JIS B 0060 (規格群)) Found: `JIS B 0060 (all parts)`
+ => #<Relaton::Jis::ItemData:0x000000012cb367d0
  ...
 
- item.docidentifier
- => [#<RelatonBib::DocumentIdentifier:0x000000010c8d9b10 @id="JIS B 0060 (all parts)", @language=nil, @primary=true, @scope=nil, @script=nil, @type="JIS">]
+ item.docidentifier[0].content
+ => "JIS B 0060 (all parts)"
  ----
 
  === XML serialization
@@ -101,59 +98,55 @@ Possible options:
  [source,ruby]
  ----
  item.to_xml
- => "<bibitem id="JISX0208-1997" type="standard" schema-version="v1.2.9">
- <fetched>2023-03-18</fetched>
- <title format="text/plain" language="ja" script="Jpan">7ビット及び8ビットの2バイト情報交換用符号化漢字集合</title>
- <title format="text/plain" language="en" script="Lant">7-bit and 8-bit double byte coded KANJI sets for information interchange</title>
+ => "<bibitem id="JISB006012015" type="standard" schema-version="v1.4.1">
+ <fetched>2026-02-25</fetched>
+ <title language="ja" script="Jpan">デジタル製品技術文書情報―第1部:総則</title>
+ <title language="en" script="Latn">Digital technical product documentation (DTPD) -- Part 1: General code of practices</title>
+ <uri type="src">https://webdesk.jsa.or.jp/books/W11M0090/index/?bunsyo_id=JIS+B+0060-1%3A2015</uri>
+ <docidentifier type="JIS" primary="true">JIS B 0060 (all parts)</docidentifier>
  ...
  </bibitem>"
 
  item.to_xml bibdata: true
- => "<bibdata type="standard" schema-version="v1.2.9">
- <fetched>2023-03-18</fetched>
- <title format="text/plain" language="ja" script="Jpan">7ビット及び8ビットの2バイト情報交換用符号化漢字集合</title>
- <title format="text/plain" language="en" script="Lant">7-bit and 8-bit double byte coded KANJI sets for information interchange</title>
+ => "<bibdata type="standard" schema-version="v1.4.1">
+ <fetched>2026-02-25</fetched>
+ <title language="ja" script="Jpan">デジタル製品技術文書情報―第1部:総則</title>
+ <title language="en" script="Latn">Digital technical product documentation (DTPD) -- Part 1: General code of practices</title>
+ <uri type="src">https://webdesk.jsa.or.jp/books/W11M0090/index/?bunsyo_id=JIS+B+0060-1%3A2015</uri>
+ <docidentifier type="JIS" primary="true">JIS B 0060 (all parts)</docidentifier>
  ...
  <ext schema-version="v0.0.1">
- <doctype>standard</doctype>
- <editorialgroup>
- <technical-committee>一般財団法人 日本規格協会</technical-committee>
- </editorialgroup>
+ <doctype>japanese-industrial-standard</doctype>
+ <flavor>jis</flavor>
  <ics>
- <code>35.040</code>
- <text>Information coding</text>
+ <code>01.110</code>
+ <text>Technical product documentation</text>
  </ics>
  <structuredidentifier type="JIS">
- <project-number>X0208</project-number>
+ <project-number>B0060 (all parts)</project-number>
  </structuredidentifier>
  </ext>
  </bibdata>"
  ----
 
- === Typed links
+ === Typed source links
 
- Each JIS document has `src` type link and optional `pdf`.
+ Each JIS document has `src` type source link and optional `pdf`.
 
  [source,ruby]
  ----
- item.link
- => [#<RelatonBib::TypedUri:0x00007fe436a626c0
- @content=#<Addressable::URI:0xc620 URI:https://webdesk.jsa.or.jp/books/W11M0090/index/?bunsyo_id=JIS+X+0208%3A1997>,
- @language=nil,
- @script=nil,
- @type="src">,
- #<RelatonBib::TypedUri:0x00007fe436a60ed8
- @content=#<Addressable::URI:0xc634 URI:https://webdesk.jsa.or.jp/preview/pre_jis_x_00208_000_000_1997_j_pr11_i4.pdf>,
- @language=nil,
- @script=nil,
- @type="pdf">]
+ item.source[0].type
+ => "src"
+
+ item.source[0].content
+ => "https://webdesk.jsa.or.jp/books/W11M0090/index/?bunsyo_id=JIS+B+0060-1%3A2015"
  ----
 
  === Fetch data
 
  This gem scrapes the https://webdesk.jsa.or.jp/books/W11M0270 pages to fetch the JIS Standards metadata. By default the data is saved in the `./data` folder in YAML format.
 
- The method `RelatonJis::DataFetcher.fetch(output: "data", format: "yaml")` fetches all the documents from the dataset and saves them to the `./data` folder in YAML format.
+ The method `Relaton::Jis::DataFetcher.fetch(output: "data", format: "yaml")` fetches all the documents from the dataset and saves them to the `./data` folder in YAML format.
  Arguments:
 
  - `output` - folder to save documents (default './data').
@@ -161,9 +154,11 @@ Arguments:
 
  [source,ruby]
  ----
- RelatonJis::DataFetcher.fetch
- Start fetching JIS data at 2024-09-27 17:49:40 -0400
- Fetching JIS data finished at 2024-09-27 18:40:11 -0400. It took 3031.0 seconds.
+ require "relaton/jis/data_fetcher"
+
+ Relaton::Jis::DataFetcher.fetch
+ Started at: 2024-09-27 17:49:40 -0400
+ Done in: 3031 sec.
  => nil
  ----
 
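Note: the README hunks above replace `item.docidentifier[0].id` with `item.docidentifier[0].content`. Code that has to run against both 1.x and 2.x during a migration could read the identifier text through a small shim. This is only a sketch, assuming the accessor rename is the sole difference, as the hunks suggest. `OldDocid`, `NewDocid`, and `docid_text` are hypothetical stand-ins, not classes or methods from relaton-jis or relaton-bib.

```ruby
# Stand-ins for the 1.x and 2.x identifier objects. The real classes come from
# relaton-bib and are deliberately not loaded here.
OldDocid = Struct.new(:id, :type)       # 1.x style: text exposed as #id
NewDocid = Struct.new(:content, :type)  # 2.x style: text exposed as #content

# Hypothetical helper: return the identifier text from either generation.
def docid_text(docid)
  docid.respond_to?(:content) ? docid.content : docid.id
end

puts docid_text(OldDocid.new("JIS X 0208:1997", "JIS"))
puts docid_text(NewDocid.new("JIS X 0208:1997", "JIS"))
```

The same pattern would apply to the `item.link` to `item.source` rename shown in the "Typed source links" hunk, again assuming that only the accessor name changed.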