relaton-gb 1.20.1 → 1.20.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 17970eec8510e66fc3597f0f30be04636c3171fdca419e5bd4bd50a1e6e18e3b
- data.tar.gz: 7a9419ed8ab99e67dc4093e228f41df1beb2b90bd6dbc2e04876a0c64014e31e
+ metadata.gz: 545e535b0db62ecab737c3067cd5b3f8f2104560e104b1dc2cbc7c920ab7238b
+ data.tar.gz: 3672fda5a7bb420fb5cc1086b60d46ce3b94f0ae4d9cf7759101f9b67cce2aac
  SHA512:
- metadata.gz: 5aa4709578f13f41b2b58c7813c4e0d939faa57ab67a28f9b0eaa93d24190ac0459f5cc0511c5b40a67e602e9527b335ca09a3de50816ca1979192db2e9cefb0
- data.tar.gz: 206fe9818c056380a948fb5b8937d4794b150fbcabab73e27b3feb31842da8adbe9af5ffed5190a57a59b04226d08194c895b1204b38d372a60b09040005885b
+ metadata.gz: 74581dbf698e664f97cdfa66c1d4f948f2a14713eeb772ce20838a03f01fba1eb864afe5b6a4153f8fc80a32b30d3893bdbfa713667edd83c193f6bada37e533
+ data.tar.gz: 3141ac12e00f11a00a44d4563a48e7aa16b7b762514dc63ca828f104191614c55ec75ab6cf2df7ccc0e860ab50ccc4d42665ce61e98fd49ae51e81d2fcee55e2
data/CLAUDE.md ADDED
@@ -0,0 +1,74 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ relaton-gb is a Ruby gem for searching and fetching Chinese GB (Guobiao) standards bibliographic data. It's part of the Relaton family of gems and scrapes standards from Chinese government websites.
+
+ ## Common Commands
+
+ ```bash
+ # Install dependencies
+ bin/setup
+
+ # Run all tests
+ bundle exec rake spec
+
+ # Run a single test file
+ bundle exec rspec spec/relaton_gb_spec.rb
+
+ # Run a specific test by line number
+ bundle exec rspec spec/relaton_gb_spec.rb:31
+
+ # Interactive console for experimenting
+ bin/console
+
+ # Lint with RuboCop (uses Ribose OSS style guide)
+ bundle exec rubocop
+
+ # Install gem locally
+ bundle exec rake install
+ ```
+
+ ## Architecture
+
+ ### Entry Point
+ `RelatonGb::GbBibliography` is the main API class:
+ - `search(text)` - Returns `HitCollection` of search results
+ - `get(code, year, opts)` - Fetches a specific standard by identifier
+
+ ### Scrapers (lib/relaton_gb/)
+ Each scraper handles a different standard source:
+ - `GbScrapper` - National standards (GB/GJ/GS prefix) from openstd.samr.gov.cn
+ - `TScrapper` - Social organization standards (T/XX prefix) from www.ttbz.org.cn
+ - `Scrapper` - Common scraping methods shared via `extend`
+
+ The scrapers use Mechanize for HTTP requests and Nokogiri for HTML parsing.
+
+ ### Domain Models
+ - `GbBibliographicItem` - Main bibliographic item class, extends `RelatonIsoBib::IsoBibliographicItem`
+ - `Hit` / `HitCollection` - Search result wrappers with lazy fetching via `hit.fetch`
+ - `GbStandardType` - Standard classification (scope, mandate, prefix)
+ - `GbTechnicalCommittee` - Technical committee information
+
+ ### Data Flow
+ 1. `GbBibliography.search` routes to appropriate scraper based on standard prefix
+ 2. Scraper returns `HitCollection` with basic metadata
+ 3. Calling `hit.fetch` scrapes the full document page and returns `GbBibliographicItem`
+ 4. `GbBibliographicItem` can serialize to XML, hash, or AsciiBib
+
+ ## Testing
+
+ Tests use RSpec with VCR to record/replay HTTP interactions:
+ - VCR cassettes stored in `spec/vcr_cassettes/`
+ - Cassettes auto-expire after 7 days (`re_record_interval`)
+ - XML output validated against RelaxNG schemas in `grammars/`
+
+ To re-record a VCR cassette, delete the corresponding YAML file and run the test.
+
+ ## Important Notes
+
+ - GB standard searches **require the year** in the identifier (e.g., `GB/T 20223-2006`, not `GB/T 20223`)
+ - Standard prefixes define the type: GB/GJ/GS = national, T/XX = social organization
+ - The `/T` suffix in prefix indicates "recommended" (推荐), `/Z` indicates "guidelines"
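The prefix rules in the "Important Notes" list above can be sketched as a small classifier. This is an illustration only: `describe_gb_ref` is a hypothetical helper that is not part of relaton-gb, and it assumes the year is always written as a trailing `-YYYY`, as in the `GB/T 20223-2006` example.

```ruby
# Hypothetical helper, not part of relaton-gb. It applies only the rules
# stated in the notes: GB/GJ/GS = national, T/XX = social organization,
# /T = recommended, /Z = guidelines, and a trailing "-YYYY" year.
def describe_gb_ref(ref)
  # Leading run of capitals, optionally followed by "/" and more capitals.
  prefix = ref[/\A[A-Z]+(?:\/[A-Z]+)?/].to_s
  head, tail = prefix.split("/", 2)

  scope =
    if %w[GB GJ GS].include?(head) then :national
    elsif head == "T" && tail then :social_organization
    else :unknown
    end

  # The notes define /T and /Z only for the national prefixes.
  mandate =
    if scope == :national && tail == "T" then :recommended
    elsif scope == :national && tail == "Z" then :guidelines
    end

  { prefix: prefix, scope: scope, mandate: mandate,
    has_year: ref.match?(/-\d{4}\z/) }
end
```

A caller could check `has_year` before searching, since the notes say GB searches need the year in the identifier.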
@@ -1,8 +1,6 @@
  # encoding: UTF-8
  # frozen_string_literal: true
  
- require "open-uri"
- require "net/http"
  require "nokogiri"
  require "relaton_gb/scrapper"
  require "relaton_gb/gb_bibliographic_item"
@@ -19,22 +17,24 @@ module RelatonGb
  # @param text [String]
  # @return [RelatonGb::HitCollection]
  def scrape_page(text)
- search_html = OpenURI.open_uri(
- "http://www.ttbz.org.cn/Home/Standard?searchType=2&key=" \
- "#{CGI.escape(text.tr('-', [8212].pack('U')))}",
- ).read
- header = Nokogiri::HTML search_html
+ url = "http://www.ttbz.org.cn/Home/Standard?searchType=2&key=" \
+ "#{CGI.escape(text.tr('-', [8212].pack('U')))}"
+ doc = agent.get(url)
  xpath = '//table[contains(@class, "standard_list_table")]/tr/td/a'
  t_xpath = "../preceding-sibling::td[4]"
- hits = header.xpath(xpath).map do |h|
+ hits = doc.xpath(xpath).map do |h|
  docref = h.at(t_xpath).text.gsub(/â\u0080\u0094/, "-")
  status = h.at("../preceding-sibling::td[1]").text.delete "\r\n"
  pid = h[:href].sub(%r{/$}, "")
  Hit.new pid: pid, docref: docref, status: status, scrapper: self
  end
  HitCollection.new hits
- rescue OpenURI::HTTPError, SocketError, OpenSSL::SSL::SSLError, Net::OpenTimeout
- raise RelatonBib::RequestError, "Cannot access http://www.ttbz.org.cn/Home/Standard"
+ rescue Mechanize::ResponseCodeError => e
+ return nil if e.response_code == "404"
+
+ raise RelatonBib::RequestError, "Cannot access #{url}: #{e.message}"
+ rescue Mechanize::Error => e
+ raise RelatonBib::RequestError, "Cannot access #{url}: #{e.message}"
  end
  # rubocop:enable Metrics/MethodLength, Metrics/AbcSize
  
@@ -42,10 +42,14 @@ module RelatonGb
  # @return [RelatonGb::GbBibliographicItem]
  def scrape_doc(hit)
  src = "http://www.ttbz.org.cn#{hit.pid}"
- doc = Nokogiri::HTML OpenURI.open_uri(src), nil, Encoding::UTF_8.to_s
+ doc = agent.get(src)
  GbBibliographicItem.new(**scrapped_data(doc, src, hit))
- rescue OpenURI::HTTPError, SocketError, OpenSSL::SSL::SSLError, Net::OpenTimeout
- raise RelatonBib::RequestError, "Cannot access #{src}"
+ rescue Mechanize::Error => e
+ raise RelatonBib::RequestError, "Cannot access #{src}: #{e.message}"
+ end
+
+ def agent
+ @agent ||= Mechanize.new
  end
  
  private
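The error mapping introduced in `scrape_page` above depends on the order of the `rescue` clauses: the response-code clause comes first so that a 404 can return `nil`, and the general clause catches everything else. A minimal sketch of that pattern follows. It uses stand-in classes under a hypothetical `Sketch` module so it runs without the mechanize or relaton-bib gems, and it assumes that `Mechanize::ResponseCodeError` is a subclass of `Mechanize::Error`, as the two-clause layout in the diff suggests.

```ruby
# Stand-ins for illustration only; the real classes come from the
# mechanize and relaton-bib gems.
module Sketch
  # Plays the role of Mechanize::Error.
  class AgentError < StandardError; end

  # Plays the role of Mechanize::ResponseCodeError (code kept as a String).
  class ResponseCodeError < AgentError
    attr_reader :response_code

    def initialize(code)
      @response_code = code
      super("#{code} response")
    end
  end

  # Plays the role of RelatonBib::RequestError.
  class RequestError < StandardError; end

  # Yields the URL to the block and maps failures the way the new
  # scrape_page does: 404 becomes nil, anything else a RequestError.
  def self.guarded_get(url)
    yield url
  rescue ResponseCodeError => e
    return nil if e.response_code == "404"

    raise RequestError, "Cannot access #{url}: #{e.message}"
  rescue AgentError => e
    raise RequestError, "Cannot access #{url}: #{e.message}"
  end
end
```

If the general clause came first, it would also match the response-code error and the 404 branch would never run.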
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
  
  module RelatonGb
- VERSION = "1.20.1"
+ VERSION = "1.20.2"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: relaton-gb
  version: !ruby/object:Gem::Version
- version: 1.20.1
+ version: 1.20.2
  platform: ruby
  authors:
  - Ribose Inc.
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2025-04-24 00:00:00.000000000 Z
+ date: 2026-01-22 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: cnccs
@@ -108,6 +108,7 @@ files:
  - ".hound.yml"
  - ".rspec"
  - ".rubocop.yml"
+ - CLAUDE.md
  - Gemfile
  - LICENSE.txt
  - README.adoc