tc211-termbase 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 595681dcffbabcec368f07c06712468fb8d124e4183355bed7d610ed9aac6caf
4
+ data.tar.gz: '017583476a0a5743adedf8cefa49e2b657eb7e6686133b8be5c95d72fce02dc2'
5
+ SHA512:
6
+ metadata.gz: 24ea15ba4f0c8bad6b963d0765f793ff2578dd2c1373e75adcbafacc4e9cafe1e35fdbadbc61a2d2f7d06dc1ec531c759615fc50edc93b66e3383a1ffe14b56d
7
+ data.tar.gz: b7eb57174f742c5afe931232c61415def36d62218deba71fce24d39596d08ddf08c9835b71c9caf6d290c66b06290c0f37cb1d44b6abf178aefa06b7c2564ff4
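The `checksums.yaml` entries above are hex-encoded SHA-256 and SHA-512 digests of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). Below is a minimal sketch of checking one of them with Ruby's standard `digest` library. It assumes the archive member has already been extracted to a local path. The helper name `checksum_matches?` is hypothetical, and the demonstration hashes an empty temporary file rather than the real archives.

```ruby
require "digest/sha2"
require "tempfile"

# Hypothetical helper: compare a file's hex digest with the value recorded in
# checksums.yaml. `algorithm` names a Digest class, e.g. :SHA256 or :SHA512.
def checksum_matches?(path, expected_hex, algorithm: :SHA256)
  Digest.const_get(algorithm).file(path).hexdigest == expected_hex
end

# The SHA-256 of empty input is a well-known constant, which makes a handy
# self-check without needing the real gem archives.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

Tempfile.create("metadata") do |f|
  puts checksum_matches?(f.path, EMPTY_SHA256) # prints true
end
```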
data/.gitignore ADDED
@@ -0,0 +1,11 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+
10
+ # rspec failure tracking
11
+ .rspec_status
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.travis.yml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ sudo: false
3
+ language: ruby
4
+ cache: bundler
5
+ rvm:
6
+ - 2.4.3
7
+ before_install: gem install bundler -v 1.17.1
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at ronald.tse@ribose.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "https://rubygems.org"
2
+
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
4
+
5
+ # Specify your gem's dependencies in tc211-termbase.gemspec
6
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,63 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ tc211-termbase (0.1.0)
5
+ creek
6
+ iso-639
7
+
8
+ GEM
9
+ remote: https://rubygems.org/
10
+ specs:
11
+ addressable (2.5.2)
12
+ public_suffix (>= 2.0.2, < 4.0)
13
+ creek (2.4.1)
14
+ http (~> 3.0)
15
+ nokogiri (>= 1.7.0)
16
+ rubyzip (>= 1.0.0)
17
+ diff-lcs (1.3)
18
+ domain_name (0.5.20180417)
19
+ unf (>= 0.0.5, < 1.0.0)
20
+ http (3.3.0)
21
+ addressable (~> 2.3)
22
+ http-cookie (~> 1.0)
23
+ http-form_data (~> 2.0)
24
+ http_parser.rb (~> 0.6.0)
25
+ http-cookie (1.0.3)
26
+ domain_name (~> 0.5)
27
+ http-form_data (2.1.1)
28
+ http_parser.rb (0.6.0)
29
+ iso-639 (0.2.8)
30
+ mini_portile2 (2.3.0)
31
+ nokogiri (1.8.5)
32
+ mini_portile2 (~> 2.3.0)
33
+ public_suffix (3.0.3)
34
+ rake (10.5.0)
35
+ rspec (3.8.0)
36
+ rspec-core (~> 3.8.0)
37
+ rspec-expectations (~> 3.8.0)
38
+ rspec-mocks (~> 3.8.0)
39
+ rspec-core (3.8.0)
40
+ rspec-support (~> 3.8.0)
41
+ rspec-expectations (3.8.2)
42
+ diff-lcs (>= 1.2.0, < 2.0)
43
+ rspec-support (~> 3.8.0)
44
+ rspec-mocks (3.8.0)
45
+ diff-lcs (>= 1.2.0, < 2.0)
46
+ rspec-support (~> 3.8.0)
47
+ rspec-support (3.8.0)
48
+ rubyzip (1.2.2)
49
+ unf (0.1.4)
50
+ unf_ext
51
+ unf_ext (0.0.7.5)
52
+
53
+ PLATFORMS
54
+ ruby
55
+
56
+ DEPENDENCIES
57
+ bundler (~> 1.17)
58
+ rake (~> 10.0)
59
+ rspec (~> 3.0)
60
+ tc211-termbase!
61
+
62
+ BUNDLED WITH
63
+ 1.17.1
data/README.adoc ADDED
@@ -0,0 +1,37 @@
1
+ = Tc211::Termbase
2
+
3
+ Build scripts for the ISO/TC 211 Termbase. The library code lives under `lib/tc211/termbase`; run `bin/console` for an interactive prompt to experiment with it.
4
+
5
+ == Installation
6
+
7
+ Add this line to your application's Gemfile:
8
+
9
+ ```ruby
10
+ gem 'tc211-termbase'
11
+ ```
12
+
13
+ And then execute:
14
+
15
+ $ bundle
16
+
17
+ Or install it yourself as:
18
+
19
+ $ gem install tc211-termbase
20
+
21
+ == Usage
22
+
23
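+ Run `tc211-termbase-xlsx2yaml path/to/workbook.xlsx` (the file must have the `.xlsx` extension). The command writes a YAML file named after the workbook (here `workbook.yaml`) containing every concept to the current directory, plus one `concepts/concept-<id>.yaml` file per concept.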
24
+
25
+ == Development
26
+
27
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
28
+
29
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to https://rubygems.org[rubygems.org].
30
+
31
+ == Contributing
32
+
33
+ Bug reports and pull requests are welcome on GitHub at https://github.com/riboseinc/tc211-termbase. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the http://contributor-covenant.org[Contributor Covenant] code of conduct.
34
+
35
+ == Code of Conduct
36
+
37
+ Everyone interacting in the Tc211::Termbase project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the https://github.com/riboseinc/tc211-termbase/blob/master/CODE_OF_CONDUCT.md[code of conduct].
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "tc211/termbase"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,67 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'creek'
4
+ require 'pp'
5
+ require 'pathname'
6
+ require 'fileutils'
7
+ require_relative '../lib/tc211/termbase.rb'
8
+ # require 'pry'
9
+
10
+ filepath = ARGV[0]
11
+ #'./tc211-termbase.xlsx'
12
+
13
+ if filepath.nil?
14
+ puts 'Error: no filepath given as first argument.'
15
+ exit 1
16
+ end
17
+
18
+ if Pathname.new(filepath).extname != ".xlsx"
19
+ puts 'Error: filepath given must have extension .xlsx.'
20
+ exit 1
21
+ end
22
+
23
+
24
+ workbook = Tc211::Termbase::TermWorkbook.new(filepath)
25
+ workbook.glossary_info.metadata_section.structure
26
+ workbook.glossary_info.metadata_section.attributes
27
+
28
+ languages = {}
29
+
30
+ workbook.languages_supported.map do |lang|
31
+ puts "************** WORKING ON LANGUAGE (#{lang})"
32
+ sheet = workbook.language_sheet(lang)
33
+ termsec = sheet.terms_section
34
+ languages[sheet.language_code] = termsec.terms
35
+ end
36
+
37
+ collection = Tc211::Termbase::ConceptCollection.new
38
+
39
+ languages.each_pair do |lang, terms|
40
+ terms.each do |term|
41
+ collection.add_term(term)
42
+ end
43
+ end
44
+
45
+ # collection[1206].inspect
46
+
47
+ output_dir = Dir.pwd
48
+
49
+ collection.to_file(File.join(output_dir, Pathname.new(filepath).basename.sub_ext(".yaml")))
50
+
51
+ collection_output_dir = File.join(output_dir, "concepts")
52
+
53
+ FileUtils.mkdir_p(collection_output_dir)
54
+
55
+ collection.keys.each do |id|
56
+ collection[id].to_file(File.join(collection_output_dir, "concept-#{id}.yaml"))
57
+ end
58
+
59
+ # french = workbook.language_sheet("French")
60
+ # french.sections[3].structure
61
+ # french.sections[3].terms
62
+
63
+ # english = workbook.language_sheet("English")
64
+ # english.terms_section
65
+ # english.terms_section.terms
66
+
67
+ #pry.binding
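The script above writes the whole collection to a YAML file named after the input workbook in the current directory, then one file per concept under `concepts/`. Below is a minimal sketch of that path logic using only the standard `pathname` library. The helper `output_paths`, the output directory, and the concept IDs are hypothetical.

```ruby
require "pathname"

# Hypothetical helper mirroring the script's path logic:
#   <output_dir>/<workbook basename>.yaml
#   <output_dir>/concepts/concept-<id>.yaml
def output_paths(filepath, output_dir, concept_ids)
  collection_file = File.join(output_dir, Pathname.new(filepath).basename.sub_ext(".yaml").to_s)
  concept_dir = File.join(output_dir, "concepts")
  concept_files = concept_ids.map { |id| File.join(concept_dir, "concept-#{id}.yaml") }
  [collection_file, concept_files]
end

collection_file, concept_files = output_paths("./tc211-termbase.xlsx", "/tmp/out", [1, 1206])
puts collection_file # /tmp/out/tc211-termbase.yaml
puts concept_files   # one path per line
```

Because the script uses `Dir.pwd` as the output directory, running it from a different working directory changes where the files land.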
@@ -0,0 +1,11 @@
1
+ require "tc211/termbase/version"
2
+
3
+ module Tc211
4
+ module Termbase
5
+ class Error < StandardError; end
6
+ # Your code goes here...
7
+ end
8
+ end
9
+
10
+ require 'tc211/termbase/term_workbook'
11
+ require 'tc211/termbase/concept_collection'
@@ -0,0 +1,35 @@
1
+ module Tc211::Termbase
2
+
3
+ class Concept < Hash
4
+ attr_accessor :id
5
+ attr_accessor :terms
6
+
7
+ def initialize(options={})
8
+ terms = options.delete(:terms) || []
9
+ terms.each do |term|
10
+ add_term(term)
11
+ end
12
+
13
+ options.each_pair do |k,v|
14
+ self.send("#{k}=", v)
15
+ end
16
+ end
17
+
18
+ def add_term(term)
19
+ self[term.language_code] = term
20
+ end
21
+
22
+ def to_hash
23
+ self.inject({}) do |acc, (lang, term)|
24
+ acc.merge!(lang => term.to_hash)
25
+ end
26
+ end
27
+
28
+ def to_file(filename)
29
+ File.open(filename,"w") do |file|
30
+ file.write(to_hash.to_yaml)
31
+ end
32
+ end
33
+
34
+ end
35
+ end
@@ -0,0 +1,32 @@
1
+ require_relative "concept"
2
+
3
+ module Tc211::Termbase
4
+
5
+ class ConceptCollection < Hash
6
+
7
+ def add_term(term)
8
+ if self[term.id]
9
+ self[term.id].add_term(term)
10
+ else
11
+ self[term.id] = Concept.new(
12
+ id: term.id,
13
+ terms: [term]
14
+ )
15
+ end
16
+ end
17
+
18
+ def to_hash
19
+ self.inject({}) do |acc, (id, concept)|
20
+ acc.merge!(id => concept.to_hash)
21
+ end
22
+ end
23
+
24
+ def to_file(filename)
25
+ File.open(filename,"w") do |file|
26
+ file.write(to_hash.to_yaml)
27
+ end
28
+ end
29
+
30
+ end
31
+
32
+ end
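Each term row carries a concept `id` and a `language_code`. `ConceptCollection#add_term` groups rows that share an id into one `Concept`, which is itself a hash keyed by language code. The sketch below reproduces that grouping with plain hashes so it runs without the gem or a spreadsheet. `TermStub`, `group_terms` and the sample terms are hypothetical stand-ins.

```ruby
require "yaml"

# Stand-in for Tc211::Termbase::Term with only the fields the grouping needs.
TermStub = Struct.new(:id, :language_code, :term) do
  def to_hash
    { "term" => term }
  end
end

# Build { concept_id => { language_code => term } }, as ConceptCollection#add_term
# and Concept#add_term do together. A later term with the same id and language
# replaces the earlier one.
def group_terms(terms)
  terms.each_with_object({}) do |t, collection|
    (collection[t.id] ||= {})[t.language_code] = t
  end
end

terms = [
  TermStub.new(1, "eng", "feature"),
  TermStub.new(1, "fra", "entité"),
  TermStub.new(2, "eng", "attribute")
]

collection = group_terms(terms)

# Serialize like ConceptCollection#to_hash: concept id -> language -> term hash.
serialized = collection.transform_values do |concept|
  concept.transform_values(&:to_hash)
end
puts serialized.to_yaml
```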
@@ -0,0 +1,22 @@
1
+ require_relative "terminology_sheet"
2
+
3
+ module Tc211::Termbase
4
+
5
+ class InformationSheet < TerminologySheet
6
+
7
+ def metadata_section
8
+ sheet_array = @sheet.simple_rows.to_a
9
+ MetadataSection.new(sheet_array)
10
+ end
11
+
12
+ def to_hash
13
+ { "glossary" => metadata_section.to_hash }
14
+ end
15
+
16
+ def to_yaml
17
+ to_hash.to_yaml
18
+ end
19
+
20
+ end
21
+
22
+ end
@@ -0,0 +1,87 @@
1
+ require_relative "sheet_section"
2
+
3
+ module Tc211::Termbase
4
+
5
+ class MetadataSection < SheetSection
6
+ attr_accessor :header_row
7
+ attr_accessor :attributes
8
+
9
+ GLOSSARY_HEADER_ROW_MATCH = {
10
+ "A" => [nil, "Item", "A"], # "Arabic" uses "A"
11
+ "C" => ["Data Type"],
12
+ "D" => ["Special Instruction"],
13
+ "E" => ["ISO 19135 Class.attribute"],
14
+ "F" => ["Domain"]
15
+ }
16
+
17
+ GLOSSARY_ROW_KEY_MAP = {
18
+ "A" => "name",
19
+ "B" => "value",
20
+ "C" => "datatype",
21
+ "D" => "special-instruction",
22
+ "E" => "19135-class-attribute",
23
+ "F" => "value-domain"
24
+ }
25
+
26
+ def initialize(rows, options={})
27
+ super
28
+ raise StandardError.new("Does not match MetadataSection header!") unless self.class.match_header(@rows[0])
29
+ @header_row = @rows[0]
30
+ @body_rows = @rows[1..-1]
31
+ attributes
32
+ self
33
+ end
34
+
35
+ def self.match_header(row)
36
+ puts "row #{row}"
37
+ row.inject(true) do |acc, (key, value)|
38
+ puts "#{key}, #{value}"
39
+ if GLOSSARY_HEADER_ROW_MATCH[key]
40
+ acc && GLOSSARY_HEADER_ROW_MATCH[key].include?(value)
41
+ else
42
+ acc
43
+ end
44
+ end
45
+ end
46
+
47
+ def structure
48
+ GLOSSARY_ROW_KEY_MAP
49
+ end
50
+
51
+ def parse_row(row)
52
+ return nil if row.empty?
53
+ attribute = {}
54
+
55
+ structure.each_pair do |key, value|
56
+ puts "#{key}, #{value}, #{row[key]}"
57
+ attribute_key = value
58
+ attribute_value = row[key]
59
+ next if attribute_value.nil?
60
+ attribute[attribute_key] = attribute_value
61
+ end
62
+
63
+ # TODO: "Chinese" name is empty!
64
+ key = (attribute["name"] || "(empty)").downcase.split(" ").join("-")
65
+
66
+ { key => attribute }
67
+ end
68
+
69
+ def attributes
70
+ return @attributes if @attributes
71
+
72
+ @attributes = {}
73
+ @body_rows.each do |row|
74
+ result = parse_row(row)
75
+ @attributes.merge!(result) if result
76
+ end
77
+ @attributes
78
+ end
79
+
80
+ def to_hash
81
+ {
82
+ "metadata" => attributes
83
+ }
84
+ end
85
+
86
+ end
87
+ end
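`MetadataSection#parse_row` renames spreadsheet columns `A` to `F` via `GLOSSARY_ROW_KEY_MAP`, drops empty cells, and keys the result by a lower-cased, hyphenated slug of the `name` column (falling back to `(empty)`). Below is a standalone sketch of that transformation. The function name `parse_metadata_row` and the sample row are hypothetical.

```ruby
GLOSSARY_ROW_KEY_MAP = {
  "A" => "name", "B" => "value", "C" => "datatype",
  "D" => "special-instruction", "E" => "19135-class-attribute", "F" => "value-domain"
}.freeze

# Mirrors MetadataSection#parse_row: rename columns, skip nil cells, and key
# the attribute hash by the slugified name.
def parse_metadata_row(row)
  return nil if row.empty?

  attribute = {}
  GLOSSARY_ROW_KEY_MAP.each_pair do |column, attr_key|
    value = row[column]
    attribute[attr_key] = value unless value.nil?
  end

  key = (attribute["name"] || "(empty)").downcase.split(" ").join("-")
  { key => attribute }
end

# Returns a hash with the single key "operating-language".
p parse_metadata_row({ "A" => "Operating Language", "B" => "English", "C" => "CharacterString" })
```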
@@ -0,0 +1,26 @@
1
+
2
+ module Tc211::Termbase
3
+
4
+ class SheetSection
5
+ attr_accessor :sheet_content
6
+
7
+ def initialize(rows, options={})
8
+ # rows is an array of rows!
9
+ raise ArgumentError, "rows must be an Array" unless rows.is_a?(Array)
10
+ @rows = rows
11
+ # @has_header = options[:has_header].nil? ? true : options[:has_header]
12
+ self
13
+ end
14
+
15
+ # Abstract method
16
+ def self.match_header(row)
17
+ false
18
+ end
19
+
20
+ def self.identify_type(row)
21
+
22
+ end
23
+
24
+ # TODO
25
+ end
26
+ end
@@ -0,0 +1,136 @@
1
+ module Tc211::Termbase
2
+
3
+ class Term
4
+
5
+ ATTRIBS = %i(
6
+ id term abbrev synonyms alt definition
7
+ country_code
8
+ language_code
9
+ notes examples
10
+ entry_status
11
+ classification
12
+ review_indicator
13
+ authoritative_source
14
+ authoritative_source_similarity
15
+ lineage_source
16
+ lineage_source_similarity
17
+ date_accepted
18
+ date_amended
19
+ review_date
20
+ review_status
21
+ review_type
22
+ review_decision
23
+ review_decision_date
24
+ review_decision_event
25
+ review_decision_notes
26
+ release
27
+ )
28
+
29
+ attr_accessor *ATTRIBS
30
+
31
+ def initialize(options={})
32
+ @examples = []
33
+ @notes = []
34
+
35
+ puts "options #{options.inspect}"
36
+
37
+ options.each_pair do |k, v|
38
+ next unless v
39
+ case k
40
+ when /^example/
41
+ @examples << v
42
+ when /^note/
43
+ @notes << v
44
+ else
45
+ puts "Key #{k}"
46
+ key = k.gsub("-", "_")
47
+ self.send("#{key}=", v)
48
+ end
49
+ end
50
+ self
51
+ end
52
+
53
+ def to_hash
54
+ ATTRIBS.inject({}) do |acc, attrib|
55
+ value = self.send(attrib)
56
+ unless value.nil?
57
+ acc.merge(attrib => value)
58
+ else
59
+ acc
60
+ end
61
+ end
62
+ end
63
+
64
+ # entry-status
65
+ ## Must be one of notValid valid superseded retired
66
+ def entry_status=(value)
67
+ unless %w(notValid valid superseded retired).include?(value)
68
+ value = "notValid"
69
+ end
70
+ @entry_status = value
71
+ end
72
+
73
+ # classification
74
+ ## Must be one of the following: preferred admitted deprecated
75
+ def classification=(value)
76
+ unless %w(preferred admitted deprecated).include?(value)
77
+ value = "preferred"
78
+ end
79
+ @classification = value
80
+ end
81
+
82
+ # review-indicator
83
+ ## Must be one of the following: <empty field>, "Under Review in Source Document"
84
+ def review_indicator=(value)
85
+ unless ["", "Under Review in Source Document"].include?(value)
86
+ value = ""
87
+ end
88
+ @review_indicator = value
89
+ end
90
+
91
+ # authoritative-source-similarity
92
+ ## Must be one of the following codes: identical = 1, restyled = 2, context added = 3, generalisation = 4, specialisation = 5, unspecified = 6
93
+ def authoritative_source_similarity=(value)
94
+ unless (1..6).include?(value)
95
+ value = 6
96
+ end
97
+ @authoritative_source_similarity = value
98
+ end
99
+
100
+ # lineage-source-similarity
101
+ ## Must be one of the following codes: identical = 1, restyled = 2, context added = 3, generalisation = 4, specialisation = 5, unspecified = 6
102
+ def lineage_source_similarity=(value)
103
+ unless (1..6).include?(value)
104
+ value = 6
105
+ end
106
+ @lineage_source_similarity = value
107
+ end
108
+
109
+ def review_status=(value) ## Must be one of pending tentative final
110
+ unless ["", "pending", "tentative", "final"].include?(value)
111
+ value = ""
112
+ end
113
+ @review_status = value
114
+ end
115
+
116
+ def review_type=(value) ## Must be one of supersession, retirement
117
+ unless ["", "supersession", "retirement"].include?(value)
118
+ value = ""
119
+ end
120
+ @review_type = value
121
+ end
122
+
123
+ def review_decision=(value) ## Must be one of withdrawn, accepted, notAccepted
124
+ unless ["", "withdrawn", "accepted", "notAccepted"].include?(value)
125
+ value = ""
126
+ end
127
+ @review_decision = value
128
+ end
129
+
130
+ def retired?
131
+ release.to_i < 0 # retired releases use negative codes (release1_retired = -1)
132
+ end
133
+
134
+ end
135
+
136
+ end
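Several `Term` setters coerce out-of-range input to a default instead of raising. For example, `entry_status` falls back to `notValid`, `classification` to `preferred`, and the similarity codes to `6` (unspecified). `Glossary Release` uses signed codes, where negative values mark retired releases (`release1_retired = -1`). Below is a table-driven sketch of the same rules. The constant and helper names are hypothetical, and the `to_i` conversion is an assumption about how spreadsheet cells may arrive (as strings or floats); the setters above compare the raw value.

```ruby
# Hypothetical table of the allowed values listed in Term's setter comments.
ALLOWED_VALUES = {
  entry_status:   { values: %w[notValid valid superseded retired], default: "notValid" },
  classification: { values: %w[preferred admitted deprecated],     default: "preferred" },
  review_status:  { values: ["", "pending", "tentative", "final"], default: "" }
}.freeze

# Return value unchanged if it is allowed for field, else the field's default.
def coerce(field, value)
  rule = ALLOWED_VALUES.fetch(field)
  rule[:values].include?(value) ? value : rule[:default]
end

# Similarity codes: 1 identical, 2 restyled, 3 context added,
# 4 generalisation, 5 specialisation, 6 unspecified (also the fallback).
# Assumption: convert with to_i so "3" or 3.0 are accepted.
def coerce_similarity(value)
  code = value.respond_to?(:to_i) ? value.to_i : 0
  (1..6).cover?(code) ? code : 6
end

# Release codes: N for releaseN, -N for releaseN_retired.
def retired_release?(release)
  release.to_i < 0
end

p coerce(:entry_status, "Valid") # "notValid" (matching is case-sensitive)
p coerce_similarity("3")         # 3
p retired_release?(-1)           # true
```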
@@ -0,0 +1,53 @@
1
+
2
+ require "creek"
3
+ require "yaml"
4
+ require "pathname"
5
+ require_relative "information_sheet"
6
+ require_relative "terminology_sheet"
7
+
8
+ module Tc211::Termbase
9
+
10
+ class TermWorkbook
11
+ attr_accessor :workbook
12
+ attr_accessor :glossary_info
13
+ attr_accessor :languages
14
+ attr_accessor :filename
15
+
16
+ SPECIAL_SHEETS = [
17
+ "Glossary Information",
18
+ "Character Encoding Spreadsheet"
19
+ ]
20
+
21
+ def initialize(filepath)
22
+ @filename = filepath
23
+ @workbook = Creek::Book.new(filepath)
24
+ @glossary_info = InformationSheet.new(find_sheet_by_name("Glossary Information"))
25
+ @languages = languages_supported
26
+ self
27
+ end
28
+
29
+ def languages_supported
30
+ @workbook.sheets.map(&:name).reject do |name|
31
+ SPECIAL_SHEETS.include?(name)
32
+ end
33
+ end
34
+
35
+ def language_sheet(lang)
36
+ raise ArgumentError, "Unknown language sheet: #{lang}" unless @languages.include?(lang)
37
+ TerminologySheet.new(find_sheet_by_name(lang))
38
+ end
39
+
40
+ def find_sheet_by_name(sheet_name)
41
+ @workbook.sheets.detect do |sheet|
42
+ sheet.name == sheet_name
43
+ end
44
+ end
45
+
46
+ def write_glossary_info
47
+ glossary_info_fn = Pathname.new(@filename).sub_ext(".yaml")
48
+ File.open(glossary_info_fn,"w") do |file|
49
+ file.write(glossary_info.to_yaml)
50
+ end
51
+ end
52
+ end
53
+ end
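`languages_supported` treats every worksheet except the two special sheets as a language sheet. `Array#reject!` returns `nil` when it removes nothing, whereas `Array#reject` always returns a new array, which matters if a workbook lacks both special sheets. Below is a short standalone illustration. The helper name and sheet names are hypothetical.

```ruby
SPECIAL_SHEETS = ["Glossary Information", "Character Encoding Spreadsheet"].freeze

# Non-destructive filter: always returns an Array, even when no special sheet is present.
def language_sheet_names(sheet_names)
  sheet_names.reject { |name| SPECIAL_SHEETS.include?(name) }
end

names = ["Glossary Information", "English", "French"]
p language_sheet_names(names)       # ["English", "French"]
p language_sheet_names(["English"]) # ["English"]

# The in-place variant returns nil when nothing was removed:
p ["English"].reject! { |n| SPECIAL_SHEETS.include?(n) } # nil
```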
@@ -0,0 +1,79 @@
1
+ require_relative "metadata_section"
2
+ require_relative "terms_section"
3
+ require "iso-639"
4
+
5
+ module Tc211::Termbase
6
+
7
+ class TerminologySheet
8
+ attr_accessor :sheet
9
+
10
+ def initialize(sheet)
11
+ @sheet = sheet
12
+ self
13
+ end
14
+
15
+ def language
16
+ @sheet.name
17
+ end
18
+
19
+ def language_code
20
+ # Hack to make ISO_639 gem work...
21
+ lang = case language
22
+ when "Dutch"
23
+ "Dutch; Flemish"
24
+ when "Spanish"
25
+ "Spanish; Castilian"
26
+ else
27
+ language
28
+ end
29
+ ISO_639.find_by_english_name(lang).alpha3
30
+ rescue
31
+ raise StandardError.new("Failed to find alpha3 code for language: #{lang}")
32
+ end
33
+
34
+ def sections_raw
35
+ # Sections either start with "A" => "Item", or they have empty lines between
36
+ raw_sections = @sheet.simple_rows.to_a
37
+
38
+ raw_sections.reject! do |section|
39
+ section.empty?
40
+ end
41
+
42
+ raw_sections = raw_sections.slice_before do |row|
43
+ row["A"].to_s == "Item" || row["A"].to_s.match(/^ISO 19135 Field/)
44
+ end.to_a
45
+ end
46
+
47
+ def terms_section
48
+ sections
49
+
50
+ sections.detect do |section|
51
+ section.is_a?(TermsSection)
52
+ end
53
+ end
54
+
55
+ def sections
56
+ return @sections if @sections
57
+
58
+ @sections = []
59
+ sections_raw.each_with_index do |x,i|
60
+
61
+ puts "--------- Section #{i} ------"
62
+ section = if MetadataSection.match_header(x[0])
63
+ puts "------ is a MetadataSection"
64
+ puts "rows: #{x.inspect}"
65
+ MetadataSection.new(x)
66
+ else
67
+ puts "------ is a TermsSection"
68
+ puts "rows: #{x.inspect}"
69
+ TermsSection.new(x, {language_code: language_code})
70
+ end
71
+
72
+ @sections << section
73
+ end
74
+
75
+ end
76
+
77
+ end
78
+
79
+ end
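`TerminologySheet#sections_raw` drops empty rows and starts a new section at every row whose column `A` is `Item` or matches `^ISO 19135 Field`, using `Enumerable#slice_before`. Below is a standalone sketch over hypothetical rows, each a hash from column letter to cell value (the shape the code above indexes with `row["A"]`). The helper name `split_sections` is hypothetical.

```ruby
# Split rows into sections, mirroring TerminologySheet#sections_raw.
def split_sections(rows)
  rows
    .reject(&:empty?)
    .slice_before { |row| row["A"].to_s == "Item" || /^ISO 19135 Field/.match?(row["A"].to_s) }
    .to_a
end

rows = [
  { "A" => "Item", "B" => "Value" },
  { "A" => "Language", "B" => "English" },
  {},
  { "A" => "ISO 19135 Field\nRE_RegisterItem.itemIdentifier" },
  { "A" => "Term_ID" },
  { "A" => 1 }
]

sections = split_sections(rows)
p sections.length        # 2
p sections.map(&:length) # [2, 3]
```

Each resulting group is then classified: a group whose first row passes `MetadataSection.match_header` becomes a `MetadataSection`, and every other group a `TermsSection`.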
@@ -0,0 +1,148 @@
1
+ require_relative "sheet_section"
2
+ require_relative "term"
3
+
4
+ module Tc211::Termbase
5
+
6
+ class TermsSection < SheetSection
7
+ attr_accessor :structure
8
+ attr_accessor :header_row
9
+
10
+ TERM_HEADER_ROW_MATCH = {
11
+ "A" => ["ISO 19135 Field\nRE_RegisterItem.itemIdentifier"],
12
+ "B" => ["ISO 19135 Field\nRE_RegisterItem.name"],
13
+ "C" => ["ISO 19135 Field\nRE_RegisterItem.\nalternativeExpression"],
14
+ "D" => ["Country_Code"],
15
+ # ... We don't need to match all the cells
16
+ }
17
+
18
+ TERM_BODY_COLUMN_MAP = {
19
+ "Term_ID" => "id",
20
+ "Term" => "term",
21
+ "Term [OPERATING LANGUAGE]" => "term",
22
+ "Term_Abbreviation" => "abbrev",
23
+ "Country code" => "country-code",
24
+ "Definition" => "definition",
25
+ "Term [OPERATING LANGUAGE - ALTERNATIVE CHARACTER SET]" => "alt",
26
+ "Term in English" => nil,
27
+ "Entry Status" => "entry-status",
28
+ ## Must be one of 'notValid' 'valid' 'superseded' 'retired'
29
+ "Term Clasification" => "classification",
30
+ ## Must be one of the following 'preferred' 'admitted' 'deprecated'
31
+ "Review Indicator" => "review-indicator",
32
+ ## Must be one of the following: <empty field>, 'Under Review in Source Document'
33
+ "Authoritative Source" => "authoritative-source",
34
+ "Similarity to Authoritative Source" => "authoritative-source-similarity",
35
+ ## Must be one of the following codes: 'identical' = 1, 'restyled' = 2, 'context added' = 3, 'generalisation' = 4, 'specialisation' = 5, 'unspecified' = 6
36
+ "Lineage Source" => "lineage-source",
37
+ "Similarity to Lineage Source" => "lineage-source-similarity",
38
+ ## Must be one of the following codes: 'identical' = 1, 'restyled' = 2, 'context added' = 3, 'generalisation' = 4, 'specialisation' = 5, 'unspecified' = 6
39
+ "Term Synonyms" => "synonyms",
40
+ "Date Accepted" => "date-accepted", # yyyy-mm-dd,
41
+ "Date Amended" => "date-amended", # yyyy-mm-dd,
42
+ "Review Date" => "review-date", # yyyy-mm-dd,
43
+ "Review Status" => "review-status", ## Must be one of 'pending' 'tentative' 'final'",
44
+ "Review Type" => "review-type", ## Must be one of 'supersession', 'retirement'",
45
+ "Review Decision" => "review-decision", ## Must be one of 'withdrawn', 'accepted' 'notAccepted'",
46
+ "Review Decision Date" => "review-decision-date", # yyyy-mm-dd
47
+ "Review Decision Event" => "review-decision-event",
48
+ "Review Decision Notes" => "review-decision-notes",
49
+ "Example_1" => "example-1",
50
+ "Note_1" => "note-1",
51
+ "Example_2" => "example-2",
52
+ "Note_2" => "note-2",
53
+ "Example_3" => "example-3",
54
+ "Note_3" => "note-3",
55
+ "Example_4" => "example-4",
56
+ "Note_4" => "note-4",
57
+ "Example_5" => "example-5",
58
+ "Note_5" => "note-5",
59
+ "Example_6" => "example-6",
60
+ "Note_6" => "note-6",
61
+ "Example_7" => "example-7",
62
+ "Note_7" => "note-7",
63
+ "Example_8" => "example-8",
64
+ "Note_8" => "note-8",
65
+ "Glossary Release" => "release"
66
+ ## Must be one of the following codes: 'release1' = 1, 'release1_retired' = -1, 'release2' = 2, 'release2_retired' = -2, etc.
67
+ }
68
+
69
+ def initialize(rows, options={})
70
+ super
71
+ raise StandardError.new("Does not match TermsSection header!") unless self.class.match_header(@rows[0])
72
+ @mapping_rows = @rows[0..1]
73
+ @header_row = @rows[2]
74
+ @body_rows = @rows[3..-1]
75
+ @language_code = options.delete(:language_code)
76
+ self
77
+ end
78
+
79
+ def structure
80
+ @structure ||= @header_row.inject({}) do |acc, (key, value)|
81
+ # puts "#{key}, #{value}, #{GLOSSARY_HEADER_TITLES[value]}"
82
+
83
+ # convert whitespace to a single space
84
+ cleaned_value = value.gsub(/\s+/, ' ')
85
+
86
+ matches = TERM_BODY_COLUMN_MAP.map do |key, value|
87
+ puts "key #{key}, value #{value}"
88
+ if cleaned_value.start_with?(key)
89
+ [key, value]
90
+ end
91
+ end.compact
92
+
93
+ discard, longest_match_key = matches.max_by do |(a, b)|
94
+ a.length
95
+ end
96
+
97
+ # Here we need to skip "Term in English"
98
+ if key && longest_match_key
99
+ acc.merge!({ key => longest_match_key })
100
+ else
101
+ acc
102
+ end
103
+
104
+ end
105
+ end
106
+
107
+ def self.match_header(row)
108
+ # puts "row #{row}"
109
+ row.inject(true) do |acc, (key, value)|
110
+ # puts "#{key}, #{value}"
111
+ if TERM_HEADER_ROW_MATCH[key]
112
+ acc && TERM_HEADER_ROW_MATCH[key].include?(value)
113
+ else
114
+ acc
115
+ end
116
+ end
117
+ end
118
+
119
+ def parse_row(row)
120
+ return nil if row.empty?
121
+ attributes = {}
122
+
123
+ structure.each_pair do |key, value|
124
+ puts "#{key}, #{value}, #{row[key]}"
125
+ attribute_key = value
126
+ attribute_value = row[key]
127
+ next if attribute_value.nil?
128
+ attributes[attribute_key] = attribute_value
129
+ end
130
+
131
+ attributes
132
+ end
133
+
134
+ def terms
135
+ @terms ||= @body_rows.map do |row|
136
+ Term.new(parse_row(row).merge("language_code" => @language_code))
137
+ end
138
+ end
139
+
140
+ def to_hash
141
+ {
142
+ "terms" => terms.map(&:to_hash)
143
+ }
144
+ end
145
+
146
+ end
147
+
148
+ end
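`TermsSection#structure` maps each header cell to an attribute name by finding the longest `TERM_BODY_COLUMN_MAP` key that prefixes the whitespace-normalised header text. A `nil` target such as `Term in English` means the column is skipped. Building a `Regexp` directly from keys such as `Term [OPERATING LANGUAGE]` turns the bracketed part into a character class, so a literal comparison with `String#start_with?` (or `Regexp.escape`) is the safer prefix test. Below is a standalone sketch. `COLUMN_MAP` is a trimmed subset of the real map, and `column_structure` and the header row are hypothetical.

```ruby
# Trimmed subset of TERM_BODY_COLUMN_MAP, for illustration only.
COLUMN_MAP = {
  "Term_ID" => "id",
  "Term" => "term",
  "Term [OPERATING LANGUAGE]" => "term",
  "Term [OPERATING LANGUAGE - ALTERNATIVE CHARACTER SET]" => "alt",
  "Term in English" => nil,
  "Definition" => "definition"
}.freeze

# header_row: { column letter => header text }. Returns { column letter => attribute }.
def column_structure(header_row)
  header_row.each_with_object({}) do |(column, title), acc|
    cleaned = title.to_s.gsub(/\s+/, " ")
    # Longest literal prefix wins, so "Term_ID" beats "Term".
    best = COLUMN_MAP.keys.select { |k| cleaned.start_with?(k) }.max_by(&:length)
    target = best && COLUMN_MAP[best]
    acc[column] = target if target # a nil target (e.g. "Term in English") is skipped
  end
end

header = {
  "A" => "Term_ID",
  "B" => "Term [OPERATING LANGUAGE]",
  "C" => "Term [OPERATING LANGUAGE -\nALTERNATIVE CHARACTER SET]",
  "D" => "Term in English",
  "E" => "Definition"
}
p column_structure(header)
```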
@@ -0,0 +1,5 @@
1
+ module Tc211
2
+ module Termbase
3
+ VERSION = "0.1.0"
4
+ end
5
+ end
@@ -0,0 +1,31 @@
1
+
2
+ lib = File.expand_path("../lib", __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require "tc211/termbase/version"
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "tc211-termbase"
8
+ spec.version = Tc211::Termbase::VERSION
9
+ spec.authors = ["Ribose"]
10
+ spec.email = ["open.source@ribose.com"]
11
+
12
+ spec.summary = %q{Build scripts for the ISO/TC 211 Termbase}
13
+ spec.description = %q{Build scripts for the ISO/TC 211 Termbase}
14
+ spec.homepage = "https://open.ribose.com"
15
+
16
+ # Specify which files should be added to the gem when it is released.
17
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
18
+ spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
19
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
20
+ end
21
+ spec.bindir = "exe"
22
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
23
+ spec.require_paths = ["lib"]
24
+
25
+ spec.add_runtime_dependency "iso-639"
26
+ spec.add_runtime_dependency "creek"
27
+
28
+ spec.add_development_dependency "bundler", "~> 1.17"
29
+ spec.add_development_dependency "rake", "~> 10.0"
30
+ spec.add_development_dependency "rspec", "~> 3.0"
31
+ end
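The gemspec builds `spec.files` from `git ls-files -z`, excluding paths under `test/`, `spec/` and `features/`. It then derives `spec.executables` from files under `exe/`, which is how `tc211-termbase-xlsx2yaml` appears in the metadata below. Below is the same filtering applied to a hard-coded, hypothetical listing so it runs without git.

```ruby
# NUL-separated output in the format `git ls-files -z` produces (sample data).
ls_files = "exe/tc211-termbase-xlsx2yaml\0lib/tc211/termbase.rb\0spec/termbase_spec.rb\0Rakefile\0"

files = ls_files.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
executables = files.grep(%r{^exe/}) { |f| File.basename(f) }

p files       # ["exe/tc211-termbase-xlsx2yaml", "lib/tc211/termbase.rb", "Rakefile"]
p executables # ["tc211-termbase-xlsx2yaml"]
```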
metadata ADDED
@@ -0,0 +1,137 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: tc211-termbase
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Ribose
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2018-11-23 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: iso-639
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: creek
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: bundler
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '1.17'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '1.17'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rake
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '10.0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '10.0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: rspec
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '3.0'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '3.0'
83
+ description: Build scripts for the ISO/TC 211 Termbase
84
+ email:
85
+ - open.source@ribose.com
86
+ executables:
87
+ - tc211-termbase-xlsx2yaml
88
+ extensions: []
89
+ extra_rdoc_files: []
90
+ files:
91
+ - ".gitignore"
92
+ - ".rspec"
93
+ - ".travis.yml"
94
+ - CODE_OF_CONDUCT.md
95
+ - Gemfile
96
+ - Gemfile.lock
97
+ - README.adoc
98
+ - Rakefile
99
+ - bin/console
100
+ - bin/setup
101
+ - exe/tc211-termbase-xlsx2yaml
102
+ - lib/tc211/termbase.rb
103
+ - lib/tc211/termbase/concept.rb
104
+ - lib/tc211/termbase/concept_collection.rb
105
+ - lib/tc211/termbase/information_sheet.rb
106
+ - lib/tc211/termbase/metadata_section.rb
107
+ - lib/tc211/termbase/sheet_section.rb
108
+ - lib/tc211/termbase/term.rb
109
+ - lib/tc211/termbase/term_workbook.rb
110
+ - lib/tc211/termbase/terminology_sheet.rb
111
+ - lib/tc211/termbase/terms_section.rb
112
+ - lib/tc211/termbase/version.rb
113
+ - tc211-termbase.gemspec
114
+ homepage: https://open.ribose.com
115
+ licenses: []
116
+ metadata: {}
117
+ post_install_message:
118
+ rdoc_options: []
119
+ require_paths:
120
+ - lib
121
+ required_ruby_version: !ruby/object:Gem::Requirement
122
+ requirements:
123
+ - - ">="
124
+ - !ruby/object:Gem::Version
125
+ version: '0'
126
+ required_rubygems_version: !ruby/object:Gem::Requirement
127
+ requirements:
128
+ - - ">="
129
+ - !ruby/object:Gem::Version
130
+ version: '0'
131
+ requirements: []
132
+ rubyforge_project:
133
+ rubygems_version: 2.7.7
134
+ signing_key:
135
+ specification_version: 4
136
+ summary: Build scripts for the ISO/TC 211 Termbase
137
+ test_files: []