glossarist-agent 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 5effe5b9399c088c41e1073dc5450dca94a5b52cee8d8f3d539450279a032207
4
+ data.tar.gz: d517d3428be089a55dc3d3788a8f8f6621f4bec32b0c70c1b3d28c6efee6281c
5
+ SHA512:
6
+ metadata.gz: 02e48745b3e92c17c1937de331d238e67ea0c0411d50d06c368796b5dbe32719fee05ee0ebe382708a860215331390a53391643b148cffb7eefad278b34c74b3
7
+ data.tar.gz: baf4ea5616a16ad50c09b957533348c2679ddb513cc7efb72d5f7d39c831419b6c03f0d1eec91df92741ea442b2975f6b6c209127f2847ef2497df09cd81fd31
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.rubocop.yml ADDED
@@ -0,0 +1,8 @@
1
+ AllCops:
2
+ TargetRubyVersion: 3.0
3
+
4
+ Style/StringLiterals:
5
+ EnforcedStyle: double_quotes
6
+
7
+ Style/StringLiteralsInInterpolation:
8
+ EnforcedStyle: double_quotes
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,132 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ We as members, contributors, and leaders pledge to make participation in our
6
+ community a harassment-free experience for everyone, regardless of age, body
7
+ size, visible or invisible disability, ethnicity, sex characteristics, gender
8
+ identity and expression, level of experience, education, socio-economic status,
9
+ nationality, personal appearance, race, caste, color, religion, or sexual
10
+ identity and orientation.
11
+
12
+ We pledge to act and interact in ways that contribute to an open, welcoming,
13
+ diverse, inclusive, and healthy community.
14
+
15
+ ## Our Standards
16
+
17
+ Examples of behavior that contributes to a positive environment for our
18
+ community include:
19
+
20
+ * Demonstrating empathy and kindness toward other people
21
+ * Being respectful of differing opinions, viewpoints, and experiences
22
+ * Giving and gracefully accepting constructive feedback
23
+ * Accepting responsibility and apologizing to those affected by our mistakes,
24
+ and learning from the experience
25
+ * Focusing on what is best not just for us as individuals, but for the overall
26
+ community
27
+
28
+ Examples of unacceptable behavior include:
29
+
30
+ * The use of sexualized language or imagery, and sexual attention or advances of
31
+ any kind
32
+ * Trolling, insulting or derogatory comments, and personal or political attacks
33
+ * Public or private harassment
34
+ * Publishing others' private information, such as a physical or email address,
35
+ without their explicit permission
36
+ * Other conduct which could reasonably be considered inappropriate in a
37
+ professional setting
38
+
39
+ ## Enforcement Responsibilities
40
+
41
+ Community leaders are responsible for clarifying and enforcing our standards of
42
+ acceptable behavior and will take appropriate and fair corrective action in
43
+ response to any behavior that they deem inappropriate, threatening, offensive,
44
+ or harmful.
45
+
46
+ Community leaders have the right and responsibility to remove, edit, or reject
47
+ comments, commits, code, wiki edits, issues, and other contributions that are
48
+ not aligned to this Code of Conduct, and will communicate reasons for moderation
49
+ decisions when appropriate.
50
+
51
+ ## Scope
52
+
53
+ This Code of Conduct applies within all community spaces, and also applies when
54
+ an individual is officially representing the community in public spaces.
55
+ Examples of representing our community include using an official email address,
56
+ posting via an official social media account, or acting as an appointed
57
+ representative at an online or offline event.
58
+
59
+ ## Enforcement
60
+
61
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
62
+ reported to the community leaders responsible for enforcement at
63
+ [INSERT CONTACT METHOD].
64
+ All complaints will be reviewed and investigated promptly and fairly.
65
+
66
+ All community leaders are obligated to respect the privacy and security of the
67
+ reporter of any incident.
68
+
69
+ ## Enforcement Guidelines
70
+
71
+ Community leaders will follow these Community Impact Guidelines in determining
72
+ the consequences for any action they deem in violation of this Code of Conduct:
73
+
74
+ ### 1. Correction
75
+
76
+ **Community Impact**: Use of inappropriate language or other behavior deemed
77
+ unprofessional or unwelcome in the community.
78
+
79
+ **Consequence**: A private, written warning from community leaders, providing
80
+ clarity around the nature of the violation and an explanation of why the
81
+ behavior was inappropriate. A public apology may be requested.
82
+
83
+ ### 2. Warning
84
+
85
+ **Community Impact**: A violation through a single incident or series of
86
+ actions.
87
+
88
+ **Consequence**: A warning with consequences for continued behavior. No
89
+ interaction with the people involved, including unsolicited interaction with
90
+ those enforcing the Code of Conduct, for a specified period of time. This
91
+ includes avoiding interactions in community spaces as well as external channels
92
+ like social media. Violating these terms may lead to a temporary or permanent
93
+ ban.
94
+
95
+ ### 3. Temporary Ban
96
+
97
+ **Community Impact**: A serious violation of community standards, including
98
+ sustained inappropriate behavior.
99
+
100
+ **Consequence**: A temporary ban from any sort of interaction or public
101
+ communication with the community for a specified period of time. No public or
102
+ private interaction with the people involved, including unsolicited interaction
103
+ with those enforcing the Code of Conduct, is allowed during this period.
104
+ Violating these terms may lead to a permanent ban.
105
+
106
+ ### 4. Permanent Ban
107
+
108
+ **Community Impact**: Demonstrating a pattern of violation of community
109
+ standards, including sustained inappropriate behavior, harassment of an
110
+ individual, or aggression toward or disparagement of classes of individuals.
111
+
112
+ **Consequence**: A permanent ban from any sort of public interaction within the
113
+ community.
114
+
115
+ ## Attribution
116
+
117
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage],
118
+ version 2.1, available at
119
+ [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
120
+
121
+ Community Impact Guidelines were inspired by
122
+ [Mozilla's code of conduct enforcement ladder][Mozilla CoC].
123
+
124
+ For answers to common questions about this code of conduct, see the FAQ at
125
+ [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
126
+ [https://www.contributor-covenant.org/translations][translations].
127
+
128
+ [homepage]: https://www.contributor-covenant.org
129
+ [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
130
+ [Mozilla CoC]: https://github.com/mozilla/diversity
131
+ [FAQ]: https://www.contributor-covenant.org/faq
132
+ [translations]: https://www.contributor-covenant.org/translations
data/README.adoc ADDED
@@ -0,0 +1,149 @@
1
+ = Glossarist Agent
2
+
3
+ image:https://img.shields.io/gem/v/glossarist-agent.svg["Gem Version", link="https://rubygems.org/gems/glossarist-agent"]
4
+ image:https://github.com/relaton/glossarist-agent/workflows/rake/badge.svg["Build Status", link="https://github.com/relaton/glossarist-agent/actions?workflow=rake"]
5
+ image:https://codeclimate.com/github/relaton/glossarist-agent/badges/gpa.svg["Code Climate", link="https://codeclimate.com/github/relaton/glossarist-agent"]
6
+
7
+ == Purpose
8
+
9
+ The Glossarist Agent is a Ruby gem designed to retrieve remotely located concepts.
10
+
11
+ Currently, it allows the bulk retrieval of the IHO S-32 Hydrographic Dictionary
12
+ into the Glossarist format.
13
+
14
+
15
+ == Installation
16
+
17
+ Add this line to your application's `Gemfile`:
18
+
19
+ [source,ruby]
20
+ ----
21
+ gem 'glossarist-agent'
22
+ ----
23
+
24
+ And then execute:
25
+
26
+ [source,shell]
27
+ ----
28
+ $ bundle install
29
+ ----
30
+
31
+ Or install it yourself as:
32
+
33
+ [source,shell]
34
+ ----
35
+ $ gem install glossarist-agent
36
+ ----
37
+
38
+
39
+ == Usage
40
+
41
+ === Downloading IHO S-32 Hydrographic Dictionary data
42
+
43
+ ==== General
44
+
45
+ The Glossarist Agent can download and process IHO (International Hydrographic
46
+ Organization) S-32 Hydrographic Dictionary data from available CSV files.
47
+
48
+ The official site is located at:
49
+
50
+ * http://iho-ohi.net/S32/
51
+
52
+ Glossarist Agent uses a caching mechanism to efficiently manage downloads and
53
+ reduce unnecessary network requests.
54
+
55
+ To retrieve these concepts and generate a Glossarist dataset, use the following
56
+ command:
57
+
58
+ [source,shell]
59
+ ----
60
+ $ glossarist-agent iho retrieve-concepts
61
+ ----
62
+
63
+ This command performs the following actions:
64
+
65
+ . Downloads the required CSV files from IHO sources.
66
+ . Caches the downloaded files for future use.
67
+ . Processes the CSV data to generate a Glossarist-compatible dataset.
68
+
69
+ ==== Command Options
70
+
71
+ [source,shell]
72
+ ----
73
+ $ glossarist-agent iho help retrieve-concepts
74
+ Usage:
75
+ glossarist-agent iho retrieve-concepts
76
+
77
+ Options:
78
+ -o, [--output=OUTPUT] # Directory to output generated files
79
+ # Default: ./output
80
+ -c, [--cache=CACHE] # Directory to store cached files
81
+ # Default: ~/.glossarist-agent/cache
82
+ [--fetch], [--no-fetch], [--skip-fetch] # Fetch new data (default: true)
83
+ # Default: true
84
+
85
+ Download IHO CSV files and generate concepts
86
+ ----
87
+
88
+ `--output`:: Specifies the directory where the generated Glossarist dataset will be saved. Default is `./output`.
89
+ `--cache`:: Sets the directory for storing cached files. Default is `~/.glossarist-agent/cache`.
90
+ `--fetch`:: Controls whether to fetch new data or use existing cached data. Default is `true`.
91
+
92
+ [example]
93
+ ====
94
+ The following command saves the IHO S-32 Glossarist dataset at
95
+ `./iho-s32-glossarist` and prioritizes using the existing cache without
96
+ communicating with the server.
97
+
98
+ [source,sh]
99
+ ----
100
+ $ glossarist-agent iho retrieve-concepts --no-fetch -o iho-s32-glossarist
101
+ ----
102
+ ====
103
+
104
+
105
+ === Caching mechanism
106
+
107
+ The Glossarist Agent employs a sophisticated caching system to optimize
108
+ performance and reduce unnecessary downloads:
109
+
110
+ . Downloaded files are stored in the specified cache directory.
111
+ . Each cached file is associated with metadata, including the download time and ETag.
112
+ . When fetching data, the agent checks:
113
+ .. If the cached file exists and is within the expiry period (default 7 days).
114
+ .. If the server's ETag matches the cached ETag.
115
+ . If either condition is not met, the agent downloads a fresh copy of the file.
116
+
117
+ This approach ensures that the agent always works with up-to-date data while minimizing network usage.
118
+
119
+ === Generating Glossarist Dataset
120
+
121
+ After downloading and caching the IHO CSV files, the agent processes the data to generate a Glossarist-compatible dataset:
122
+
123
+ . It parses the CSV files to extract concept information.
124
+ . The extracted data is transformed into the Glossarist data model.
125
+ . The resulting dataset is saved in the specified output directory.
126
+
127
+ This generated dataset can then be used with other Glossarist tools for further processing or integration into concept management systems.
128
+
129
+ == Features
130
+
131
+ * Automated downloading and caching of IHO CSV files
132
+ * ETag-based cache validation
133
+ * Customizable cache expiry period
134
+ * Generation of Glossarist-compatible datasets from IHO data
135
+ * Command-line interface for easy integration into workflows
136
+
137
+ == Development
138
+
139
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
140
+
141
+ To install this gem onto your local machine, run `bundle exec rake install`.
142
+
143
+
144
+ == License
145
+
146
+ Copyright Ribose.
147
+
148
+ The gem is available as open source under the terms of the
149
+ https://opensource.org/licenses/MIT[MIT License].
data/Rakefile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ require "rubocop/rake_task"
9
+
10
+ RuboCop::RakeTask.new
11
+
12
+ task default: %i[spec rubocop]
data/exe/glossarist-agent ADDED
@@ -0,0 +1,6 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require_relative "../lib/glossarist/agent/cli"
5
+
6
+ Glossarist::Agent::Cli.start(ARGV)
data/lib/glossarist/agent/cli.rb ADDED
@@ -0,0 +1,11 @@
1
+ require "thor"
2
+ require_relative "iho/cli"
3
+
4
+ module Glossarist
5
+ module Agent
6
+ class Cli < Thor
7
+ desc "iho SUBCOMMAND ...ARGS", "IHO-related commands"
8
+ subcommand "iho", Iho::Cli
9
+ end
10
+ end
11
+ end
data/lib/glossarist/agent/collection.rb ADDED
@@ -0,0 +1,22 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Glossarist
4
+ module Agent
5
+ class Collection
6
+ attr_reader :collection
7
+
8
+ def initialize
9
+ @collection = Glossarist::ManagedConceptCollection.new
10
+ end
11
+
12
+ def add_concept(data)
13
+ concept = Concept.new(data)
14
+ @collection << concept.managed_concept
15
+ end
16
+
17
+ def save_to_files(output_path)
18
+ @collection.save_to_files(output_path)
19
+ end
20
+ end
21
+ end
22
+ end
data/lib/glossarist/agent/concept.rb ADDED
@@ -0,0 +1,28 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Glossarist
4
+ module Agent
5
+ class Concept
6
+ attr_reader :managed_concept
7
+
8
+ def initialize(data)
9
+ @managed_concept = Glossarist::ManagedConcept.new
10
+ populate_concept(data)
11
+ end
12
+
13
+ private
14
+
15
+ def populate_concept(data)
16
+ @managed_concept.id = data[:id]
17
+ @managed_concept.groups = data[:groups] || []
18
+
19
+ localized_concept = Glossarist::LocalizedConcept.new
20
+ localized_concept.language_code = "eng"
21
+ localized_concept.definition = [Glossarist::DetailedDefinition.new(content: data[:definition])]
22
+ localized_concept.designations = [Glossarist::Designation::Base.from_h({ "type" => "expression", "designation" => data[:term] })]
23
+
24
+ @managed_concept.localizations = { "eng" => localized_concept }
25
+ end
26
+ end
27
+ end
28
+ end
data/lib/glossarist/agent/http_cache_downloader.rb ADDED
@@ -0,0 +1,141 @@
1
+ require "faraday"
2
+ require "fileutils"
3
+ require "json"
4
+ require "time"
5
+
6
+ module Glossarist
7
+ module Agent
8
+ class HttpCacheDownloader
9
+ CACHE_EXPIRY_DAYS = 7
10
+
11
+ def initialize(cache_dir, fetch: true)
12
+ @cache_dir = cache_dir
13
+ @fetch = fetch
14
+ FileUtils.mkdir_p(@cache_dir)
15
+ @client = Faraday.new { |faraday| faraday.adapter Faraday.default_adapter }
16
+ end
17
+
18
+ def download_files(url_map)
19
+ url_map.each { |filename, url| process_file(filename, url) }
20
+ puts "All files are up to date in the cache."
21
+ end
22
+
23
+ private
24
+
25
+ def process_file(filename, url)
26
+ cache_path = File.join(@cache_dir, filename)
27
+ metadata_path = "#{cache_path}.metadata.json"
28
+
29
+ if cache_exists?(cache_path, metadata_path)
30
+ handle_existing_cache(filename, url, cache_path, metadata_path)
31
+ else
32
+ handle_missing_cache(filename, url, cache_path, metadata_path)
33
+ end
34
+ end
35
+
36
+ def cache_exists?(cache_path, metadata_path)
37
+ File.exist?(cache_path) && File.exist?(metadata_path)
38
+ end
39
+
40
+ def handle_existing_cache(filename, url, cache_path, metadata_path)
41
+ if !@fetch
42
+ puts "Using cached #{filename} (fetch disabled)"
43
+ elsif should_download?(url, cache_path, metadata_path)
44
+ download_file(url, cache_path, metadata_path)
45
+ else
46
+ puts "Using cached #{filename} (not modified or within #{CACHE_EXPIRY_DAYS}-day period)"
47
+ end
48
+ end
49
+
50
+ def handle_missing_cache(filename, url, cache_path, metadata_path)
51
+ if !@fetch
52
+ puts "Cache file or metadata missing for #{filename}. Skipping as fetch is disabled."
53
+ else
54
+ puts "Cache file or metadata missing for #{filename}. Downloading..."
55
+ download_file(url, cache_path, metadata_path)
56
+ end
57
+ end
58
+
59
+ def should_download?(url, cache_path, metadata_path)
60
+ return false unless @fetch
61
+
62
+ metadata = read_metadata(metadata_path)
63
+ headers = fetch_headers(url)
64
+ server_etag = headers["ETag"]
65
+
66
+ if server_etag
67
+ handle_etag(metadata, server_etag)
68
+ else
69
+ handle_no_etag(metadata)
70
+ end
71
+ end
72
+
73
+ def handle_etag(metadata, server_etag)
74
+ if metadata["etag"] && server_etag != metadata["etag"]
75
+ puts "ETag mismatch. Stored: #{metadata["etag"]}, Server: #{server_etag}"
76
+ true
77
+ else
78
+ false
79
+ end
80
+ end
81
+
82
+ def handle_no_etag(metadata)
83
+ puts "Server did not provide an ETag. Checking file age..."
84
+ file_older_than_days?(metadata["download_time"], CACHE_EXPIRY_DAYS)
85
+ end
86
+
87
+ def file_older_than_days?(download_time, days)
88
+ return true unless download_time
89
+
90
+ file_age = (Time.now - Time.parse(download_time)) / (24 * 60 * 60)
91
+ if file_age > days
92
+ puts "Cache file is older than #{days} days. Will download."
93
+ true
94
+ else
95
+ puts "Cache file is within #{days} days old. Using cached version."
96
+ false
97
+ end
98
+ end
99
+
100
+ def fetch_headers(url)
101
+ @client.head(url).headers
102
+ rescue Faraday::Error => e
103
+ puts "Error fetching headers for #{url}: #{e.message}"
104
+ {}
105
+ end
106
+
107
+ def download_file(url, cache_path, metadata_path)
108
+ puts "Downloading #{File.basename(cache_path)}..."
109
+ response = @client.get(url)
110
+
111
+ if response.success?
112
+ write_file_and_metadata(response, url, cache_path, metadata_path)
113
+ else
114
+ puts "Error downloading #{url}: HTTP #{response.status}"
115
+ end
116
+ rescue Faraday::Error => e
117
+ puts "Error downloading #{url}: #{e.message}"
118
+ end
119
+
120
+ def write_file_and_metadata(response, url, cache_path, metadata_path)
121
+ File.write(cache_path, response.body)
122
+
123
+ metadata = {
124
+ "url" => url,
125
+ "download_time" => Time.now.iso8601,
126
+ "etag" => response.headers["ETag"],
127
+ }
128
+
129
+ File.write(metadata_path, JSON.pretty_generate(metadata))
130
+ puts "Updated metadata for #{File.basename(cache_path)}"
131
+ end
132
+
133
+ def read_metadata(metadata_path)
134
+ File.exist?(metadata_path) ? JSON.parse(File.read(metadata_path)) : {}
135
+ rescue JSON::ParserError
136
+ puts "Error parsing metadata file. Treating as empty."
137
+ {}
138
+ end
139
+ end
140
+ end
141
+ end
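The decision logic spread across `should_download?`, `handle_etag`, and `handle_no_etag` above can be condensed into one pure function, which makes the behaviour easier to test. The following is a minimal sketch, not code from the gem: `refresh_needed?` and `SECONDS_PER_DAY` are hypothetical names, and the function receives the parsed metadata hash and the server's ETag as plain arguments instead of making any HTTP request. As in the code above, the expiry period is consulted only when the server sends no ETag.

```ruby
require "time"

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper (not part of glossarist-agent).
# metadata:    hash read from the "<file>.metadata.json" sidecar
# server_etag: ETag header from a HEAD request, or nil if absent
def refresh_needed?(metadata, server_etag, now: Time.now, expiry_days: 7)
  if server_etag
    stored = metadata["etag"]
    # Refresh only when an ETag was stored and it differs from the server's.
    return !stored.nil? && stored != server_etag
  end

  # No ETag from the server: fall back to the age of the cached file.
  downloaded_at = metadata["download_time"]
  return true unless downloaded_at

  (now - Time.parse(downloaded_at)) / SECONDS_PER_DAY > expiry_days
end
```

Passing the clock in as `now` lets the age check be exercised with fixed timestamps rather than the real system time.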
data/lib/glossarist/agent/iho/bilingual_table.rb ADDED
@@ -0,0 +1,60 @@
1
+ require "shale"
2
+ require "csv"
3
+
4
+ require "shale/adapter/csv"
5
+ unless Shale.csv_adapter
6
+ Shale.csv_adapter = Shale::Adapter::CSV
7
+ end
8
+
9
+ module Glossarist
10
+ module Agent
11
+ module Iho
12
+ class BilingualRow < Shale::Mapper
13
+ attribute :eng_id, Shale::Type::String
14
+ attribute :other_id, Shale::Type::String
15
+ attribute :eng_term, Shale::Type::String
16
+ attribute :other_term, Shale::Type::String
17
+ attribute :eng_definition, Shale::Type::String
18
+ attribute :other_definition, Shale::Type::String
19
+ end
20
+
21
+ class SimpleConcept < Shale::Mapper
22
+ attribute :id, Shale::Type::String
23
+ attribute :lang_code, Shale::Type::String
24
+ attribute :term, Shale::Type::String
25
+ attribute :definition, Shale::Type::String
26
+ end
27
+
28
+ class BilingualTable
29
+ attr_accessor :file_path, :rows, :lang_code, :concepts_eng, :concepts_other
30
+
31
+ def initialize(file_path:, lang_code:)
32
+ @file_path = file_path
33
+ @lang_code = lang_code
34
+ end
35
+
36
+ def process
37
+ # @rows = []
38
+ @rows = BilingualRow.from_csv(IO.read(@file_path))[1..-1]
39
+ @concepts_eng = @rows.map do |bilingual_row|
40
+ SimpleConcept.new(
41
+ lang_code: :eng,
42
+ id: bilingual_row.eng_id.strip,
43
+ term: bilingual_row.eng_term.strip,
44
+ definition: bilingual_row.eng_definition.strip,
45
+ )
46
+ end
47
+
48
+ @concepts_other = @rows.map do |bilingual_row|
49
+ SimpleConcept.new(
50
+ lang_code: lang_code,
51
+ id: bilingual_row.other_id.strip,
52
+ term: bilingual_row.other_term.strip,
53
+ definition: bilingual_row.other_definition.strip,
54
+ )
55
+ end
56
+ end
57
+ end
58
+ end
59
+ end
60
+ end
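`BilingualTable#process` above turns each CSV row into one English record and one record in the other language. The same split can be sketched with only the standard `csv` library, without Shale. This is an illustration under assumptions: `Record` and `split_bilingual_csv` are hypothetical names, and the column order (English ID, other ID, English term, other term, English definition, other definition) is inferred from the attribute order of `BilingualRow`, not from the IHO files themselves.

```ruby
require "csv"

# Hypothetical record type standing in for SimpleConcept.
Record = Struct.new(:lang_code, :id, :term, :definition, keyword_init: true)

# Splits bilingual CSV text into [english, other] record pairs.
# The first row is treated as a header and dropped.
def split_bilingual_csv(text, other_lang)
  rows = CSV.parse(text)[1..] || []
  rows.map do |eng_id, other_id, eng_term, other_term, eng_def, other_def|
    [
      Record.new(lang_code: :eng, id: eng_id.to_s.strip,
                 term: eng_term.to_s.strip, definition: eng_def.to_s.strip),
      Record.new(lang_code: other_lang, id: other_id.to_s.strip,
                 term: other_term.to_s.strip, definition: other_def.to_s.strip),
    ]
  end
end
```

Unlike the gem's code, this sketch calls `to_s` before `strip`, so an empty cell yields an empty string instead of raising on `nil`.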
data/lib/glossarist/agent/iho/cli.rb ADDED
@@ -0,0 +1,28 @@
1
+ require "thor"
2
+ require_relative "downloader"
3
+ require_relative "generator"
4
+
5
+ module Glossarist
6
+ module Agent
7
+ module Iho
8
+ class Cli < Thor
9
+ desc "retrieve-concepts", "Download IHO CSV files and generate concepts"
10
+ option :output, type: :string, default: "./output", aliases: "-o", desc: "Directory to output generated files"
11
+ option :cache, type: :string, default: "~/.glossarist-agent/cache", aliases: "-c", desc: "Directory to store cached files"
12
+ option :fetch, type: :boolean, default: true, desc: "Fetch new data (default: true)"
13
+
14
+ def retrieve_concepts
15
+ cache_dir = File.expand_path(options[:cache])
16
+ output_dir = File.expand_path(options[:output])
17
+ fetch = options[:fetch]
18
+
19
+ downloader = Downloader.new(cache_dir, fetch: fetch)
20
+ downloader.download_csv_files
21
+
22
+ generator = Generator.new(cache_dir, output_dir)
23
+ generator.save_to_files
24
+ end
25
+ end
26
+ end
27
+ end
28
+ end
data/lib/glossarist/agent/iho/downloader.rb ADDED
@@ -0,0 +1,44 @@
1
+ require_relative "../http_cache_downloader"
2
+
3
+ module Glossarist
4
+ module Agent
5
+ module Iho
6
+ class Downloader
7
+ LANG_MAPPING = {
8
+ fra: {
9
+ "engFreView.csv" => "http://iho-ohi.net/S32/engFreView.php?operation=ecsv",
10
+ },
11
+ spa: {
12
+ "engEspView.csv" => "http://iho-ohi.net/S32/engEspView.php?operation=ecsv",
13
+ },
14
+ zho: {
15
+ "engChnView.csv" => "http://iho-ohi.net/S32/engChnView.php?operation=ecsv",
16
+ },
17
+ ind: {
18
+ "engIndView.csv" => "http://iho-ohi.net/S32/engIndView.php?operation=ecsv",
19
+ },
20
+ }.freeze
21
+
22
+ CSV_URLS = LANG_MAPPING.values.inject({}) do |acc, x|
23
+ acc.merge!(x)
24
+ acc
25
+ end
26
+
27
+ def initialize(cache_dir, fetch: true)
28
+ @cache_downloader = HttpCacheDownloader.new(cache_dir, fetch: fetch)
29
+ end
30
+
31
+ def download_csv_files
32
+ @cache_downloader.download_files(CSV_URLS)
33
+ end
34
+
35
+ def self.lang_code_by_filename(filename)
36
+ LANG_MAPPING.each do |lang_code, files|
37
+ return lang_code if files.key?(filename)
38
+ end
39
+ nil
40
+ end
41
+ end
42
+ end
43
+ end
44
+ end
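`CSV_URLS` and `lang_code_by_filename` above both derive from the nested `LANG_MAPPING` hash: one flattens it into a filename-to-URL map, the other searches it for a filename. A compact sketch of both derivations follows. `LANG_FILES`, `URLS_BY_FILE`, and `LANG_BY_FILE` are hypothetical names used only here, and the mapping is shortened to two of the four languages.

```ruby
# Shortened copy of the nested mapping: language code => { filename => URL }.
LANG_FILES = {
  fra: { "engFreView.csv" => "http://iho-ohi.net/S32/engFreView.php?operation=ecsv" },
  spa: { "engEspView.csv" => "http://iho-ohi.net/S32/engEspView.php?operation=ecsv" },
}.freeze

# Flatten to filename => URL without mutating the inner hashes.
URLS_BY_FILE = LANG_FILES.values.reduce({}, :merge).freeze

# Invert to filename => language code for constant-time lookup.
LANG_BY_FILE = LANG_FILES.flat_map do |lang_code, files|
  files.keys.map { |filename| [filename, lang_code] }
end.to_h.freeze
```

A precomputed reverse index returns `nil` for unknown filenames, matching what `lang_code_by_filename` does, while avoiding a scan of every language on each lookup.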
data/lib/glossarist/agent/iho/generator.rb ADDED
@@ -0,0 +1,116 @@
1
+ require "glossarist"
2
+ require_relative "bilingual_table"
3
+ require_relative "downloader"
4
+
5
+ module Glossarist
6
+ module Agent
7
+ module Iho
8
+ class Generator
9
+ attr_accessor :language_tables, :simple_concepts
10
+
11
+ def initialize(cache_dir, output_path)
12
+ @cache_dir = cache_dir
13
+ @output_path = output_path
14
+ FileUtils.mkdir_p(@output_path)
15
+ end
16
+
17
+ def collection
18
+ return @collection if @collection
19
+
20
+ parse_language_tables
21
+ build_simple_concepts
22
+ convert_to_glossarist
23
+
24
+ @collection
25
+ end
26
+
27
+ def parse_language_tables
28
+ return @language_tables if @language_tables
29
+
30
+ @language_tables = {}
31
+ Dir.glob(File.join(@cache_dir, "*.csv")).map do |file|
32
+ lang_code = Downloader.lang_code_by_filename(File.basename(file))
33
+
34
+ table = BilingualTable.new(file_path: file, lang_code: lang_code).tap do |table|
35
+ table.process
36
+ end
37
+
38
+ @language_tables[lang_code] = table
39
+ end
40
+
41
+ @language_tables
42
+ end
43
+
44
+ def build_simple_concepts
45
+ return @simple_concepts if @simple_concepts
46
+
47
+ @simple_concepts = {}
48
+ build_english_concepts
49
+ build_other_language_concepts
50
+ @simple_concepts
51
+ end
52
+
53
+ def convert_to_glossarist
54
+ @collection = ::Glossarist::ManagedConceptCollection.new
55
+ @collection.managed_concepts = create_managed_concepts
56
+ end
57
+
58
+ def save_to_files
59
+ collection.save_to_files(@output_path)
60
+ puts "Concepts generated and saved to #{@output_path}"
61
+ end
62
+
63
+ private
64
+
65
+ private
66
+
67
+ def build_english_concepts
68
+ @language_tables[:fra].concepts_eng.each do |concept|
69
+ @simple_concepts[concept.id] = { eng: concept_data(concept) }
70
+ end
71
+ end
72
+
73
+ def build_other_language_concepts
74
+ @language_tables.each do |lang_code, table|
75
+ table.concepts_other.each do |concept|
76
+ @simple_concepts[concept.id][lang_code] = concept_data(concept)
77
+ end
78
+ end
79
+ end
80
+
81
+ def concept_data(concept)
82
+ {
83
+ term: concept.term,
84
+ definition: concept.definition,
85
+ }
86
+ end
87
+
88
+ def create_managed_concepts
89
+ @simple_concepts.map do |id, localized_concepts|
90
+ create_managed_concept(id, localized_concepts)
91
+ end
92
+ end
93
+
94
+ def create_managed_concept(id, localized_concepts)
95
+ Glossarist::ManagedConcept.new(id: id).tap do |con|
96
+ localized_concepts.each do |lang_code, data|
97
+ con.add_localization(create_localized_concept(lang_code, data))
98
+ end
99
+ end
100
+ end
101
+
102
+ def create_localized_concept(lang_code, data)
103
+ Glossarist::LocalizedConcept.new(
104
+ "language_code" => lang_code.to_s,
105
+ "terms" => [{
106
+ "designation" => data[:term],
107
+ "type" => "expression",
108
+ "normative_status" => "preferred",
109
+ }],
110
+ "definition" => [{ "content" => data[:definition] }],
111
+ )
112
+ end
113
+ end
114
+ end
115
+ end
116
+ end
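The core of `build_english_concepts` and `build_other_language_concepts` above is a merge of per-language records into one hash keyed by concept ID. The sketch below shows that merge on plain data, without the Glossarist classes. `Entry`, `entry_data`, and `merge_by_id` are hypothetical names. One deliberate difference from the gem is labelled in the comments: an ID that appears only in a non-English table gets a fresh entry here, whereas the gem's code indexes into the existing hash and would fail on such an ID.

```ruby
# Hypothetical stand-in for SimpleConcept.
Entry = Struct.new(:id, :term, :definition, keyword_init: true)

def entry_data(entry)
  { term: entry.term, definition: entry.definition }
end

# english: array of Entry
# others:  hash of language code => array of Entry
# Returns concept ID => { language code => { term:, definition: } }.
def merge_by_id(english, others)
  merged = english.to_h { |entry| [entry.id, { eng: entry_data(entry) }] }
  others.each do |lang_code, entries|
    entries.each do |entry|
      # Assumption: tolerate IDs that have no English counterpart.
      merged[entry.id] ||= {}
      merged[entry.id][lang_code] = entry_data(entry)
    end
  end
  merged
end
```

Each value of the returned hash has the shape that `create_managed_concept` iterates over, one localization per language code.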
data/lib/glossarist/agent/version.rb ADDED
@@ -0,0 +1,7 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Glossarist
4
+ module Agent
5
+ VERSION = "0.1.0"
6
+ end
7
+ end
data/lib/glossarist/agent.rb ADDED
@@ -0,0 +1,13 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "agent/version"
4
+ require_relative "agent/concept"
5
+ require_relative "agent/collection"
6
+
7
+ module Glossarist
8
+ module Agent
9
+ class Error < StandardError; end
10
+
11
+ # Your code goes here...
12
+ end
13
+ end
data/lib/glossarist-agent.rb ADDED
@@ -0,0 +1 @@
1
+ require_relative "glossarist/agent"
data/sig/glossarist/agent.rbs ADDED
@@ -0,0 +1,6 @@
1
+ module Glossarist
2
+ module Agent
3
+ VERSION: String
4
+ # See the writing guide of rbs: https://github.com/ruby/rbs#guides
5
+ end
6
+ end
metadata ADDED
@@ -0,0 +1,191 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: glossarist-agent
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Ribose Inc.
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2024-07-14 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: shale
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: thor
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: csv
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: glossarist
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :runtime
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: faraday
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ type: :runtime
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - ">="
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
83
+ - !ruby/object:Gem::Dependency
84
+ name: rake
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - ">="
88
+ - !ruby/object:Gem::Version
89
+ version: '0'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ version: '0'
97
+ - !ruby/object:Gem::Dependency
98
+ name: rspec
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: '0'
104
+ type: :development
105
+ prerelease: false
106
+ version_requirements: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - ">="
109
+ - !ruby/object:Gem::Version
110
+ version: '0'
111
+ - !ruby/object:Gem::Dependency
112
+ name: rubocop
113
+ requirement: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '0'
118
+ type: :development
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ requirements:
122
+ - - ">="
123
+ - !ruby/object:Gem::Version
124
+ version: '0'
125
+ - !ruby/object:Gem::Dependency
126
+ name: rubocop-performance
127
+ requirement: !ruby/object:Gem::Requirement
128
+ requirements:
129
+ - - ">="
130
+ - !ruby/object:Gem::Version
131
+ version: '0'
132
+ type: :development
133
+ prerelease: false
134
+ version_requirements: !ruby/object:Gem::Requirement
135
+ requirements:
136
+ - - ">="
137
+ - !ruby/object:Gem::Version
138
+ version: '0'
139
+ description: Glossarist component to retrieve content from remote sources
140
+ email:
141
+ - open.source@ribose.com
142
+ executables:
143
+ - glossarist-agent
144
+ extensions: []
145
+ extra_rdoc_files: []
146
+ files:
147
+ - ".rspec"
148
+ - ".rubocop.yml"
149
+ - CODE_OF_CONDUCT.md
150
+ - README.adoc
151
+ - Rakefile
152
+ - exe/glossarist-agent
153
+ - lib/glossarist-agent.rb
154
+ - lib/glossarist/agent.rb
155
+ - lib/glossarist/agent/cli.rb
156
+ - lib/glossarist/agent/collection.rb
157
+ - lib/glossarist/agent/concept.rb
158
+ - lib/glossarist/agent/http_cache_downloader.rb
159
+ - lib/glossarist/agent/iho/bilingual_table.rb
160
+ - lib/glossarist/agent/iho/cli.rb
161
+ - lib/glossarist/agent/iho/downloader.rb
162
+ - lib/glossarist/agent/iho/generator.rb
163
+ - lib/glossarist/agent/version.rb
164
+ - sig/glossarist/agent.rbs
165
+ homepage: https://github.com/glossarist/glossarist-agent
166
+ licenses:
167
+ - BSD-2-Clause
168
+ metadata:
169
+ homepage_uri: https://github.com/glossarist/glossarist-agent
170
+ source_code_uri: https://github.com/glossarist/glossarist-agent
171
+ changelog_uri: https://github.com/glossarist/glossarist-agent
172
+ post_install_message:
173
+ rdoc_options: []
174
+ require_paths:
175
+ - lib
176
+ required_ruby_version: !ruby/object:Gem::Requirement
177
+ requirements:
178
+ - - ">="
179
+ - !ruby/object:Gem::Version
180
+ version: 2.7.0
181
+ required_rubygems_version: !ruby/object:Gem::Requirement
182
+ requirements:
183
+ - - ">="
184
+ - !ruby/object:Gem::Version
185
+ version: '0'
186
+ requirements: []
187
+ rubygems_version: 3.5.11
188
+ signing_key:
189
+ specification_version: 4
190
+ summary: Glossarist component to retrieve content from remote sources
191
+ test_files: []