glossarist-agent 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+   metadata.gz: 5effe5b9399c088c41e1073dc5450dca94a5b52cee8d8f3d539450279a032207
4
+   data.tar.gz: d517d3428be089a55dc3d3788a8f8f6621f4bec32b0c70c1b3d28c6efee6281c
5
+ SHA512:
6
+   metadata.gz: 02e48745b3e92c17c1937de331d238e67ea0c0411d50d06c368796b5dbe32719fee05ee0ebe382708a860215331390a53391643b148cffb7eefad278b34c74b3
7
+   data.tar.gz: baf4ea5616a16ad50c09b957533348c2679ddb513cc7efb72d5f7d39c831419b6c03f0d1eec91df92741ea442b2975f6b6c209127f2847ef2497df09cd81fd31
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.rubocop.yml ADDED
@@ -0,0 +1,8 @@
1
+ AllCops:
2
+   TargetRubyVersion: 3.0
3
+
4
+ Style/StringLiterals:
5
+   EnforcedStyle: double_quotes
6
+
7
+ Style/StringLiteralsInInterpolation:
8
+   EnforcedStyle: double_quotes
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,132 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ We as members, contributors, and leaders pledge to make participation in our
6
+ community a harassment-free experience for everyone, regardless of age, body
7
+ size, visible or invisible disability, ethnicity, sex characteristics, gender
8
+ identity and expression, level of experience, education, socio-economic status,
9
+ nationality, personal appearance, race, caste, color, religion, or sexual
10
+ identity and orientation.
11
+
12
+ We pledge to act and interact in ways that contribute to an open, welcoming,
13
+ diverse, inclusive, and healthy community.
14
+
15
+ ## Our Standards
16
+
17
+ Examples of behavior that contributes to a positive environment for our
18
+ community include:
19
+
20
+ * Demonstrating empathy and kindness toward other people
21
+ * Being respectful of differing opinions, viewpoints, and experiences
22
+ * Giving and gracefully accepting constructive feedback
23
+ * Accepting responsibility and apologizing to those affected by our mistakes,
24
+ and learning from the experience
25
+ * Focusing on what is best not just for us as individuals, but for the overall
26
+ community
27
+
28
+ Examples of unacceptable behavior include:
29
+
30
+ * The use of sexualized language or imagery, and sexual attention or advances of
31
+ any kind
32
+ * Trolling, insulting or derogatory comments, and personal or political attacks
33
+ * Public or private harassment
34
+ * Publishing others' private information, such as a physical or email address,
35
+ without their explicit permission
36
+ * Other conduct which could reasonably be considered inappropriate in a
37
+ professional setting
38
+
39
+ ## Enforcement Responsibilities
40
+
41
+ Community leaders are responsible for clarifying and enforcing our standards of
42
+ acceptable behavior and will take appropriate and fair corrective action in
43
+ response to any behavior that they deem inappropriate, threatening, offensive,
44
+ or harmful.
45
+
46
+ Community leaders have the right and responsibility to remove, edit, or reject
47
+ comments, commits, code, wiki edits, issues, and other contributions that are
48
+ not aligned to this Code of Conduct, and will communicate reasons for moderation
49
+ decisions when appropriate.
50
+
51
+ ## Scope
52
+
53
+ This Code of Conduct applies within all community spaces, and also applies when
54
+ an individual is officially representing the community in public spaces.
55
+ Examples of representing our community include using an official email address,
56
+ posting via an official social media account, or acting as an appointed
57
+ representative at an online or offline event.
58
+
59
+ ## Enforcement
60
+
61
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
62
+ reported to the community leaders responsible for enforcement at
63
+ [INSERT CONTACT METHOD].
64
+ All complaints will be reviewed and investigated promptly and fairly.
65
+
66
+ All community leaders are obligated to respect the privacy and security of the
67
+ reporter of any incident.
68
+
69
+ ## Enforcement Guidelines
70
+
71
+ Community leaders will follow these Community Impact Guidelines in determining
72
+ the consequences for any action they deem in violation of this Code of Conduct:
73
+
74
+ ### 1. Correction
75
+
76
+ **Community Impact**: Use of inappropriate language or other behavior deemed
77
+ unprofessional or unwelcome in the community.
78
+
79
+ **Consequence**: A private, written warning from community leaders, providing
80
+ clarity around the nature of the violation and an explanation of why the
81
+ behavior was inappropriate. A public apology may be requested.
82
+
83
+ ### 2. Warning
84
+
85
+ **Community Impact**: A violation through a single incident or series of
86
+ actions.
87
+
88
+ **Consequence**: A warning with consequences for continued behavior. No
89
+ interaction with the people involved, including unsolicited interaction with
90
+ those enforcing the Code of Conduct, for a specified period of time. This
91
+ includes avoiding interactions in community spaces as well as external channels
92
+ like social media. Violating these terms may lead to a temporary or permanent
93
+ ban.
94
+
95
+ ### 3. Temporary Ban
96
+
97
+ **Community Impact**: A serious violation of community standards, including
98
+ sustained inappropriate behavior.
99
+
100
+ **Consequence**: A temporary ban from any sort of interaction or public
101
+ communication with the community for a specified period of time. No public or
102
+ private interaction with the people involved, including unsolicited interaction
103
+ with those enforcing the Code of Conduct, is allowed during this period.
104
+ Violating these terms may lead to a permanent ban.
105
+
106
+ ### 4. Permanent Ban
107
+
108
+ **Community Impact**: Demonstrating a pattern of violation of community
109
+ standards, including sustained inappropriate behavior, harassment of an
110
+ individual, or aggression toward or disparagement of classes of individuals.
111
+
112
+ **Consequence**: A permanent ban from any sort of public interaction within the
113
+ community.
114
+
115
+ ## Attribution
116
+
117
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage],
118
+ version 2.1, available at
119
+ [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
120
+
121
+ Community Impact Guidelines were inspired by
122
+ [Mozilla's code of conduct enforcement ladder][Mozilla CoC].
123
+
124
+ For answers to common questions about this code of conduct, see the FAQ at
125
+ [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
126
+ [https://www.contributor-covenant.org/translations][translations].
127
+
128
+ [homepage]: https://www.contributor-covenant.org
129
+ [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
130
+ [Mozilla CoC]: https://github.com/mozilla/diversity
131
+ [FAQ]: https://www.contributor-covenant.org/faq
132
+ [translations]: https://www.contributor-covenant.org/translations
data/README.adoc ADDED
@@ -0,0 +1,149 @@
1
+ = Glossarist Agent
2
+
3
+ image:https://img.shields.io/gem/v/glossarist-agent.svg["Gem Version", link="https://rubygems.org/gems/glossarist-agent"]
4
+ image:https://github.com/relaton/glossarist-agent/workflows/rake/badge.svg["Build Status", link="https://github.com/relaton/glossarist-agent/actions?workflow=rake"]
5
+ image:https://codeclimate.com/github/relaton/glossarist-agent/badges/gpa.svg["Code Climate", link="https://codeclimate.com/github/relaton/glossarist-agent"]
6
+
7
+ == Purpose
8
+
9
+ The Glossarist Agent is a Ruby gem designed to retrieve remotely located concepts.
10
+
11
+ Currently, it allows the bulk retrieval of the IHO S-32 Hydrographic Dictionary
12
+ into the Glossarist format.
13
+
14
+
15
+ == Installation
16
+
17
+ Add this line to your application's `Gemfile`:
18
+
19
+ [source,ruby]
20
+ ----
21
+ gem 'glossarist-agent'
22
+ ----
23
+
24
+ And then execute:
25
+
26
+ [source,shell]
27
+ ----
28
+ $ bundle install
29
+ ----
30
+
31
+ Or install it yourself as:
32
+
33
+ [source,shell]
34
+ ----
35
+ $ gem install glossarist-agent
36
+ ----
37
+
38
+
39
+ == Usage
40
+
41
+ === Downloading IHO S-32 Hydrographic Dictionary data
42
+
43
+ ==== General
44
+
45
+ The Glossarist Agent can download and process IHO (International Hydrographic
46
+ Organization) S-32 Hydrographic Dictionary data from available CSV files.
47
+
48
+ The official site is located at:
49
+
50
+ * http://iho-ohi.net/S32/
51
+
52
+ Glossarist Agent uses a caching mechanism to efficiently manage downloads and
53
+ reduce unnecessary network requests.
54
+
55
+ To retrieve these concepts and generate a Glossarist dataset, use the following
56
+ command:
57
+
58
+ [source,shell]
59
+ ----
60
+ $ glossarist-agent iho retrieve-concepts
61
+ ----
62
+
63
+ This command performs the following actions:
64
+
65
+ . Downloads the required CSV files from IHO sources.
66
+ . Caches the downloaded files for future use.
67
+ . Processes the CSV data to generate a Glossarist-compatible dataset.
68
+
69
+ ==== Command Options
70
+
71
+ [source,shell]
72
+ ----
73
+ $ glossarist-agent iho help retrieve-concepts
74
+ Usage:
75
+ glossarist-agent iho retrieve-concepts
76
+
77
+ Options:
78
+ -o, [--output=OUTPUT] # Directory to output generated files
79
+ # Default: ./output
80
+ -c, [--cache=CACHE] # Directory to store cached files
81
+ # Default: ~/.glossarist-agent/cache
82
+ [--fetch], [--no-fetch], [--skip-fetch] # Fetch new data (default: true)
83
+ # Default: true
84
+
85
+ Download IHO CSV files and generate concepts
86
+ ----
87
+
88
+ `--output`:: Specifies the directory where the generated Glossarist dataset will be saved. Default is `./output`.
89
+ `--cache`:: Sets the directory for storing cached files. Default is `~/.glossarist-agent/cache`.
90
+ `--fetch`:: Controls whether to fetch new data or use only the existing cache. Default is `true`; pass `--no-fetch` (or `--skip-fetch`) to disable fetching.
91
+
92
+ [example]
93
+ ====
94
+ The following command saves the IHO S-32 Glossarist dataset at
95
+ `./iho-s32-glossarist` and uses only the existing cache, without
96
+ contacting the server.
97
+
98
+ [source,sh]
99
+ ----
100
+ $ glossarist-agent iho retrieve-concepts --no-fetch -o iho-s32-glossarist
101
+ ----
102
+ ====
103
+
104
+
105
+ === Caching mechanism
106
+
107
+ The Glossarist Agent caches downloaded files to avoid repeated downloads
108
+ and reduce network requests:
109
+
110
+ . Downloaded files are stored in the specified cache directory.
111
+ . Each cached file has a `.metadata.json` sidecar recording the source URL, download time, and ETag.
112
+ . When fetching is enabled and a cached copy exists, the agent sends a `HEAD` request and checks:
113
+ .. If the server returns an ETag, whether it differs from the cached ETag.
114
+ .. If the server returns no ETag, whether the cached file is older than the expiry period (7 days).
115
+ . If the check fails, or the cached file or its metadata is missing, the agent downloads a fresh copy.
116
+
117
+ This approach keeps the cached data reasonably current while minimizing network usage.
118
+
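The freshness check described in this section can be sketched as a pure function. This is a minimal illustration that mirrors the decision logic of `HttpCacheDownloader` in this release; the method name `refresh_needed?` and the `now:` parameter are assumptions made for the example and are not part of the gem.

```ruby
require "time"

CACHE_EXPIRY_DAYS = 7
SECONDS_PER_DAY = 24 * 60 * 60

# Returns true when a fresh download is needed.
# stored:      metadata hash with "etag" and "download_time" (ISO 8601) keys
# server_etag: ETag reported by the server, or nil when it sent none
def refresh_needed?(stored, server_etag, now: Time.now)
  if server_etag
    # Server sent an ETag: only a mismatch with a stored ETag forces a download.
    !stored["etag"].nil? && stored["etag"] != server_etag
  else
    # No ETag: fall back to the age of the cached copy.
    downloaded_at = stored["download_time"]
    return true if downloaded_at.nil?

    (now - Time.parse(downloaded_at)) / SECONDS_PER_DAY > CACHE_EXPIRY_DAYS
  end
end

now = Time.utc(2024, 7, 14)
recent = { "etag" => "\"abc\"", "download_time" => "2024-07-10T00:00:00Z" }
old    = { "etag" => nil, "download_time" => "2024-07-01T00:00:00Z" }

puts refresh_needed?(recent, "\"abc\"", now: now) # ETag matches
puts refresh_needed?(recent, "\"def\"", now: now) # ETag differs
puts refresh_needed?(old, nil, now: now)          # no ETag, 13 days old
```

Note that in this logic a matching ETag wins over age: a cached file older than seven days is still reused as long as the server reports the same ETag.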
119
+ === Generating Glossarist Dataset
120
+
121
+ After downloading and caching the IHO CSV files, the agent processes the data to generate a Glossarist-compatible dataset:
122
+
123
+ . It parses the CSV files to extract concept information.
124
+ . The extracted data is transformed into the Glossarist data model.
125
+ . The resulting dataset is saved in the specified output directory.
126
+
127
+ This generated dataset can then be used with other Glossarist tools for further processing or integration into concept management systems.
128
+
129
+ == Features
130
+
131
+ * Automated downloading and caching of IHO CSV files
132
+ * ETag-based cache validation
133
+ * Age-based cache expiry (7 days) when the server provides no ETag
134
+ * Generation of Glossarist-compatible datasets from IHO data
135
+ * Command-line interface for easy integration into workflows
136
+
137
+ == Development
138
+
139
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
140
+
141
+ To install this gem onto your local machine, run `bundle exec rake install`.
142
+
143
+
144
+ == License
145
+
146
+ Copyright Ribose.
147
+
148
+ The gem is available as open source under the terms of the
149
+ https://opensource.org/licenses/MIT[MIT License].
data/Rakefile ADDED
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rspec/core/rake_task"
5
+
6
+ RSpec::Core::RakeTask.new(:spec)
7
+
8
+ require "rubocop/rake_task"
9
+
10
+ RuboCop::RakeTask.new
11
+
12
+ task default: %i[spec rubocop]
data/exe/glossarist-agent ADDED
@@ -0,0 +1,6 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require_relative "../lib/glossarist/agent/cli"
5
+
6
+ Glossarist::Agent::Cli.start(ARGV)
data/lib/glossarist/agent/cli.rb ADDED
@@ -0,0 +1,11 @@
1
+ require "thor"
2
+ require_relative "iho/cli"
3
+
4
+ module Glossarist
5
+ module Agent
6
+ class Cli < Thor
7
+ desc "iho SUBCOMMAND ...ARGS", "IHO-related commands"
8
+ subcommand "iho", Iho::Cli
9
+ end
10
+ end
11
+ end
data/lib/glossarist/agent/collection.rb ADDED
@@ -0,0 +1,22 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Glossarist
4
+ module Agent
5
+ class Collection
6
+ attr_reader :collection
7
+
8
+ def initialize
9
+ @collection = Glossarist::ManagedConceptCollection.new
10
+ end
11
+
12
+ def add_concept(data)
13
+ concept = Concept.new(data)
14
+ @collection << concept.managed_concept
15
+ end
16
+
17
+ def save_to_files(output_path)
18
+ @collection.save_to_files(output_path)
19
+ end
20
+ end
21
+ end
22
+ end
data/lib/glossarist/agent/concept.rb ADDED
@@ -0,0 +1,28 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Glossarist
4
+ module Agent
5
+ class Concept
6
+ attr_reader :managed_concept
7
+
8
+ def initialize(data)
9
+ @managed_concept = Glossarist::ManagedConcept.new
10
+ populate_concept(data)
11
+ end
12
+
13
+ private
14
+
15
+ def populate_concept(data)
16
+ @managed_concept.id = data[:id]
17
+ @managed_concept.groups = data[:groups] || []
18
+
19
+ localized_concept = Glossarist::LocalizedConcept.new
20
+ localized_concept.language_code = "eng"
21
+ localized_concept.definition = [Glossarist::DetailedDefinition.new(content: data[:definition])]
22
+ localized_concept.designations = [Glossarist::Designation::Base.from_h({ "type" => "expression", "designation" => data[:term] })]
23
+
24
+ @managed_concept.localizations = { "eng" => localized_concept }
25
+ end
26
+ end
27
+ end
28
+ end
data/lib/glossarist/agent/http_cache_downloader.rb ADDED
@@ -0,0 +1,141 @@
1
+ require "faraday"
2
+ require "fileutils"
3
+ require "json"
4
+ require "time"
5
+
6
+ module Glossarist
7
+ module Agent
8
+ class HttpCacheDownloader
9
+ CACHE_EXPIRY_DAYS = 7
10
+
11
+ def initialize(cache_dir, fetch: true)
12
+ @cache_dir = cache_dir
13
+ @fetch = fetch
14
+ FileUtils.mkdir_p(@cache_dir)
15
+ @client = Faraday.new { |faraday| faraday.adapter Faraday.default_adapter }
16
+ end
17
+
18
+ def download_files(url_map)
19
+ url_map.each { |filename, url| process_file(filename, url) }
20
+ puts "Finished checking cached files."
21
+ end
22
+
23
+ private
24
+
25
+ def process_file(filename, url)
26
+ cache_path = File.join(@cache_dir, filename)
27
+ metadata_path = "#{cache_path}.metadata.json"
28
+
29
+ if cache_exists?(cache_path, metadata_path)
30
+ handle_existing_cache(filename, url, cache_path, metadata_path)
31
+ else
32
+ handle_missing_cache(filename, url, cache_path, metadata_path)
33
+ end
34
+ end
35
+
36
+ def cache_exists?(cache_path, metadata_path)
37
+ File.exist?(cache_path) && File.exist?(metadata_path)
38
+ end
39
+
40
+ def handle_existing_cache(filename, url, cache_path, metadata_path)
41
+ if !@fetch
42
+ puts "Using cached #{filename} (fetch disabled)"
43
+ elsif should_download?(url, cache_path, metadata_path)
44
+ download_file(url, cache_path, metadata_path)
45
+ else
46
+ puts "Using cached #{filename} (not modified or within #{CACHE_EXPIRY_DAYS}-day period)"
47
+ end
48
+ end
49
+
50
+ def handle_missing_cache(filename, url, cache_path, metadata_path)
51
+ if !@fetch
52
+ puts "Cache file or metadata missing for #{filename}. Skipping as fetch is disabled."
53
+ else
54
+ puts "Cache file or metadata missing for #{filename}. Downloading..."
55
+ download_file(url, cache_path, metadata_path)
56
+ end
57
+ end
58
+
59
+ def should_download?(url, cache_path, metadata_path)
60
+ return false unless @fetch
61
+
62
+ metadata = read_metadata(metadata_path)
63
+ headers = fetch_headers(url)
64
+ server_etag = headers["ETag"]
65
+
66
+ if server_etag
67
+ handle_etag(metadata, server_etag)
68
+ else
69
+ handle_no_etag(metadata)
70
+ end
71
+ end
72
+
73
+ def handle_etag(metadata, server_etag)
74
+ if metadata["etag"] && server_etag != metadata["etag"]
75
+ puts "ETag mismatch. Stored: #{metadata["etag"]}, Server: #{server_etag}"
76
+ true
77
+ else
78
+ false
79
+ end
80
+ end
81
+
82
+ def handle_no_etag(metadata)
83
+ puts "Server did not provide an ETag. Checking file age..."
84
+ file_older_than_days?(metadata["download_time"], CACHE_EXPIRY_DAYS)
85
+ end
86
+
87
+ def file_older_than_days?(download_time, days)
88
+ return true unless download_time
89
+
90
+ file_age = (Time.now - Time.parse(download_time)) / (24 * 60 * 60)
91
+ if file_age > days
92
+ puts "Cache file is older than #{days} days. Will download."
93
+ true
94
+ else
95
+ puts "Cache file is within #{days} days old. Using cached version."
96
+ false
97
+ end
98
+ end
99
+
100
+ def fetch_headers(url)
101
+ @client.head(url).headers
102
+ rescue Faraday::Error => e
103
+ puts "Error fetching headers for #{url}: #{e.message}"
104
+ {}
105
+ end
106
+
107
+ def download_file(url, cache_path, metadata_path)
108
+ puts "Downloading #{File.basename(cache_path)}..."
109
+ response = @client.get(url)
110
+
111
+ if response.success?
112
+ write_file_and_metadata(response, url, cache_path, metadata_path)
113
+ else
114
+ puts "Error downloading #{url}: HTTP #{response.status}"
115
+ end
116
+ rescue Faraday::Error => e
117
+ puts "Error downloading #{url}: #{e.message}"
118
+ end
119
+
120
+ def write_file_and_metadata(response, url, cache_path, metadata_path)
121
+ File.write(cache_path, response.body)
122
+
123
+ metadata = {
124
+ "url" => url,
125
+ "download_time" => Time.now.iso8601,
126
+ "etag" => response.headers["ETag"],
127
+ }
128
+
129
+ File.write(metadata_path, JSON.pretty_generate(metadata))
130
+ puts "Updated metadata for #{File.basename(cache_path)}"
131
+ end
132
+
133
+ def read_metadata(metadata_path)
134
+ File.exist?(metadata_path) ? JSON.parse(File.read(metadata_path)) : {}
135
+ rescue JSON::ParserError
136
+ puts "Error parsing metadata file. Treating as empty."
137
+ {}
138
+ end
139
+ end
140
+ end
141
+ end
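The sidecar metadata handling above can be exercised in isolation, without Faraday or network access. The following is a minimal sketch of the write and read round trip under the same file layout (`<file>.metadata.json`); the `write_metadata` helper and its keyword arguments are assumptions made for the example, while `read_metadata` follows the private method of the same name.

```ruby
require "json"
require "time"
require "tmpdir"

# Writes the sidecar metadata file that sits next to a cached download.
def write_metadata(metadata_path, url:, etag:, now: Time.now)
  metadata = { "url" => url, "download_time" => now.iso8601, "etag" => etag }
  File.write(metadata_path, JSON.pretty_generate(metadata))
  metadata
end

# Reads the sidecar file; a missing or corrupt file is treated as empty.
def read_metadata(metadata_path)
  File.exist?(metadata_path) ? JSON.parse(File.read(metadata_path)) : {}
rescue JSON::ParserError
  {}
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "engFreView.csv.metadata.json")
  write_metadata(path, url: "http://iho-ohi.net/S32/engFreView.php?operation=ecsv", etag: "\"abc\"")
  puts read_metadata(path)["etag"]

  # A truncated or hand-edited file falls back to empty metadata.
  File.write(path, "{ not json")
  puts read_metadata(path).empty?
end
```

Because empty metadata has no `download_time`, a corrupt sidecar file leads to a fresh download whenever the server sends no ETag.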
data/lib/glossarist/agent/iho/bilingual_table.rb ADDED
@@ -0,0 +1,60 @@
1
+ require "shale"
2
+ require "csv"
3
+
4
+ require "shale/adapter/csv"
5
+ unless Shale.csv_adapter
6
+ Shale.csv_adapter = Shale::Adapter::CSV
7
+ end
8
+
9
+ module Glossarist
10
+ module Agent
11
+ module Iho
12
+ class BilingualRow < Shale::Mapper
13
+ attribute :eng_id, Shale::Type::String
14
+ attribute :other_id, Shale::Type::String
15
+ attribute :eng_term, Shale::Type::String
16
+ attribute :other_term, Shale::Type::String
17
+ attribute :eng_definition, Shale::Type::String
18
+ attribute :other_definition, Shale::Type::String
19
+ end
20
+
21
+ class SimpleConcept < Shale::Mapper
22
+ attribute :id, Shale::Type::String
23
+ attribute :lang_code, Shale::Type::String
24
+ attribute :term, Shale::Type::String
25
+ attribute :definition, Shale::Type::String
26
+ end
27
+
28
+ class BilingualTable
29
+ attr_accessor :file_path, :rows, :lang_code, :concepts_eng, :concepts_other
30
+
31
+ def initialize(file_path:, lang_code:)
32
+ @file_path = file_path
33
+ @lang_code = lang_code
34
+ end
35
+
36
+ def process
37
+ # Parse every CSV row and skip the first one (the header row).
38
+ @rows = BilingualRow.from_csv(File.read(@file_path)).drop(1)
39
+ @concepts_eng = @rows.map do |bilingual_row|
40
+ SimpleConcept.new(
41
+ lang_code: :eng,
42
+ id: bilingual_row.eng_id.strip,
43
+ term: bilingual_row.eng_term.strip,
44
+ definition: bilingual_row.eng_definition.strip,
45
+ )
46
+ end
47
+
48
+ @concepts_other = @rows.map do |bilingual_row|
49
+ SimpleConcept.new(
50
+ lang_code: lang_code,
51
+ id: bilingual_row.other_id.strip,
52
+ term: bilingual_row.other_term.strip,
53
+ definition: bilingual_row.other_definition.strip,
54
+ )
55
+ end
56
+ end
57
+ end
58
+ end
59
+ end
60
+ end
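For readers unfamiliar with Shale, the row splitting performed by `BilingualTable#process` can be approximated with the standard `csv` library alone. This is a sketch, not the gem's implementation: the column order is assumed from the `BilingualRow` attribute order, the `split_bilingual_rows` helper is hypothetical, and the sample rows are made-up placeholders rather than real S-32 content.

```ruby
require "csv"

# Column order assumed from the BilingualRow attribute order.
COLUMNS = %i[eng_id other_id eng_term other_term eng_definition other_definition].freeze

# Splits each data row into an English concept and an other-language concept.
def split_bilingual_rows(csv_text, lang_code)
  CSV.parse(csv_text).drop(1).map do |values|
    cells = values.map { |value| value.to_s.strip } # nil (empty cell) becomes ""
    row = COLUMNS.zip(cells).to_h
    {
      eng: { id: row[:eng_id], term: row[:eng_term], definition: row[:eng_definition] },
      lang_code => { id: row[:other_id], term: row[:other_term], definition: row[:other_definition] },
    }
  end
end

# Made-up placeholder rows, not real S-32 content.
sample = <<~ROWS
  eng_id,fra_id,eng_term,fra_term,eng_definition,fra_definition
  1, 1 ,term A,terme A,definition A,
ROWS

first = split_bilingual_rows(sample, :fra).first
puts first[:eng][:term]
puts first[:fra][:definition].inspect
```

The `to_s` call before `strip` is a defensive choice for empty cells; whether the real IHO exports contain empty cells is not established here.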
data/lib/glossarist/agent/iho/cli.rb ADDED
@@ -0,0 +1,28 @@
1
+ require "thor"
2
+ require_relative "downloader"
3
+ require_relative "generator"
4
+
5
+ module Glossarist
6
+ module Agent
7
+ module Iho
8
+ class Cli < Thor
9
+ desc "retrieve-concepts", "Download IHO CSV files and generate concepts"
10
+ option :output, type: :string, default: "./output", aliases: "-o", desc: "Directory to output generated files"
11
+ option :cache, type: :string, default: "~/.glossarist-agent/cache", aliases: "-c", desc: "Directory to store cached files"
12
+ option :fetch, type: :boolean, default: true, desc: "Fetch new data (default: true)"
13
+
14
+ def retrieve_concepts
15
+ cache_dir = File.expand_path(options[:cache])
16
+ output_dir = File.expand_path(options[:output])
17
+ fetch = options[:fetch]
18
+
19
+ downloader = Downloader.new(cache_dir, fetch: fetch)
20
+ downloader.download_csv_files
21
+
22
+ generator = Generator.new(cache_dir, output_dir)
23
+ generator.save_to_files
24
+ end
25
+ end
26
+ end
27
+ end
28
+ end
data/lib/glossarist/agent/iho/downloader.rb ADDED
@@ -0,0 +1,44 @@
1
+ require_relative "../http_cache_downloader"
2
+
3
+ module Glossarist
4
+ module Agent
5
+ module Iho
6
+ class Downloader
7
+ LANG_MAPPING = {
8
+ fra: {
9
+ "engFreView.csv" => "http://iho-ohi.net/S32/engFreView.php?operation=ecsv",
10
+ },
11
+ spa: {
12
+ "engEspView.csv" => "http://iho-ohi.net/S32/engEspView.php?operation=ecsv",
13
+ },
14
+ zho: {
15
+ "engChnView.csv" => "http://iho-ohi.net/S32/engChnView.php?operation=ecsv",
16
+ },
17
+ ind: {
18
+ "engIndView.csv" => "http://iho-ohi.net/S32/engIndView.php?operation=ecsv",
19
+ },
20
+ }.freeze
21
+
22
+ CSV_URLS = LANG_MAPPING.values.inject({}) do |acc, x|
23
+ acc.merge!(x)
24
+ acc
25
+ end
26
+
27
+ def initialize(cache_dir, fetch: true)
28
+ @cache_downloader = HttpCacheDownloader.new(cache_dir, fetch: fetch)
29
+ end
30
+
31
+ def download_csv_files
32
+ @cache_downloader.download_files(CSV_URLS)
33
+ end
34
+
35
+ def self.lang_code_by_filename(filename)
36
+ LANG_MAPPING.each do |lang_code, files|
37
+ return lang_code if files.key?(filename)
38
+ end
39
+ nil
40
+ end
41
+ end
42
+ end
43
+ end
44
+ end
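The two lookups built from `LANG_MAPPING` (the flat filename-to-URL hash and the reverse lookup from filename to language code) can be tried out with a trimmed copy of the mapping. The sketch below uses `reduce({}, :merge)` and `Hash#find` as possible alternatives to the `inject` and `each` forms above; it defines `lang_code_by_filename` as a top-level method purely for illustration.

```ruby
# Trimmed copy of the mapping above (two of the four languages).
LANG_MAPPING = {
  fra: { "engFreView.csv" => "http://iho-ohi.net/S32/engFreView.php?operation=ecsv" },
  spa: { "engEspView.csv" => "http://iho-ohi.net/S32/engEspView.php?operation=ecsv" },
}.freeze

# One flat filename => URL hash, built without mutating the accumulator.
CSV_URLS = LANG_MAPPING.values.reduce({}, :merge).freeze

# Returns the language code whose file list contains the filename, or nil.
def lang_code_by_filename(filename)
  pair = LANG_MAPPING.find { |_lang_code, files| files.key?(filename) }
  pair&.first
end

puts CSV_URLS.keys.inspect
puts lang_code_by_filename("engEspView.csv").inspect
puts lang_code_by_filename("unknown.csv").inspect
```

The `nil` result for unknown filenames matters to callers: any other `*.csv` file placed in the cache directory has no language code.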
data/lib/glossarist/agent/iho/generator.rb ADDED
@@ -0,0 +1,116 @@
1
+ require "glossarist"
2
+ require_relative "bilingual_table"
3
+ require_relative "downloader"
4
+
5
+ module Glossarist
6
+ module Agent
7
+ module Iho
8
+ class Generator
9
+ attr_accessor :language_tables, :simple_concepts
10
+
11
+ def initialize(cache_dir, output_path)
12
+ @cache_dir = cache_dir
13
+ @output_path = output_path
14
+ FileUtils.mkdir_p(@output_path)
15
+ end
16
+
17
+ def collection
18
+ return @collection if @collection
19
+
20
+ parse_language_tables
21
+ build_simple_concepts
22
+ convert_to_glossarist
23
+
24
+ @collection
25
+ end
26
+
27
+ def parse_language_tables
28
+ return @language_tables if @language_tables
29
+
30
+ @language_tables = {}
31
+ Dir.glob(File.join(@cache_dir, "*.csv")).each do |file|
32
+ lang_code = Downloader.lang_code_by_filename(File.basename(file))
33
+ next unless lang_code # skip CSV files that are not known IHO downloads
34
+
35
+ table = BilingualTable.new(file_path: file, lang_code: lang_code)
36
+ table.process
37
+
38
+ @language_tables[lang_code] = table
39
+ end
40
+
41
+ @language_tables
42
+ end
43
+
44
+ def build_simple_concepts
45
+ return @simple_concepts if @simple_concepts
46
+
47
+ @simple_concepts = {}
48
+ build_english_concepts
49
+ build_other_language_concepts
50
+ @simple_concepts
51
+ end
52
+
53
+ def convert_to_glossarist
54
+ @collection = ::Glossarist::ManagedConceptCollection.new
55
+ @collection.managed_concepts = create_managed_concepts
56
+ end
57
+
58
+ def save_to_files
59
+ collection.save_to_files(@output_path)
60
+ puts "Concepts generated and saved to #{@output_path}"
61
+ end
62
+
63
+ private
64
+
65
+ private
66
+
67
+ def build_english_concepts
68
+ @language_tables[:fra].concepts_eng.each do |concept|
69
+ @simple_concepts[concept.id] = { eng: concept_data(concept) }
70
+ end
71
+ end
72
+
73
+ def build_other_language_concepts
74
+ @language_tables.each do |lang_code, table|
75
+ table.concepts_other.each do |concept|
76
+ @simple_concepts[concept.id][lang_code] = concept_data(concept)
77
+ end
78
+ end
79
+ end
80
+
81
+ def concept_data(concept)
82
+ {
83
+ term: concept.term,
84
+ definition: concept.definition,
85
+ }
86
+ end
87
+
88
+ def create_managed_concepts
89
+ @simple_concepts.map do |id, localized_concepts|
90
+ create_managed_concept(id, localized_concepts)
91
+ end
92
+ end
93
+
94
+ def create_managed_concept(id, localized_concepts)
95
+ Glossarist::ManagedConcept.new(id: id).tap do |con|
96
+ localized_concepts.each do |lang_code, data|
97
+ con.add_localization(create_localized_concept(lang_code, data))
98
+ end
99
+ end
100
+ end
101
+
102
+ def create_localized_concept(lang_code, data)
103
+ Glossarist::LocalizedConcept.new(
104
+ "language_code" => lang_code.to_s,
105
+ "terms" => [{
106
+ "designation" => data[:term],
107
+ "type" => "expression",
108
+ "normative_status" => "preferred",
109
+ }],
110
+ "definition" => [{ "content" => data[:definition] }],
111
+ )
112
+ end
113
+ end
114
+ end
115
+ end
116
+ end
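The merge step in `build_english_concepts` and `build_other_language_concepts` keys every localization by concept id. In the generator above, English entries come only from the `:fra` table, and an other-language row whose id has no English entry would raise `NoMethodError` on `nil`. The sketch below shows the same merge with that case guarded; `merge_concepts` and the `SimpleConcept` struct are hypothetical stand-ins, and the sample data is made up.

```ruby
# Hypothetical stand-in for the SimpleConcept mapper.
SimpleConcept = Struct.new(:id, :term, :definition, keyword_init: true)

# Builds { id => { lang_code => { term:, definition: } } }.
def merge_concepts(english, others_by_lang)
  merged = {}
  english.each do |concept|
    merged[concept.id] = { eng: { term: concept.term, definition: concept.definition } }
  end
  others_by_lang.each do |lang_code, concepts|
    concepts.each do |concept|
      # Create the entry when the id was not seen in the English list.
      (merged[concept.id] ||= {})[lang_code] = { term: concept.term, definition: concept.definition }
    end
  end
  merged
end

english = [SimpleConcept.new(id: "1", term: "term A", definition: "definition A")]
others = {
  fra: [
    SimpleConcept.new(id: "1", term: "terme A", definition: "definition A (fr)"),
    SimpleConcept.new(id: "2", term: "terme B", definition: "definition B (fr)"),
  ],
}

merged = merge_concepts(english, others)
puts merged["1"].keys.inspect
puts merged["2"].keys.inspect
```

Whether a concept without an English localization should be kept or dropped is a design decision this sketch does not settle; it only avoids the crash.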
data/lib/glossarist/agent/version.rb ADDED
@@ -0,0 +1,7 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Glossarist
4
+ module Agent
5
+ VERSION = "0.1.0"
6
+ end
7
+ end
data/lib/glossarist/agent.rb ADDED
@@ -0,0 +1,13 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "agent/version"
4
+ require_relative "agent/concept"
5
+ require_relative "agent/collection"
6
+
7
+ module Glossarist
8
+ module Agent
9
+ class Error < StandardError; end
10
+
11
+ # Your code goes here...
12
+ end
13
+ end
data/lib/glossarist-agent.rb ADDED
@@ -0,0 +1 @@
1
+ require_relative "glossarist/agent"
data/sig/glossarist/agent.rbs ADDED
@@ -0,0 +1,6 @@
1
+ module Glossarist
2
+ module Agent
3
+ VERSION: String
4
+ # See the writing guide of rbs: https://github.com/ruby/rbs#guides
5
+ end
6
+ end
metadata ADDED
@@ -0,0 +1,191 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: glossarist-agent
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Ribose Inc.
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2024-07-14 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: shale
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: thor
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: csv
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: glossarist
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :runtime
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: faraday
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ type: :runtime
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - ">="
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
83
+ - !ruby/object:Gem::Dependency
84
+ name: rake
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - ">="
88
+ - !ruby/object:Gem::Version
89
+ version: '0'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ version: '0'
97
+ - !ruby/object:Gem::Dependency
98
+ name: rspec
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: '0'
104
+ type: :development
105
+ prerelease: false
106
+ version_requirements: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - ">="
109
+ - !ruby/object:Gem::Version
110
+ version: '0'
111
+ - !ruby/object:Gem::Dependency
112
+ name: rubocop
113
+ requirement: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '0'
118
+ type: :development
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ requirements:
122
+ - - ">="
123
+ - !ruby/object:Gem::Version
124
+ version: '0'
125
+ - !ruby/object:Gem::Dependency
126
+ name: rubocop-performance
127
+ requirement: !ruby/object:Gem::Requirement
128
+ requirements:
129
+ - - ">="
130
+ - !ruby/object:Gem::Version
131
+ version: '0'
132
+ type: :development
133
+ prerelease: false
134
+ version_requirements: !ruby/object:Gem::Requirement
135
+ requirements:
136
+ - - ">="
137
+ - !ruby/object:Gem::Version
138
+ version: '0'
139
+ description: Glossarist component to retrieve content from remote sources
140
+ email:
141
+ - open.source@ribose.com
142
+ executables:
143
+ - glossarist-agent
144
+ extensions: []
145
+ extra_rdoc_files: []
146
+ files:
147
+ - ".rspec"
148
+ - ".rubocop.yml"
149
+ - CODE_OF_CONDUCT.md
150
+ - README.adoc
151
+ - Rakefile
152
+ - exe/glossarist-agent
153
+ - lib/glossarist-agent.rb
154
+ - lib/glossarist/agent.rb
155
+ - lib/glossarist/agent/cli.rb
156
+ - lib/glossarist/agent/collection.rb
157
+ - lib/glossarist/agent/concept.rb
158
+ - lib/glossarist/agent/http_cache_downloader.rb
159
+ - lib/glossarist/agent/iho/bilingual_table.rb
160
+ - lib/glossarist/agent/iho/cli.rb
161
+ - lib/glossarist/agent/iho/downloader.rb
162
+ - lib/glossarist/agent/iho/generator.rb
163
+ - lib/glossarist/agent/version.rb
164
+ - sig/glossarist/agent.rbs
165
+ homepage: https://github.com/glossarist/glossarist-agent
166
+ licenses:
167
+ - BSD-2-Clause
168
+ metadata:
169
+ homepage_uri: https://github.com/glossarist/glossarist-agent
170
+ source_code_uri: https://github.com/glossarist/glossarist-agent
171
+ changelog_uri: https://github.com/glossarist/glossarist-agent
172
+ post_install_message:
173
+ rdoc_options: []
174
+ require_paths:
175
+ - lib
176
+ required_ruby_version: !ruby/object:Gem::Requirement
177
+ requirements:
178
+ - - ">="
179
+ - !ruby/object:Gem::Version
180
+ version: 2.7.0
181
+ required_rubygems_version: !ruby/object:Gem::Requirement
182
+ requirements:
183
+ - - ">="
184
+ - !ruby/object:Gem::Version
185
+ version: '0'
186
+ requirements: []
187
+ rubygems_version: 3.5.11
188
+ signing_key:
189
+ specification_version: 4
190
+ summary: Glossarist component to retrieve content from remote sources
191
+ test_files: []