sitedog_parser 0.1.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 49947334cc7ee3ec9a5fae7622586f0ad834c515af3470beee348637742ab532
4
+ data.tar.gz: 352cfe07e7b0451ce0e06a5b816d8a4ff0b72e1c623fef8cce313860e3609db4
5
+ SHA512:
6
+ metadata.gz: 2a9b6e77751102b746f4332f9b4d253800b37951512abb2b7df1db1df53662890d94cb41e4c2d4626e5bd563c14bd2349111fe6069db1f273e5ceea9d4d76be7
7
+ data.tar.gz: 97a48e91c2878bcce840907b92cbfc3986df3cf71a3d5b2e4541a083effb884ecac10f9b4c56334d0b5696a47f4db059006fb4d05fa4cccb33643ea6cb521c65
data/CHANGELOG.md ADDED
@@ -0,0 +1,8 @@
1
+ # Changelog
2
+
3
+ ## [0.1.0] - 2023-08-14
4
+
5
+ - Initial gem release
6
+ - Basic functionality for parsing sitedog YAML format
7
+ - Support for recognizing services by URL and name
8
+ - Creation and linking of Domain, Hosting, and Service objects
data/README.md ADDED
@@ -0,0 +1,289 @@
1
+ # SitedogParser
2
+
3
+ A library for parsing and classifying web services from YAML files into structured Ruby objects.
4
+
5
+ ## Installation
6
+
7
+ Add this line to your application's Gemfile:
8
+
9
+ ```ruby
10
+ gem 'sitedog_parser'
11
+ ```
12
+
13
+ Then execute:
14
+
15
+ ```bash
16
+ $ bundle install
17
+ ```
18
+
19
+ Or install it yourself:
20
+
21
+ ```bash
22
+ $ gem install sitedog_parser
23
+ ```
24
+
25
+ ## Usage
26
+
27
+ ### High-Level Interface
28
+
29
+ The easiest way to use SitedogParser is through its high-level interface:
30
+
31
+ ```ruby
32
+ require 'sitedog_parser'
33
+
34
+ # Parse from a YAML file
35
+ parsed_data = SitedogParser::Parser.parse_file('data.yml')
36
+
37
+ # Or parse from a hash (if you already loaded the YAML)
38
+ yaml_data = YAML.load_file('data.yml', symbolize_names: true)
39
+ parsed_data = SitedogParser::Parser.parse(yaml_data)
40
+
41
+ # Get all services of a specific type across all domains
42
+ all_hosting_services = SitedogParser::Parser.get_services_by_type(parsed_data, :hosting)
43
+ all_hosting_services.each do |service|
44
+ puts "Hosting service: #{service.service}, URL: #{service.url}"
45
+ end
46
+
47
+ # Get all domain names
48
+ domain_names = SitedogParser::Parser.get_domain_names(parsed_data)
49
+ puts "Found domains: #{domain_names.join(', ')}"
50
+
51
+ # Working with specific domain's services
52
+ domain_services = parsed_data[:'example.com'] # keys are symbols (symbolize_names: true)
53
+ if domain_services[:dns]
54
+ puts "DNS service: #{domain_services[:dns].first.service}"
55
+ end
56
+ ```
57
+
58
+ ### Working with Simple Fields
59
+
60
+ You can specify which fields should be treated as simple string values, not as services:
61
+
62
+ ```ruby
63
+ # Define which fields should remain as simple strings (not wrapped in Service objects)
64
+ simple_fields = [:project, :role, :environment, :registry, :bought_at]
65
+
66
+ # Parse with simple fields
67
+ parsed_data = SitedogParser::Parser.parse(yaml_data, simple_fields: simple_fields)
68
+
69
+ # Now you can access these fields directly as strings
70
+ domain_services = parsed_data[:'example.com'] # keys are symbols (symbolize_names: true)
71
+ if domain_services[:project]
72
+ puts "Project: #{domain_services[:project]}" # This is a string, not a Service object
73
+ end
74
+
75
+ # Find domains with a specific field value
76
+ domains_with_production = SitedogParser::Parser.get_domains_by_field_value(parsed_data, :environment, 'production')
77
+ puts "Production domains: #{domains_with_production.join(', ')}"
78
+ ```
79
+
80
+ ### Finding Dictionary Candidates
81
+
82
+ You can use the DictionaryAnalyzer to find services that might be missing from your dictionary:
83
+
84
+ ```ruby
85
+ require 'sitedog_parser'
86
+ require_relative 'lib/dictionary_analyzer'
87
+
88
+ # Parse your data first
89
+ parsed_data = SitedogParser::Parser.parse_file('data.yml')
90
+
91
+ # Find candidates for the dictionary (services with name but no URL)
92
+ candidates = SitedogParser::DictionaryAnalyzer.find_dictionary_candidates(parsed_data)
93
+
94
+ # Generate a report
95
+ report = SitedogParser::DictionaryAnalyzer.report(parsed_data)
96
+ puts report
97
+
98
+ # Or use the provided script
99
+ # bin/analyze_dictionary data.yml
100
+ ```
101
+
102
+ The report will show:
103
+ 1. A list of services that are missing from the dictionary
104
+ 2. How many domains use each service
105
+ 3. In which context (service type) each service is used
106
+ 4. A YAML template ready to be added to your dictionary
107
+
108
+ ### Example: Processing a YAML Configuration
109
+
110
+ Input YAML file (`services.yml`):
111
+
112
+ ```yaml
113
+ example.com:
114
+ hosting: https://aws.amazon.com
115
+ dns:
116
+ service: cloudflare
117
+ url: https://cloudflare.com
118
+ registrar: namecheap
119
+ ssl: letsencrypt
120
+ repo: https://github.com/example/repo
121
+
122
+ another-site.org:
123
+ hosting:
124
+ service: digitalocean
125
+ url: https://digitalocean.com
126
+ cdn: https://cloudfront.aws.amazon.com
127
+ dns: https://domains.google.com
128
+ ```
129
+
130
+ Processing this file:
131
+
132
+ ```ruby
133
+ require 'sitedog_parser'
134
+
135
+ # Parse the file
136
+ data = SitedogParser::Parser.parse_file('services.yml')
137
+
138
+ # Get all domains
139
+ puts "Domains: #{SitedogParser::Parser.get_domain_names(data).join(', ')}"
140
+
141
+ # Get all hosting services
142
+ hosting_services = SitedogParser::Parser.get_services_by_type(data, :hosting)
143
+ puts "\nHosting services:"
144
+ hosting_services.each do |service|
145
+ puts "- #{service.service}: #{service.url}"
146
+ end
147
+
148
+ # Get all DNS services
149
+ dns_services = SitedogParser::Parser.get_services_by_type(data, :dns)
150
+ puts "\nDNS services:"
151
+ dns_services.each do |service|
152
+ puts "- #{service.service}: #{service.url}"
153
+ end
154
+
155
+ # Access a specific domain's services
156
+ puts "\nServices for example.com:"
157
+ example_services = data[:'example.com'] # keys are symbols (symbolize_names: true)
158
+ example_services.each do |type, services|
159
+ puts "#{type}: #{services.first.service}"
160
+ end
161
+ ```
162
+
163
+ Output:
164
+ ```
165
+ Domains: example.com, another-site.org
166
+
167
+ Hosting services:
168
+ - Amazon Web Services: https://aws.amazon.com
169
+ - Digitalocean: https://digitalocean.com
170
+
171
+ DNS services:
172
+ - Cloudflare: https://cloudflare.com
173
+ - Domains: https://domains.google.com
174
+
175
+ Services for example.com:
176
+ hosting: Amazon Web Services
177
+ dns: Cloudflare
178
+ registrar: Namecheap
179
+ ssl: Letsencrypt
180
+ repo: Github
181
+ ```
182
+
183
+ ### Service Object Structure
184
+
185
+ Each service object has the following structure:
186
+
187
+ ```ruby
188
+ # Service fields
189
+ service.service # Name of the service (capitalized string)
190
+ service.url # URL of the service (string or nil)
191
+ service.children # Child services (array of Service objects, empty if none)
192
+ ```
193
+
194
+ ### Processing Different Data Formats
195
+
196
+ SitedogParser's strength is in normalizing different data formats into a consistent structure. Here are examples showing how various input formats are handled:
197
+
198
+ #### 1. Simple URL string
199
+ ```ruby
200
+ # Input
201
+ data = "https://github.com/username/repo"
202
+
203
+ # Output
204
+ service = ServiceFactory.create(data)
205
+ service.service # => "Github"
206
+ service.url # => "https://github.com/username/repo"
207
+ service.children # => []
208
+ ```
209
+
210
+ #### 2. Service name string
211
+ ```ruby
212
+ # Input
213
+ data = "GitHub"
214
+
215
+ # Output
216
+ service = ServiceFactory.create(data)
217
+ service.service # => "GitHub"
218
+ service.url # => "https://github.com"
219
+ service.children # => []
220
+ ```
221
+
222
+ #### 3. Hash with service and URL
223
+ ```ruby
224
+ # Input
225
+ data = {
226
+ service: "Github",
227
+ url: "https://github.com/username/repo"
228
+ }
229
+
230
+ # Output
231
+ service = ServiceFactory.create(data)
232
+ service.service # => "Github"
233
+ service.url # => "https://github.com/username/repo"
234
+ service.children # => []
235
+ ```
236
+
237
+ #### 4. Nested hash with service types
238
+ ```ruby
239
+ # Input
240
+ data = {
241
+ dns: {
242
+ service: "route53",
243
+ url: "https://console.aws.amazon.com/route53"
244
+ },
245
+ registrar: {
246
+ service: "namecheap",
247
+ url: "https://namecheap.com"
248
+ }
249
+ }
250
+
251
+ # Output
252
+ service = ServiceFactory.create(data)
253
+ service.service # => "Unknown"
254
+ service.children.size # => 2
255
+ service.children[0].service # => "Route53"
256
+ service.children[0].url # => "https://console.aws.amazon.com/route53"
257
+ service.children[1].service # => "Namecheap"
258
+ service.children[1].url # => "https://namecheap.com"
259
+ ```
260
+
261
+ #### 5. Hash with URLs
262
+ ```ruby
263
+ # Input
264
+ data = {
265
+ hosting: "https://aws.amazon.com",
266
+ cdn: "https://cloudflare.com"
267
+ }
268
+
269
+ # Output
270
+ service = ServiceFactory.create(data)
271
+ service.service # => "Unknown"
272
+ service.children.size # => 2
273
+ service.children[0].service # => "Hosting"
274
+ service.children[0].url # => "https://aws.amazon.com"
275
+ service.children[1].service # => "Cdn"
276
+ service.children[1].url # => "https://cloudflare.com"
277
+ ```
278
+
279
+ ## Development and Contribution
280
+
281
+ 1. Fork the repository
282
+ 2. Create a branch for your changes (`git checkout -b my-new-feature`)
283
+ 3. Commit your changes (`git commit -am 'Add new feature'`)
284
+ 4. Push to the branch (`git push origin my-new-feature`)
285
+ 5. Create a Pull Request
286
+
287
+ ## License
288
+
289
+ This gem is available under the MIT license. See the [LICENSE.txt](LICENSE.txt) file for details.
@@ -0,0 +1,61 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'bundler/setup'
4
+ require 'sitedog_parser'
5
+ require_relative '../lib/dictionary_analyzer'
6
+
7
+ if ARGV.empty? || ARGV.size > 2
8
+ puts "Usage: analyze_dictionary <path_to_yaml_file> [path_to_dictionary]"
9
+ puts "Example: analyze_dictionary test/fixtures/multiple.yaml [test/fixtures/dictionary.yml]"
10
+ exit 1
11
+ end
12
+
13
+ file_path = ARGV[0]
14
+ dictionary_path = ARGV[1] if ARGV.size > 1
15
+
16
+ unless File.exist?(file_path)
17
+ puts "Error: File '#{file_path}' not found."
18
+ exit 1
19
+ end
20
+
21
+ if dictionary_path && !File.exist?(dictionary_path)
22
+ puts "Error: Dictionary file '#{dictionary_path}' not found."
23
+ exit 1
24
+ end
25
+
26
+ begin
27
+ # Load and process the YAML
28
+ yaml_data = YAML.load_file(file_path, symbolize_names: true)
29
+
30
+ pp yaml_data
31
+
32
+ # Check the structure
33
+ sites_data = nil
34
+ if yaml_data[:sites].is_a?(Hash)
35
+ sites_data = yaml_data[:sites]
36
+ elsif yaml_data.values.first.is_a?(Hash)
37
+ # If there is no root 'sites' key, just use the top level
38
+ sites_data = yaml_data
39
+ else
40
+ puts "Error: Expected YAML with domain data in either 'sites' key or at the root level."
41
+ exit 1
42
+ end
43
+
44
+ # Define the simple fields that should not be treated as services
45
+ simple_fields = [:project, :role, :environment, :registry, :bought_at]
46
+
47
+ # Analyze the data through our Parser interface
48
+ data = SitedogParser::Parser.parse(sites_data, simple_fields: simple_fields, dictionary_path: dictionary_path)
49
+
50
+ pp data
51
+
52
+ # Generate the report
53
+ report = SitedogParser::DictionaryAnalyzer.report(data, dictionary_path)
54
+
55
+ puts "\n#{report}\n"
56
+
57
+ rescue => e
58
+ puts "Error processing file: #{e.message}"
59
+ puts e.backtrace.join("\n")
60
+ exit 1
61
+ end
@@ -0,0 +1,26 @@
1
+ require 'yaml'
2
+
3
+ require_relative 'service'
4
+ require_relative 'dictionary'
5
+ require_relative 'url_checker'
6
+ require_relative 'service_factory'
7
+
8
+ yaml = YAML.load_file('test/fixtures/rbbr.io/full.yml', symbolize_names: true)
9
+
10
+ services = {}
11
+
12
+ yaml.each do |domain, items|
13
+ items.each do |service_type, data|
14
+ service = ServiceFactory.create(data, service_type)
15
+
16
+ services[service_type] ||= []
17
+ services[service_type] << service if service
18
+ end
19
+
20
+ domain = Domain.new(domain, services[:dns], services[:registrar])
21
+ hosting = Hosting.new(services[:hosting], services[:cdn], services[:ssl], services[:repo])
22
+
23
+ # binding.pry
24
+ end
25
+
26
+ puts
data/lib/dictionary.rb ADDED
@@ -0,0 +1,90 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require 'yaml'
5
+ require_relative 'url_checker'
6
+
7
+ # Class for working with the provider dictionary
8
+ class Dictionary
9
+ # Default path to the dictionary
10
+ DEFAULT_DICTIONARY_PATH = File.expand_path('../../data/dictionary.yml', __FILE__)
11
+
12
+ # Initialize the dictionary from the YAML file
13
+ #
14
+ # @param dictionary_path [String, nil] path to the dictionary YAML file
15
+ def initialize(dictionary_path = nil)
16
+ @dictionary_path = dictionary_path || DEFAULT_DICTIONARY_PATH
17
+ @dictionary = nil # The dictionary is loaded lazily on first access
18
+ end
19
+
20
+ # Look up a provider by slug or alias
21
+ #
22
+ # @param slug [String] provider slug or alias to look up
23
+ # @return [Hash, nil] provider data or nil if not found
24
+ def lookup(slug)
25
+ return nil unless slug.is_a?(String)
26
+
27
+ slug = slug.downcase.strip
28
+
29
+ # Direct match by key
30
+ return dictionary[slug] if dictionary.key?(slug)
31
+
32
+ # Check aliases
33
+ dictionary.each do |key, provider|
34
+ aliases = provider['aliases'].to_s.split(',').map(&:strip)
35
+ return provider.merge('key' => key) if aliases.include?(slug)
36
+ end
37
+
38
+ nil
39
+ end
40
+
41
+ # Find a provider that matches the given URL
42
+ #
43
+ # @param url [String] URL to match against provider patterns
44
+ # @return [Hash, nil] provider data or nil if no match found
45
+ def match(url)
46
+ return nil unless UrlChecker.url_like?(url)
47
+
48
+ normalized_url = UrlChecker.normalize_url(url)
49
+ return nil unless normalized_url
50
+
51
+ dictionary.each do |key, provider|
52
+ pattern = provider['url_pattern']
53
+ next unless pattern
54
+
55
+ regexp = Regexp.new(pattern, Regexp::IGNORECASE)
56
+ return provider.merge('key' => key) if regexp.match?(normalized_url)
57
+ end
58
+
59
+ nil
60
+ end
61
+
62
+ # Get all providers in the dictionary
63
+ #
64
+ # @return [Hash] the entire dictionary
65
+ def all_providers
66
+ dictionary
67
+ end
68
+
69
+ private
70
+
71
+ # Lazy access to the dictionary: loads it only on first access
72
+ #
73
+ # @return [Hash] provider dictionary
74
+ def dictionary
75
+ @dictionary ||= load_dictionary(@dictionary_path)
76
+ end
77
+
78
+ # Load the dictionary from a YAML file
79
+ #
80
+ # @param path [String] path to the dictionary file
81
+ # @return [Hash] loaded dictionary
82
+ def load_dictionary(path)
83
+ return {} unless path && File.exist?(path)
84
+
85
+ YAML.load_file(path)
86
+ rescue StandardError => e
87
+ warn "Error loading dictionary: #{e.message}"
88
+ {}
89
+ end
90
+ end
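Reviewer sketch (illustrative, not part of the package): the two lookup paths in `Dictionary` can be exercised against an in-memory hash. The sample entry below is hypothetical, since `data/dictionary.yml` does not appear in this diff; the field names (`name`, `url`, `aliases`, `url_pattern`) are inferred from how the code reads them. The helper names `lookup_provider` and `match_provider` are also made up for this sketch.

```ruby
require 'yaml'

# Hypothetical dictionary content; the gem's real data/dictionary.yml is not
# shown in this diff. Field names are inferred from the code above.
SAMPLE_DICTIONARY = YAML.safe_load(<<~'YAML')
  github:
    name: GitHub
    url: https://github.com
    aliases: gh, github.com
    url_pattern: 'github\.com'
YAML

# Mirrors Dictionary#lookup: exact key first, then the comma-separated aliases.
def lookup_provider(dictionary, slug)
  return nil unless slug.is_a?(String)

  slug = slug.downcase.strip
  return dictionary[slug] if dictionary.key?(slug)

  dictionary.each do |key, provider|
    aliases = provider['aliases'].to_s.split(',').map(&:strip)
    return provider.merge('key' => key) if aliases.include?(slug)
  end
  nil
end

# Mirrors Dictionary#match: the first case-insensitive url_pattern hit wins.
def match_provider(dictionary, url)
  dictionary.each do |key, provider|
    pattern = provider['url_pattern']
    next unless pattern

    return provider.merge('key' => key) if Regexp.new(pattern, Regexp::IGNORECASE).match?(url)
  end
  nil
end

p lookup_provider(SAMPLE_DICTIONARY, ' GH ')['key']                        # => "github"
p match_provider(SAMPLE_DICTIONARY, 'https://GitHub.com/user/repo')['key'] # => "github"
p lookup_provider(SAMPLE_DICTIONARY, 'github').key?('key')                 # => false
```

One detail the sketch makes visible: a direct key hit returns the provider hash without a `'key'` entry, while alias and URL matches add one, so callers cannot rely on `'key'` always being present.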
@@ -0,0 +1,85 @@
1
+ require_relative 'dictionary'
2
+
3
+ module SitedogParser
4
+ # Class for analyzing parsing results and finding candidates for dictionary additions
5
+ class DictionaryAnalyzer
6
+ # Finds all services that are potentially missing from the dictionary (have a name but no URL)
7
+ #
8
+ # @param parsed_data [Hash] data received from Parser
9
+ # @param dictionary_path [String, nil] path to the dictionary file (optional)
10
+ # @return [Hash] hash with candidates for dictionary addition
11
+ def self.find_dictionary_candidates(parsed_data, dictionary_path = nil)
12
+ candidates = {}
13
+ current_dictionary = dictionary_path ? Dictionary.new(dictionary_path) : Dictionary.new
14
+
15
+ parsed_data.each do |domain_name, services|
16
+ services.each do |service_type, service_list|
17
+ # Skip simple fields (non-array service lists)
18
+ next unless service_list.is_a?(Array)
19
+
20
+ service_list.each do |service|
21
+ # Candidates are services that:
22
+ # 1. Have a name
23
+ # 2. Have no URL
24
+ # 3. Have no children
25
+ # 4. Are not in the current dictionary
26
+ is_candidate = service.service && # Has a name
27
+ !service.url && # No URL
28
+ service.children.empty? && # No children
29
+ current_dictionary.lookup(service.service).nil? # Not in dictionary
30
+
31
+ if is_candidate
32
+ # Add to candidate list with context information
33
+ service_name = service.service.downcase
34
+ candidates[service_name] ||= {
35
+ name: service.service,
36
+ service_types: [],
37
+ domains: []
38
+ }
39
+
40
+ # Add service type and domain information
41
+ candidates[service_name][:service_types] << service_type unless candidates[service_name][:service_types].include?(service_type)
42
+ candidates[service_name][:domains] << domain_name.to_s unless candidates[service_name][:domains].include?(domain_name.to_s)
43
+ end
44
+ end
45
+ end
46
+ end
47
+
48
+ # Sort candidates by usage frequency
49
+ candidates.transform_values do |candidate|
50
+ candidate[:domains_count] = candidate[:domains].size
51
+ candidate[:types_count] = candidate[:service_types].size
52
+ candidate
53
+ end.sort_by { |_name, data| -data[:domains_count] }.to_h
54
+ end
55
+
56
+ # Analyzes data and outputs a report on dictionary candidates
57
+ #
58
+ # @param parsed_data [Hash] data received from Parser
59
+ # @param dictionary_path [String, nil] path to the dictionary file (optional)
60
+ # @return [String] report on dictionary candidates
61
+ def self.report(parsed_data, dictionary_path = nil)
62
+ candidates = find_dictionary_candidates(parsed_data, dictionary_path)
63
+
64
+ if candidates.empty?
65
+ return "All services have URLs or are already in the dictionary. No candidates for addition."
66
+ end
67
+
68
+ report = [
69
+ "DICTIONARY CANDIDATES REPORT",
70
+ "===========================",
71
+ "Found #{candidates.size} potential services to add to dictionary:",
72
+ ""
73
+ ]
74
+
75
+ candidates.each do |name, data|
76
+ report << "#{data[:name]}:"
77
+ report << " - Used in #{data[:domains_count]} domain(s): #{data[:domains].join(', ')}"
78
+ report << " - Used as service type(s): #{data[:service_types].map(&:to_s).join(', ')}"
79
+ report << ""
80
+ end
81
+
82
+ report.join("\n")
83
+ end
84
+ end
85
+ end
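Reviewer sketch (illustrative, not part of the package): the README says the report includes a YAML template, but `report` as shown here only lists names, domains and service types. A template could be derived from the candidates hash roughly as follows. `dictionary_template` is a hypothetical helper, and the empty `url`, `aliases` and `url_pattern` fields are assumptions based on the keys that `Dictionary` and `ServiceFactory` read.

```ruby
require 'yaml'

# Hypothetical helper, not part of the gem: turns the candidates hash returned
# by find_dictionary_candidates into a YAML skeleton to be completed by hand.
def dictionary_template(candidates)
  skeleton = candidates.to_h do |slug, data|
    [slug, { 'name' => data[:name], 'url' => nil, 'aliases' => nil, 'url_pattern' => nil }]
  end
  skeleton.to_yaml
end

# Example input in the shape produced by find_dictionary_candidates.
sample_candidates = {
  'fastly' => { name: 'Fastly', service_types: [:cdn], domains: ['example.com'] }
}
puts dictionary_template(sample_candidates)
```

Leaving the unknown fields as `nil` keeps the skeleton loadable by the existing code: `aliases` goes through `to_s` and a missing `url_pattern` is skipped.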
data/lib/entities.rb ADDED
@@ -0,0 +1,3 @@
1
+ # Core data structures for domain and hosting information
2
+ Domain = Data.define(:domain, :dns, :registrar)
3
+ Hosting = Data.define(:hosting, :cdn, :ssl, :repo)
data/lib/service.rb ADDED
@@ -0,0 +1,11 @@
1
+ class Service < Data.define(:service, :url, :children)
2
+ def initialize(service:, url: nil, children: [])
3
+ raise ArgumentError, "Service cannot be empty" if service.nil? || service.empty?
4
+
5
+ service => String
6
+ url => String if url
7
+ children => Array if children
8
+
9
+ super
10
+ end
11
+ end
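Reviewer sketch (illustrative, not part of the package): `Service` validates its members with one-line pattern matches (`value => Type`), which raise `NoMatchingPatternError` on a mismatch. The restatement below assumes Ruby 3.2 or newer (for `Data`) and uses the alternative pattern `String | nil` instead of the original `if url` modifier; that substitution is this sketch's choice, not the gem's code.

```ruby
# Restatement of the Service value object; requires Ruby 3.2+ for Data.define.
class Service < Data.define(:service, :url, :children)
  def initialize(service:, url: nil, children: [])
    raise ArgumentError, 'Service cannot be empty' if service.nil? || service.empty?

    # Each line raises NoMatchingPatternError when the value has the wrong type.
    service => String
    url => String | nil
    children => Array

    super
  end
end

github = Service.new(service: 'Github')
p github.url       # => nil
p github.children  # => []
p github == Service.new(service: 'Github', url: nil, children: [])  # => true

begin
  Service.new(service: :github)
rescue NoMatchingPatternError => e
  puts "rejected: #{e.class}"
end
```

Because `Data` instances compare by value, two services with the same name, URL and children are equal, which is convenient in tests.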
@@ -0,0 +1,181 @@
1
+ require 'pry'
2
+ require_relative 'url_checker'
3
+ require_relative 'dictionary'
4
+ require_relative 'service'
5
+
6
+ # Factory for creating Service objects from different data formats
7
+ class ServiceFactory
8
+ # Creates a Service object from various data formats
9
+ #
10
+ # @param data [String, Hash, Array] data for creating service
11
+ # @param service_type [Symbol, nil] service type (used as a fallback name)
12
+ # @param dictionary_path [String, nil] path to the dictionary file (optional)
13
+ # @return [Service, nil] created service object, or nil if the data cannot be converted
14
+ def self.create(data, service_type = nil, dictionary_path = nil)
15
+ # Check for nil
16
+ return nil if data.nil?
17
+
18
+ slug = nil
19
+ url = nil
20
+ dictionary = Dictionary.new(dictionary_path)
21
+
22
+ case data
23
+ in String if UrlChecker.url_like?(data) # url
24
+ url = UrlChecker.normalize_url(data)
25
+ slug = dictionary.match(url)&.dig('name')
26
+
27
+ # If not found in dictionary and service_type exists, use it
28
+ if slug.nil? && service_type
29
+ slug = service_type.to_s
30
+ else
31
+ # Otherwise try to extract name from URL
32
+ slug = UrlChecker.extract_name(url) if slug.nil?
33
+ end
34
+
35
+ puts "url: #{slug} <- #{url}"
36
+ in String if !UrlChecker.url_like?(data) # slug
37
+ slug = data
38
+ url = dictionary.lookup(slug)&.dig('url')
39
+ puts "slug: #{slug} -> #{url}"
40
+ in { service: String => service_slug, url: String => service_url }
41
+ slug = service_slug.to_s.capitalize
42
+ url = service_url
43
+ puts "hash: #{slug} + #{url}"
44
+ in Hash
45
+ puts "hash: #{data}"
46
+
47
+ # Protection from nil values in key fields
48
+ if (data.key?(:service) || data.key?("service")) &&
49
+ (data[:service] || data["service"]).nil?
50
+ return nil
51
+ end
52
+
53
+ # 1. Check if hash contains only URL-like strings (list of services)
54
+ if data.values.all? { |v| v.is_a?(String) && UrlChecker.url_like?(v) }
55
+ puts "hash with services: #{data.keys.join(', ')}"
56
+ # Create array of child services
57
+ children = []
58
+ data.each do |key, url_value|
59
+ service_name = key.to_s
60
+ child_service = Service.new(service: service_name.capitalize, url: url_value)
61
+ children << child_service
62
+ end
63
+
64
+ # Create parent service with child elements
65
+ if service_type && children.any?
66
+ return Service.new(service: service_type.to_s, children: children)
67
+ elsif children.size == 1
68
+ # If only one service and no service_type, return it directly
69
+ return children.first
70
+ end
71
+ end
72
+
73
+ # 2. If hash contains service and url (possibly with additional fields)
74
+ if (data.key?(:service) || data.key?("service")) &&
75
+ (data.key?(:url) || data.key?("url"))
76
+ service_key = data.key?(:service) ? :service : "service"
77
+ service_name = data[service_key].to_s
78
+
79
+ url_key = data.key?(:url) ? :url : "url"
80
+ url_value = data[url_key]
81
+
82
+ return Service.new(service: service_name.capitalize, url: url_value)
83
+ end
84
+
85
+ # 3. Process nested hashes
86
+ children = []
87
+
88
+ data.each do |key, value|
89
+ child = nil
90
+
91
+ if value.is_a?(Hash)
92
+ # 3.1 If value has a hash with service and url
93
+ if (value.key?(:service) || value.key?("service")) &&
94
+ (value.key?(:url) || value.key?("url"))
95
+ service_key = value.key?(:service) ? :service : "service"
96
+ service_name = value[service_key].to_s
97
+
98
+ url_key = value.key?(:url) ? :url : "url"
99
+ url_value = value[url_key]
100
+
101
+ child = Service.new(service: service_name.capitalize, url: url_value)
102
+ # 3.2 If value has hash with only URL-like values
103
+ elsif value.values.all? { |v| v.is_a?(String) && UrlChecker.url_like?(v) }
104
+ child_children = []
105
+
106
+ value.each do |sub_key, url_value|
107
+ child_children << Service.new(service: sub_key.to_s.capitalize, url: url_value)
108
+ end
109
+
110
+ child = Service.new(service: key.to_s, children: child_children)
111
+ # 3.3 Recursively process other cases
112
+ else
113
+ child = create(value, key, dictionary_path)
114
+
115
+ # If nothing worked, create an empty service with key name
116
+ if child.nil? && value.is_a?(Hash)
117
+ child_children = []
118
+ has_urls = false
119
+
120
+ value.each do |sub_key, sub_value|
121
+ if sub_value.is_a?(String) && UrlChecker.url_like?(sub_value)
122
+ has_urls = true
123
+ child_children << Service.new(service: sub_key.to_s.capitalize, url: sub_value)
124
+ end
125
+ end
126
+
127
+ child = Service.new(service: key.to_s, children: child_children) if has_urls
128
+ end
129
+ end
130
+ # 3.4 If the value is a URL string
131
+ elsif value.is_a?(String) && UrlChecker.url_like?(value)
132
+ child = Service.new(service: key.to_s.capitalize, url: value)
133
+ end
134
+
135
+ children << child if child
136
+ end
137
+
138
+ # Create parent service if there are child elements
139
+ if children.any? && service_type
140
+ return Service.new(service: service_type.to_s, children: children)
141
+ elsif children.size == 1 && !service_type
142
+ # If only one child element and no service_type, return it
143
+ return children.first
144
+ elsif children.any?
145
+ # If there are child elements but no service_type, create a service with unknown name
146
+ return Service.new(service: "Unknown", children: children)
147
+ end
148
+ in Array
149
+ puts "array: #{data}"
150
+
151
+ # Create services from array elements
152
+ children = data.map { |item| create(item, service_type, dictionary_path) }.compact
153
+
154
+ # If there are child services, create a parent service with them
155
+ if children.any? && service_type
156
+ return Service.new(service: service_type.to_s, children: children)
157
+ elsif children.size == 1
158
+ # If only one child service, return it
159
+ return children.first
160
+ end
161
+
162
+ # If no child services or no name for parent service,
163
+ # return nil
164
+ return nil
165
+ else
166
+ # Handle values that don't match any pattern
167
+ return nil
168
+ end
169
+
170
+ # Create service with collected data
171
+ if slug
172
+ Service.new(service: slug, url: url)
173
+ else
174
+ nil
175
+ end
176
+ rescue => e
177
+ puts "Error creating service: #{e.message}"
178
+ puts "Data: #{data.inspect}"
179
+ return nil
180
+ end
181
+ end
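Reviewer sketch (illustrative, not part of the package): the factory dispatches on the shape of its input with `case ... in`. The reduced version below only names the branch that would be taken. `input_shape` and the simplified `https?://` check are assumptions made for this sketch; the gem itself uses `UrlChecker.url_like?`.

```ruby
# Hypothetical classifier that mirrors the branch order of ServiceFactory.create.
def input_shape(data)
  case data
  in String if data.match?(%r{\Ahttps?://})
    :url
  in String
    :name
  in { service: String, url: String }
    :service_with_url
  in Hash
    :nested
  in Array
    :list
  else
    :unsupported
  end
end

p input_shape('https://github.com/user/repo')               # => :url
p input_shape('GitHub')                                      # => :name
p input_shape({ service: 'github', url: 'https://x.test' })  # => :service_with_url
p input_shape({ 'service' => 'github', 'url' => 'x' })       # => :nested
p input_shape(nil)                                           # => :unsupported
```

Hash patterns only match symbol keys, which is presumably why the generic `Hash` branch in the factory checks both `:service` and `"service"` by hand.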
@@ -0,0 +1,3 @@
1
+ module SitedogParser
2
+ VERSION = "0.1.1"
3
+ end
@@ -0,0 +1,108 @@
1
+ require "sitedog_parser/version"
2
+ require 'yaml'
3
+
4
+ require_relative "service"
5
+ require_relative "dictionary"
6
+ require_relative "url_checker"
7
+ require_relative "service_factory"
8
+
9
+ module SitedogParser
10
+ class Error < StandardError; end
11
+
12
+ # Main parser class that provides a high-level interface to the library
13
+ class Parser
14
+ # Fields that are not processed as services by default
15
+ DEFAULT_SIMPLE_FIELDS = [:project, :role, :environment, :bought_at]
16
+
17
+ # Parse a YAML file and convert it to structured Ruby objects
18
+ #
19
+ # @param file_path [String] path to the YAML file
20
+ # @param symbolize_names [Boolean] whether to symbolize keys in the YAML file
21
+ # @param simple_fields [Array<Symbol>] fields that should remain as simple strings without service wrapping
22
+ # @param dictionary_path [String, nil] path to the dictionary file (optional)
23
+ # @return [Hash] hash containing parsed services by type and domain
24
+ def self.parse_file(file_path, symbolize_names: true, simple_fields: DEFAULT_SIMPLE_FIELDS, dictionary_path: nil)
25
+ yaml = YAML.load_file(file_path, symbolize_names: symbolize_names)
26
+ parse(yaml, simple_fields: simple_fields, dictionary_path: dictionary_path)
27
+ end
28
+
29
+ # Parse YAML data and convert it to structured Ruby objects
30
+ #
31
+ # @param yaml [Hash] YAML data as a hash
32
+ # @param simple_fields [Array<Symbol>] fields that should remain as simple strings without service wrapping
33
+ # @param dictionary_path [String, nil] path to the dictionary file (optional)
34
+ # @return [Hash] hash containing parsed services by type and domain
35
+ def self.parse(yaml, simple_fields: DEFAULT_SIMPLE_FIELDS, dictionary_path: nil)
36
+ result = {}
37
+
38
+ yaml.each do |domain_name, items|
39
+ services = {}
40
+
41
+ # Process each service type and its data
42
+ items.each do |service_type, data|
43
+ # Check whether this field is "simple" (not a service)
44
+ if simple_fields.include?(service_type)
45
+ # For simple fields, store the value as is, without wrapping it in a service
46
+ services[service_type] = data
47
+ else
48
+ # For regular fields, create a service
49
+ service = ServiceFactory.create(data, service_type, dictionary_path)
50
+
51
+ if service
52
+ services[service_type] ||= []
53
+ services[service_type] << service
54
+ end
55
+ end
56
+ end
57
+
58
+ # Create a structure with all the services
59
+ result[domain_name] = services
60
+ end
61
+
62
+ result
63
+ end
64
+
65
+ # Get all services of a specific type from parsed data
66
+ #
67
+ # @param parsed_data [Hash] data returned by parse or parse_file
68
+ # @param service_type [Symbol] type of service to extract
69
+ # @return [Array] array of services of the specified type
70
+ def self.get_services_by_type(parsed_data, service_type)
71
+ result = []
72
+
73
+ parsed_data.each do |_domain_name, services|
74
+ if services[service_type] && services[service_type].is_a?(Array)
75
+ result.concat(services[service_type])
76
+ end
77
+ end
78
+
79
+ result
80
+ end
81
+
82
+ # Get domain names from parsed data
83
+ #
84
+ # @param parsed_data [Hash] data returned by parse or parse_file
85
+ # @return [Array] array of domain names
86
+ def self.get_domain_names(parsed_data)
87
+ parsed_data.keys
88
+ end
89
+
90
+ # Get domains with a specific simple field value
91
+ #
92
+ # @param parsed_data [Hash] data returned by parse or parse_file
93
+ # @param field [Symbol] simple field to filter by
94
+ # @param value [String] value to match
95
+ # @return [Array] array of domain names that have the specified field value
96
+ def self.get_domains_by_field_value(parsed_data, field, value)
97
+ result = []
98
+
99
+ parsed_data.each do |domain_name, services|
100
+ if services[field] == value
101
+ result << domain_name
102
+ end
103
+ end
104
+
105
+ result
106
+ end
107
+ end
108
+ end
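Reviewer sketch (illustrative, not part of the package): `simple_fields` is a list of symbols and is compared against the YAML keys with `include?`, so the check only succeeds when the YAML was loaded with `symbolize_names: true`. The snippet below shows the difference using plain Psych calls; `simple_keys` and the sample YAML are hypothetical.

```ruby
require 'yaml'

SIMPLE_FIELDS = %i[project role environment bought_at].freeze

YAML_TEXT = <<~YAML
  example.com:
    project: blog
    hosting: https://aws.amazon.com
YAML

SYMBOL_KEYS = YAML.safe_load(YAML_TEXT, symbolize_names: true)
STRING_KEYS = YAML.safe_load(YAML_TEXT)

# Hypothetical helper: returns the keys that the parser would treat as simple fields.
def simple_keys(items, simple_fields)
  items.keys.select { |key| simple_fields.include?(key) }
end

p simple_keys(SYMBOL_KEYS[:'example.com'], SIMPLE_FIELDS)  # => [:project]
p simple_keys(STRING_KEYS['example.com'], SIMPLE_FIELDS)   # => []
```

With string keys, `project` would fall through to the service branch, so callers passing `symbolize_names: false` to `parse_file` may want to pass string field names as `simple_fields` too. Note that the domain keys are symbolized as well (`:'example.com'`).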
@@ -0,0 +1,87 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ # Module for working with URL-like strings
5
+ #
6
+ # Usage:
7
+ # require_relative 'lib/url_checker'
8
+ #
9
+ # UrlChecker.url_like?("example.com") # => true
10
+ # UrlChecker.url_like?("http://example.com") # => true
11
+ # UrlChecker.url_like?("not-a-url") # => false
12
+ #
13
+ # UrlChecker.normalize_url("example.com") # => "https://example.com"
14
+ module UrlChecker
15
+ # Checks if a string looks like a URL
16
+ #
17
+ # @param string [String] string to check
18
+ # @return [Boolean] true if the string looks like a URL, false otherwise
19
+ def self.url_like?(string)
20
+ return false unless string.is_a?(String)
21
+
22
+ # Regular expression for checking URL-like strings
23
+ # Supports various protocols and formats:
24
+ # - standard URLs (with http, https, ftp, etc.)
25
+ # - Git URLs (format git@hostname:user/repo.git)
26
+ if string.match?(/^git@[a-zA-Z0-9][-a-zA-Z0-9.]+\.[a-zA-Z]{2,}:[a-zA-Z0-9\/_.-]+\.git$/)
27
+ return true
28
+ end
29
+
30
+ # Check for standard URLs
31
+ pattern = /^((?:https?|ftp|sftp|ftps|ssh|git|ws|wss):\/\/)?[a-zA-Z0-9][-a-zA-Z0-9.]+\.[a-zA-Z]{2,}(:[0-9]+)?(\/[-a-zA-Z0-9%_.~#+]*)*(\?[-a-zA-Z0-9%_&=.~#+]*)?(#[-a-zA-Z0-9%_&=.~#+\/]*)?$/
32
+
33
+ !!string.match(pattern)
34
+ end
35
+
36
+ # Normalizes a URL by adding a protocol if missing
37
+ #
38
+ # @param url [String] URL to normalize
39
+ # @param default_protocol [String] protocol to prepend if none exists (default: "https")
40
+ # @return [String, nil] normalized URL, or nil if input is not a valid URL
41
+ def self.normalize_url(url, default_protocol = "https")
42
+ return nil unless url_like?(url)
43
+
44
+ # Git URLs remain as is
45
+ return url if url.start_with?("git@")
46
+
47
+ # Return as is if already has a protocol
48
+ return url if url.match?(/^[a-zA-Z]+:\/\//)
49
+
50
+ # Add default protocol
51
+ "#{default_protocol}://#{url}"
52
+ end
53
+
54
+ # Extracts the service name from a URL
55
+ #
56
+ # @param url [String] URL to extract the name from
57
+ # @return [String, nil] name of the service or nil if could not be extracted
58
+ def self.extract_name(url)
59
+ return nil unless url_like?(url)
60
+
61
+ # Remove an http(s) protocol and www prefix if present
62
+ domain = url.gsub(%r{^(?:https?://)?(?:www\.)?}, "")
63
+
64
+ # Extract domain from URL by removing everything after first / or : or ? or #
65
+ domain = domain.split(/[:\/?#]/).first
66
+
67
+ # Extract the service name (usually the second-level domain)
68
+ parts = domain.split(".")
69
+
70
+ # If domain has enough parts (e.g., example.com, sub.example.com)
71
+ if parts.size >= 2
72
+ # For most domains, the second-to-last part is the name
73
+ # e.g., example.com -> example, sub.example.com -> example
74
+ service_name = parts[-2]
75
+
76
+ # Special case for second-level suffixes under country-code TLDs
77
+ # e.g., example.co.uk -> example
78
+ if parts.size >= 3 && ["co", "com", "org", "net", "ac"].include?(parts[-2])
79
+ service_name = parts[-3]
80
+ end
81
+
82
+ return service_name
83
+ end
84
+
85
+ nil
86
+ end
87
+ end
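The three helpers in this file (URL detection, normalization, and service-name extraction) can be tried out with a small standalone sketch. It is modeled on the methods above rather than copied from the gem: the module name `UrlCheckerSketch`, the constant names, and the exact patterns are assumptions of the sketch, since the real class name is not visible in this part of the diff.

```ruby
# Hypothetical stand-in for the class defined in lib/url_checker.rb.
# Names and patterns here are assumptions made for illustration.
module UrlCheckerSketch
  # SSH-style Git address, e.g. git@host.tld:user/repo.git
  GIT_SSH = /\Agit@[a-zA-Z0-9][-a-zA-Z0-9.]+\.[a-zA-Z]{2,}:[a-zA-Z0-9\/_.-]+\.git\z/
  # Optional scheme, host with a TLD, optional port, path, query, fragment
  STANDARD = /\A((?:https?|ftp|sftp|ftps|ssh|git|ws|wss):\/\/)?[a-zA-Z0-9][-a-zA-Z0-9.]+\.[a-zA-Z]{2,}(:[0-9]+)?(\/[-a-zA-Z0-9%_.~#+]*)*(\?[-a-zA-Z0-9%_&=.~#+]*)?(#[-a-zA-Z0-9%_&=.~#+\/]*)?\z/
  # Labels treated as second-level suffixes (example.co.uk and similar)
  SECOND_LEVEL = %w[co com org net ac].freeze

  module_function

  def url_like?(string)
    return false unless string.is_a?(String)

    string.match?(GIT_SSH) || string.match?(STANDARD)
  end

  def normalize_url(url, default_protocol = "https")
    return nil unless url_like?(url)
    # Git addresses and URLs that already carry a scheme are left untouched
    return url if url.start_with?("git@") || url.match?(/\A[a-zA-Z]+:\/\//)

    "#{default_protocol}://#{url}"
  end

  def extract_name(url)
    return nil unless url_like?(url)

    # Drop the scheme (or git@) and www., then keep only the host part
    host = url.sub(%r{\A(?:[a-zA-Z]+://|git@)?(?:www\.)?}, "").split(/[:\/?#]/).first
    parts = host.split(".")
    return nil if parts.size < 2

    parts.size >= 3 && SECOND_LEVEL.include?(parts[-2]) ? parts[-3] : parts[-2]
  end
end

p UrlCheckerSketch.url_like?("example.com")                   # => true
p UrlCheckerSketch.url_like?("not a url")                     # => false
p UrlCheckerSketch.normalize_url("example.com")               # => "https://example.com"
p UrlCheckerSketch.normalize_url("example.com", "http")       # => "http://example.com"
p UrlCheckerSketch.extract_name("https://www.github.com/inem") # => "github"
p UrlCheckerSketch.extract_name("shop.example.co.uk")          # => "example"
```

The sketch anchors its patterns with `\A` and `\z` because Ruby's `^` and `$` match at every line boundary, so a multi-line string containing one valid-looking line would otherwise pass the check.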
metadata ADDED
@@ -0,0 +1,144 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: sitedog_parser
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.1
5
+ platform: ruby
6
+ authors:
7
+ - Ivan Nemytchenko
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2025-04-15 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '2.0'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '2.0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '13.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '13.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: minitest
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '5.0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '5.0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: pry
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: 0.14.1
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: 0.14.1
69
+ - !ruby/object:Gem::Dependency
70
+ name: bump
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: 0.10.0
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: 0.10.0
83
+ - !ruby/object:Gem::Dependency
84
+ name: thor
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - "~>"
88
+ - !ruby/object:Gem::Version
89
+ version: '1.2'
90
+ type: :runtime
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - "~>"
95
+ - !ruby/object:Gem::Version
96
+ version: '1.2'
97
+ description: A library for parsing and classifying web services, hosting, and domain
98
+ data from YAML files into structured Ruby objects
99
+ email:
100
+ - nemytchenko@gmail.com
101
+ executables:
102
+ - analyze_dictionary
103
+ extensions: []
104
+ extra_rdoc_files: []
105
+ files:
106
+ - CHANGELOG.md
107
+ - README.md
108
+ - bin/analyze_dictionary
109
+ - lib/data_structures.rb
110
+ - lib/dictionary.rb
111
+ - lib/dictionary_analyzer.rb
112
+ - lib/entities.rb
113
+ - lib/service.rb
114
+ - lib/service_factory.rb
115
+ - lib/sitedog_parser.rb
116
+ - lib/sitedog_parser/version.rb
117
+ - lib/url_checker.rb
118
+ homepage: https://github.com/inem/sitedog-parser
119
+ licenses:
120
+ - MIT
121
+ metadata:
122
+ homepage_uri: https://github.com/inem/sitedog-parser
123
+ source_code_uri: https://github.com/inem/sitedog-parser
124
+ changelog_uri: https://github.com/inem/sitedog-parser/blob/master/CHANGELOG.md
125
+ post_install_message:
126
+ rdoc_options: []
127
+ require_paths:
128
+ - lib
129
+ required_ruby_version: !ruby/object:Gem::Requirement
130
+ requirements:
131
+ - - ">="
132
+ - !ruby/object:Gem::Version
133
+ version: 3.3.7
134
+ required_rubygems_version: !ruby/object:Gem::Requirement
135
+ requirements:
136
+ - - ">="
137
+ - !ruby/object:Gem::Version
138
+ version: '0'
139
+ requirements: []
140
+ rubygems_version: 3.5.22
141
+ signing_key:
142
+ specification_version: 4
143
+ summary: Parser for converting YAML format into Ruby data structures
144
+ test_files: []
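Most dependencies in the metadata above use the pessimistic operator `~>`, while `required_ruby_version` uses `>=`. As a rough illustration of what those constraints admit, the following sketch checks a few versions with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The candidate version numbers are arbitrary examples chosen for the sketch, not versions the gem is known to have been tested against.

```ruby
require "rubygems"

# Helper for this sketch only: does `version` satisfy `constraint`?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# pry is pinned with "~> 0.14.1": patch releases only
p allowed?("~> 0.14.1", "0.14.2") # => true
p allowed?("~> 0.14.1", "0.15.0") # => false

# thor is pinned with "~> 1.2": any 1.x release from 1.2 upwards
p allowed?("~> 1.2", "1.3.2")     # => true
p allowed?("~> 1.2", "2.0.0")     # => false

# required_ruby_version is ">= 3.3.7"
p allowed?(">= 3.3.7", "3.3.6")   # => false
p allowed?(">= 3.3.7", "3.4.0")   # => true
```

A two-segment constraint such as `~> 1.2` allows minor upgrades, whereas a three-segment one such as `~> 0.14.1` allows only patch upgrades, which is why the `pry` and `bump` pins are stricter than the others.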