RubyGems - sitedog_parser - Versions diffs - 0.1.1 - Mend

sitedog_parser 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

checksums.yaml +7 -0
data/CHANGELOG.md +8 -0
data/README.md +289 -0
data/bin/analyze_dictionary +61 -0
data/lib/data_structures.rb +26 -0
data/lib/dictionary.rb +90 -0
data/lib/dictionary_analyzer.rb +85 -0
data/lib/entities.rb +3 -0
data/lib/service.rb +11 -0
data/lib/service_factory.rb +181 -0
data/lib/sitedog_parser/version.rb +3 -0
data/lib/sitedog_parser.rb +108 -0
data/lib/url_checker.rb +87 -0
metadata +144 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 49947334cc7ee3ec9a5fae7622586f0ad834c515af3470beee348637742ab532
+  data.tar.gz: 352cfe07e7b0451ce0e06a5b816d8a4ff0b72e1c623fef8cce313860e3609db4
+SHA512:
+  metadata.gz: 2a9b6e77751102b746f4332f9b4d253800b37951512abb2b7df1db1df53662890d94cb41e4c2d4626e5bd563c14bd2349111fe6069db1f273e5ceea9d4d76be7
+  data.tar.gz: 97a48e91c2878bcce840907b92cbfc3986df3cf71a3d5b2e4541a083effb884ecac10f9b4c56334d0b5696a47f4db059006fb4d05fa4cccb33643ea6cb521c65
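Digests like the ones above can be compared against a downloaded artifact. The following is a minimal sketch, not part of the gem: the helper name `sha256_matches?` is hypothetical, and a temporary file stands in for `data.tar.gz`.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: compares a file's SHA256 hex digest with an expected value.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# A temporary file stands in for a downloaded data.tar.gz (assumption of this sketch).
sample = Tempfile.new('artifact')
sample.write('hello')
sample.flush

expected = Digest::SHA256.hexdigest('hello')
puts sha256_matches?(sample.path, expected)   # true
puts sha256_matches?(sample.path, '0' * 64)   # false
```

In practice the expected value would be the `SHA256` entry for the matching file in `checksums.yaml`.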

data/CHANGELOG.md ADDED

@@ -0,0 +1,8 @@
+# Changelog
+## [0.1.0] - 2023-08-14
+- Initial gem release
+- Basic functionality for parsing sitedog YAML format
+- Support for recognizing services by URL and name
+- Creation and linking of Domain, Hosting, and Service objects

data/README.md ADDED

@@ -0,0 +1,289 @@
+# SitedogParser
+A library for parsing and classifying web services from YAML files into structured Ruby objects.
+## Installation
+Add this line to your application's Gemfile:
+```ruby
+gem 'sitedog_parser'
+```
+Then execute:
+```bash
+$ bundle install
+```
+Or install it yourself:
+```bash
+$ gem install sitedog_parser
+```
+## Usage
+### High-Level Interface
+The easiest way to use SitedogParser is through its high-level interface:
+```ruby
+require 'sitedog_parser'
+# Parse from a YAML file
+parsed_data = SitedogParser::Parser.parse_file('data.yml')
+# Or parse from a hash (if you already loaded the YAML)
+yaml_data = YAML.load_file('data.yml', symbolize_names: true)
+parsed_data = SitedogParser::Parser.parse(yaml_data)
+# Get all services of a specific type across all domains
+all_hosting_services = SitedogParser::Parser.get_services_by_type(parsed_data, :hosting)
+all_hosting_services.each do |service|
+  puts "Hosting service: #{service.service}, URL: #{service.url}"
+end
+# Get all domain names
+domain_names = SitedogParser::Parser.get_domain_names(parsed_data)
+puts "Found domains: #{domain_names.join(', ')}"
+# Working with specific domain's services
+domain_services = parsed_data[:'example.com'] # keys are symbols because symbolize_names is true
+if domain_services[:dns]
+  puts "DNS service: #{domain_services[:dns].first.service}"
+end
+```
+### Working with Simple Fields
+You can specify which fields should be treated as simple string values, not as services:
+```ruby
+# Define which fields should remain as simple strings (not wrapped in Service objects)
+simple_fields = [:project, :role, :environment, :registry, :bought_at]
+# Parse with simple fields
+parsed_data = SitedogParser::Parser.parse(yaml_data, simple_fields: simple_fields)
+# Now you can access these fields directly as strings
+domain_services = parsed_data[:'example.com'] # keys are symbols because yaml_data was loaded with symbolize_names: true
+if domain_services[:project]
+  puts "Project: #{domain_services[:project]}"  # This is a string, not a Service object
+end
+# Find domains with a specific field value
+domains_with_production = SitedogParser::Parser.get_domains_by_field_value(parsed_data, :environment, 'production')
+puts "Production domains: #{domains_with_production.join(', ')}"
+```
+### Finding Dictionary Candidates
+You can use the DictionaryAnalyzer to find services that might be missing from your dictionary:
+```ruby
+require 'sitedog_parser'
+require 'dictionary_analyzer' # lives in the gem's lib/ directory, which is on the load path
+# Parse your data first
+parsed_data = SitedogParser::Parser.parse_file('data.yml')
+# Find candidates for the dictionary (services with name but no URL)
+candidates = SitedogParser::DictionaryAnalyzer.find_dictionary_candidates(parsed_data)
+# Generate a report
+report = SitedogParser::DictionaryAnalyzer.report(parsed_data)
+puts report
+# Or use the provided script
+# bin/analyze_dictionary data.yml
+```
+The report will show:
+1. A list of services that are missing from the dictionary
+2. How many domains use each service
+3. In which context (service type) each service is used
+4. A YAML template ready to be added to your dictionary
+### Example: Processing a YAML Configuration
+Input YAML file (`services.yml`):
+```yaml
+example.com:
+  hosting: https://aws.amazon.com
+  dns:
+    service: cloudflare
+    url: https://cloudflare.com
+  registrar: namecheap
+  ssl: letsencrypt
+  repo: https://github.com/example/repo
+another-site.org:
+  hosting:
+    service: digitalocean
+    url: https://digitalocean.com
+  cdn: https://cloudfront.aws.amazon.com
+  dns: https://domains.google.com
+```
+Processing this file:
+```ruby
+require 'sitedog_parser'
+# Parse the file
+data = SitedogParser::Parser.parse_file('services.yml')
+# Get all domains
+puts "Domains: #{SitedogParser::Parser.get_domain_names(data).join(', ')}"
+# Get all hosting services
+hosting_services = SitedogParser::Parser.get_services_by_type(data, :hosting)
+puts "\nHosting services:"
+hosting_services.each do |service|
+  puts "- #{service.service}: #{service.url}"
+end
+# Get all DNS services
+dns_services = SitedogParser::Parser.get_services_by_type(data, :dns)
+puts "\nDNS services:"
+dns_services.each do |service|
+  puts "- #{service.service}: #{service.url}"
+end
+# Access a specific domain's services
+puts "\nServices for example.com:"
+example_services = data[:'example.com'] # parse_file symbolizes keys by default
+example_services.each do |type, services|
+  puts "#{type}: #{services.first.service}"
+end
+```
+Output:
+```
+Domains: example.com, another-site.org
+Hosting services:
+- Amazon Web Services: https://aws.amazon.com
+- Digitalocean: https://digitalocean.com
+DNS services:
+- Cloudflare: https://cloudflare.com
+- Domains: https://domains.google.com
+Services for example.com:
+hosting: Amazon Web Services
+dns: Cloudflare
+registrar: Namecheap
+ssl: Letsencrypt
+repo: Github
+```
+### Service Object Structure
+Each service object has the following structure:
+```ruby
+# Service fields
+service.service  # Name of the service (capitalized string)
+service.url      # URL of the service (string or nil)
+service.children # Child services (array of Service objects, empty if none)
+```
+### Processing Different Data Formats
+SitedogParser's strength is in normalizing different data formats into a consistent structure. Here are examples showing how various input formats are handled:
+#### 1. Simple URL string
+```ruby
+# Input
+data = "https://github.com/username/repo"
+# Output
+service = ServiceFactory.create(data)
+service.service  # => "Github"
+service.url      # => "https://github.com/username/repo"
+service.children # => []
+```
+#### 2. Service name string
+```ruby
+# Input
+data = "GitHub"
+# Output
+service = ServiceFactory.create(data)
+service.service  # => "GitHub"
+service.url      # => "https://github.com"
+service.children # => []
+```
+#### 3. Hash with service and URL
+```ruby
+# Input
+data = {
+  service: "Github",
+  url: "https://github.com/username/repo"
+}
+# Output
+service = ServiceFactory.create(data)
+service.service  # => "Github"
+service.url      # => "https://github.com/username/repo"
+service.children # => []
+```
+#### 4. Nested hash with service types
+```ruby
+# Input
+data = {
+  dns: {
+    service: "route53",
+    url: "https://console.aws.amazon.com/route53"
+  },
+  registrar: {
+    service: "namecheap",
+    url: "https://namecheap.com"
+  }
+}
+# Output
+service = ServiceFactory.create(data)
+service.service           # => "Unknown"
+service.children.size     # => 2
+service.children[0].service # => "Route53"
+service.children[0].url     # => "https://console.aws.amazon.com/route53"
+service.children[1].service # => "Namecheap"
+service.children[1].url     # => "https://namecheap.com"
+```
+#### 5. Hash with URLs
+```ruby
+# Input
+data = {
+  hosting: "https://aws.amazon.com",
+  cdn: "https://cloudflare.com"
+}
+# Output
+service = ServiceFactory.create(data)
+service.service           # => "Unknown"
+service.children.size     # => 2
+service.children[0].service # => "Hosting"
+service.children[0].url     # => "https://aws.amazon.com"
+service.children[1].service # => "Cdn"
+service.children[1].url     # => "https://cloudflare.com"
+```
+## Development and Contribution
+1. Fork the repository
+2. Create a branch for your changes (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Add new feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create a Pull Request
+## License
+This gem is available under the MIT license. See the [LICENSE.txt](LICENSE.txt) file for details.
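One detail the README examples depend on is how keys come out of the YAML loader. `Parser.parse_file` defaults to `symbolize_names: true`, so domain names become symbols as well. A small sketch using only the standard `yaml` library; the sample document and the helper `load_sites` are illustrative, not part of the gem:

```ruby
require 'yaml'

# Sample document shaped like the README's services.yml (contents are illustrative).
SAMPLE_YAML = <<~YAML
  example.com:
    hosting: https://aws.amazon.com
    registrar: namecheap
YAML

# Hypothetical helper wrapping the standard loader.
def load_sites(text, symbolize_names:)
  YAML.safe_load(text, symbolize_names: symbolize_names)
end

# With symbolize_names: true every mapping key becomes a Symbol, domain names included.
symbolized = load_sites(SAMPLE_YAML, symbolize_names: true)
p symbolized.keys                          # [:"example.com"]
p symbolized[:'example.com'][:registrar]   # "namecheap"
p symbolized['example.com']                # nil, because the key is a Symbol

# Without the option, keys stay Strings at every level.
stringy = load_sites(SAMPLE_YAML, symbolize_names: false)
p stringy['example.com']['registrar']      # "namecheap"
```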

data/bin/analyze_dictionary ADDED

@@ -0,0 +1,61 @@
+#!/usr/bin/env ruby
+require 'bundler/setup'
+require 'sitedog_parser'
+require_relative '../lib/dictionary_analyzer'
+if ARGV.empty? || ARGV.size > 2
+  puts "Usage: analyze_dictionary <path_to_yaml_file> [path_to_dictionary]"
+  puts "Example: analyze_dictionary test/fixtures/multiple.yaml [test/fixtures/dictionary.yml]"
+  exit 1
+end
+file_path = ARGV[0]
+dictionary_path = ARGV[1] if ARGV.size > 1
+unless File.exist?(file_path)
+  puts "Error: File '#{file_path}' not found."
+  exit 1
+end
+if dictionary_path && !File.exist?(dictionary_path)
+  puts "Error: Dictionary file '#{dictionary_path}' not found."
+  exit 1
+end
+begin
+  # Load and process the YAML
+  yaml_data = YAML.load_file(file_path, symbolize_names: true)
+  pp yaml_data
+  # Check the structure
+  sites_data = nil
+  if yaml_data[:sites].is_a?(Hash)
+    sites_data = yaml_data[:sites]
+  elsif yaml_data.values.first.is_a?(Hash)
+    # If there is no root 'sites' key, just use the top level
+    sites_data = yaml_data
+  else
+    puts "Error: Expected YAML with domain data in either 'sites' key or at the root level."
+    exit 1
+  end
+  # Define simple fields that should not be treated as services
+  simple_fields = [:project, :role, :environment, :registry, :bought_at]
+  # Analyze the data through our Parser interface
+  data = SitedogParser::Parser.parse(sites_data, simple_fields: simple_fields, dictionary_path: dictionary_path)
+  pp data
+  # Generate the report
+  report = SitedogParser::DictionaryAnalyzer.report(data, dictionary_path)
+  puts "\n#{report}\n"
+rescue => e
+  puts "Error processing file: #{e.message}"
+  puts e.backtrace.join("\n")
+  exit 1
+end
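The structure check in this script accepts two layouts: domains nested under a `sites` key, or domains at the root. A minimal sketch of that decision as a standalone function; the name `extract_sites` is hypothetical, and returning `nil` replaces the script's `exit 1`:

```ruby
# Hypothetical helper mirroring the script's layout detection.
# Returns the hash of domains, or nil when neither layout is recognized.
def extract_sites(yaml_data)
  return nil unless yaml_data.is_a?(Hash)
  return yaml_data[:sites] if yaml_data[:sites].is_a?(Hash)  # domains under :sites
  return yaml_data if yaml_data.values.first.is_a?(Hash)     # domains at the root
  nil
end

nested = { sites: { 'example.com': { dns: 'cloudflare' } } }
flat   = { 'example.com': { dns: 'cloudflare' } }

p extract_sites(nested) == nested[:sites]   # true
p extract_sites(flat) == flat               # true
p extract_sites({ title: 'not domains' })   # nil
```

Only the first root value is inspected, so a file that mixes hash and scalar values at the root is accepted or rejected depending on key order.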

data/lib/data_structures.rb ADDED

@@ -0,0 +1,26 @@
+require 'yaml'
+require_relative 'service'
+require_relative 'dictionary'
+require_relative 'url_checker'
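+require_relative 'service_factory'
+require_relative 'entities' # defines Domain and Hosting, which are used below
+yaml = YAML.load_file('test/fixtures/rbbr.io/full.yml', symbolize_names: true)
+yaml.each do |domain, items|
+  # Collect services per domain so they do not leak from one domain into the next
+  services = {}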
+  items.each do |service_type, data|
+    service = ServiceFactory.create(data, service_type)
+    services[service_type] ||= []
+    services[service_type] << service if service
+  end
+  domain = Domain.new(domain, services[:dns], services[:registrar])
+  hosting = Hosting.new(services[:hosting], services[:cdn], services[:ssl], services[:repo])
+  # binding.pry
+end
+puts

data/lib/dictionary.rb ADDED

@@ -0,0 +1,90 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+require 'yaml'
+require_relative 'url_checker'
+# Class for working with the provider dictionary
+class Dictionary
+  # Default path to the dictionary
+  DEFAULT_DICTIONARY_PATH = File.expand_path('../../data/dictionary.yml', __FILE__)
+  # Initialize the dictionary from the YAML file
+  #
+  # @param dictionary_path [String, nil] path to the dictionary YAML file
+  def initialize(dictionary_path = nil)
+    @dictionary_path = dictionary_path || DEFAULT_DICTIONARY_PATH
+    @dictionary = nil # The dictionary is loaded lazily on first access
+  end
+  # Look up a provider by slug or alias
+  #
+  # @param slug [String] provider slug or alias to look up
+  # @return [Hash, nil] provider data or nil if not found
+  def lookup(slug)
+    return nil unless slug.is_a?(String)
+    slug = slug.downcase.strip
+    # Direct match by key
+    return dictionary[slug] if dictionary.key?(slug)
+    # Check aliases
+    dictionary.each do |key, provider|
+      aliases = provider['aliases'].to_s.split(',').map(&:strip)
+      return provider.merge('key' => key) if aliases.include?(slug)
+    end
+    nil
+  end
+  # Find a provider that matches the given URL
+  #
+  # @param url [String] URL to match against provider patterns
+  # @return [Hash, nil] provider data or nil if no match found
+  def match(url)
+    return nil unless UrlChecker.url_like?(url)
+    normalized_url = UrlChecker.normalize_url(url)
+    return nil unless normalized_url
+    dictionary.each do |key, provider|
+      pattern = provider['url_pattern']
+      next unless pattern
+      regexp = Regexp.new(pattern, Regexp::IGNORECASE)
+      return provider.merge('key' => key) if regexp.match?(normalized_url)
+    end
+    nil
+  end
+  # Get all providers in the dictionary
+  #
+  # @return [Hash] the entire dictionary
+  def all_providers
+    dictionary
+  end
+  private
+  # Lazy access to the dictionary: loads it only on first access
+  #
+  # @return [Hash] provider dictionary
+  def dictionary
+    @dictionary ||= load_dictionary(@dictionary_path)
+  end
+  # Load the dictionary from a YAML file
+  #
+  # @param path [String] path to the dictionary file
+  # @return [Hash] loaded dictionary
+  def load_dictionary(path)
+    return {} unless path && File.exist?(path)
+    YAML.load_file(path) || {} # an empty file does not load as a Hash, so fall back to an empty one
+  rescue StandardError => e
+    warn "Error loading dictionary: #{e.message}"
+    {}
+  end
+end
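The dictionary file itself is not part of this diff, so its exact contents are unknown. Based only on the fields the class reads (`name`, `url`, `aliases` as a comma-separated string, `url_pattern` as a regular expression source), the two lookups can be sketched against an in-memory hash. The sample entry and the function names are assumptions of this sketch:

```ruby
# Illustrative entry; only the field names are taken from dictionary.rb.
PROVIDERS = {
  'github' => {
    'name' => 'GitHub',
    'url' => 'https://github.com',
    'aliases' => 'gh, github.com',
    'url_pattern' => 'github\.com'
  }
}.freeze

# Lookup by slug first, then by alias (case-insensitive, surrounding spaces ignored).
def lookup_provider(providers, slug)
  slug = slug.downcase.strip
  return providers[slug].merge('key' => slug) if providers.key?(slug)

  providers.each do |key, provider|
    aliases = provider['aliases'].to_s.split(',').map(&:strip)
    return provider.merge('key' => key) if aliases.include?(slug)
  end
  nil
end

# First provider whose url_pattern matches the URL wins.
def match_provider(providers, url)
  providers.each do |key, provider|
    pattern = provider['url_pattern']
    next unless pattern
    return provider.merge('key' => key) if Regexp.new(pattern, Regexp::IGNORECASE).match?(url)
  end
  nil
end

puts lookup_provider(PROVIDERS, ' GH ')['name']                            # GitHub
puts match_provider(PROVIDERS, 'https://GitHub.com/username/repo')['key']  # github
p lookup_provider(PROVIDERS, 'unknown')                                    # nil
```

Unlike `Dictionary#lookup`, this sketch also merges `'key'` into direct matches, so both paths return the same shape.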

data/lib/dictionary_analyzer.rb ADDED

@@ -0,0 +1,85 @@
+require_relative 'dictionary'
+module SitedogParser
+  # Class for analyzing parsing results and finding candidates for dictionary additions
+  class DictionaryAnalyzer
+    # Finds all services that are potentially missing from the dictionary (have a name but no URL)
+    #
+    # @param parsed_data [Hash] data received from Parser
+    # @param dictionary_path [String, nil] path to the dictionary file (optional)
+    # @return [Hash] hash with candidates for dictionary addition
+    def self.find_dictionary_candidates(parsed_data, dictionary_path = nil)
+      candidates = {}
+      current_dictionary = Dictionary.new(dictionary_path) # nil falls back to the default path
+      parsed_data.each do |domain_name, services|
+        services.each do |service_type, service_list|
+          # Skip simple fields (non-array service lists)
+          next unless service_list.is_a?(Array)
+          service_list.each do |service|
+            # Candidates are services that:
+            # 1. Have a name
+            # 2. Have no URL
+            # 3. Have no children
+            # 4. Are not in the current dictionary
+            is_candidate = service.service &&                     # Has a name
+                          !service.url &&                         # No URL
+                          service.children.empty? &&              # No children
+                          current_dictionary.lookup(service.service).nil?  # Not in dictionary
+            if is_candidate
+              # Add to candidate list with context information
+              service_name = service.service.downcase
+              candidates[service_name] ||= {
+                name: service.service,
+                service_types: [],
+                domains: []
+              }
+              # Add service type and domain information
+              candidates[service_name][:service_types] << service_type unless candidates[service_name][:service_types].include?(service_type)
+              candidates[service_name][:domains] << domain_name.to_s unless candidates[service_name][:domains].include?(domain_name.to_s)
+            end
+          end
+        end
+      end
+      # Sort candidates by usage frequency
+      candidates.transform_values do |candidate|
+        candidate[:domains_count] = candidate[:domains].size
+        candidate[:types_count] = candidate[:service_types].size
+        candidate
+      end.sort_by { |_name, data| -data[:domains_count] }.to_h
+    end
+    # Analyzes data and outputs a report on dictionary candidates
+    #
+    # @param parsed_data [Hash] data received from Parser
+    # @param dictionary_path [String, nil] path to the dictionary file (optional)
+    # @return [String] report on dictionary candidates
+    def self.report(parsed_data, dictionary_path = nil)
+      candidates = find_dictionary_candidates(parsed_data, dictionary_path)
+      if candidates.empty?
+        return "All services have URLs or are already in the dictionary. No candidates for addition."
+      end
+      report = [
+        "DICTIONARY CANDIDATES REPORT",
+        "===========================",
+        "Found #{candidates.size} potential services to add to dictionary:",
+        ""
+      ]
+      candidates.each do |name, data|
+        report << "#{data[:name]}:"
+        report << "  - Used in #{data[:domains_count]} domain(s): #{data[:domains].join(', ')}"
+        report << "  - Used as service type(s): #{data[:service_types].map(&:to_s).join(', ')}"
+        report << ""
+      end
+      report.join("\n")
+    end
+  end
+end
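The candidate rule (a name, no URL, no children, not already known) can be exercised without the gem. In this sketch `ServiceStub` stands in for `Service`, and a plain array of known names stands in for the dictionary lookup; both are assumptions made for illustration:

```ruby
# Stand-in for the gem's Service objects (assumption of this sketch).
ServiceStub = Struct.new(:service, :url, :children, keyword_init: true)

# Maps each candidate name to the domains that use it, most used first.
def candidate_domains(parsed_data, known_names)
  candidates = Hash.new { |hash, name| hash[name] = [] }
  parsed_data.each do |domain, services|
    services.each_value do |list|
      next unless list.is_a?(Array) # simple fields are plain values, not arrays
      list.each do |svc|
        next if svc.url || !svc.children.empty?
        name = svc.service.downcase
        next if known_names.include?(name)
        candidates[name] |= [domain.to_s] # set-style union avoids duplicates
      end
    end
  end
  candidates.sort_by { |_name, domains| -domains.size }.to_h
end

PARSED_SAMPLE = {
  'a.com' => { registrar: [ServiceStub.new(service: 'Namecheap', children: [])],
               project: 'demo' },
  'b.org' => { registrar: [ServiceStub.new(service: 'Namecheap', children: [])],
               ssl: [ServiceStub.new(service: 'Letsencrypt', children: [])],
               dns: [ServiceStub.new(service: 'Cloudflare', url: 'https://cloudflare.com', children: [])] }
}.freeze

result = candidate_domains(PARSED_SAMPLE, ['letsencrypt'])
puts result.keys.inspect          # ["namecheap"]
puts result['namecheap'].inspect  # ["a.com", "b.org"]
```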

data/lib/entities.rb ADDED

@@ -0,0 +1,3 @@
+# Core data structures for domain and hosting information
+Domain = Data.define(:domain, :dns, :registrar)
+Hosting = Data.define(:hosting, :cdn, :ssl, :repo)
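`Data.define` (Ruby 3.2 and later) produces immutable value objects. A short sketch of how a definition like `Domain` behaves; the field values are illustrative:

```ruby
Domain = Data.define(:domain, :dns, :registrar)

# Positional and keyword construction are equivalent.
SITE = Domain.new('example.com', ['cloudflare'], ['namecheap'])
SAME = Domain.new(domain: 'example.com', dns: ['cloudflare'], registrar: ['namecheap'])

puts SITE == SAME                 # true, equality is by value
puts SITE.respond_to?(:dns=)      # false, no setters are generated

# #with returns a modified copy and leaves the original untouched.
MOVED = SITE.with(registrar: ['gandi'])
puts MOVED.registrar.inspect      # ["gandi"]
puts SITE.registrar.inspect       # ["namecheap"]

# Every member is required.
begin
  Domain.new('example.com')
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

Because every member is required, a caller that lacks some service types has to pass `nil` explicitly.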

data/lib/service.rb ADDED

@@ -0,0 +1,11 @@
+class Service < Data.define(:service, :url, :children)
+  def initialize(service:, url: nil, children: [])
+    raise ArgumentError, "Service cannot be empty" if service.nil? || service.empty?
+    service => String
+    url => String if url
+    children => Array if children
+    super
+  end
+end
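The `value => Pattern` lines in this class are rightward pattern matches used as type assertions: they raise `NoMatchingPatternError` when the value does not match. A sketch that reproduces the class with an explicit `super` call and shows which errors surface; the helper `build_error` is hypothetical:

```ruby
class Service < Data.define(:service, :url, :children)
  def initialize(service:, url: nil, children: [])
    raise ArgumentError, 'Service cannot be empty' if service.nil? || service.empty?
    service => String              # raises NoMatchingPatternError unless a String
    url => String if url
    children => Array if children
    super(service: service, url: url, children: children)
  end
end

# Hypothetical helper: returns the error class raised by Service.new, or nil on success.
def build_error(**attrs)
  Service.new(**attrs)
  nil
rescue ArgumentError, NoMatchingPatternError => e
  e.class
end

p build_error(service: 'Github')            # nil
p build_error(service: '')                  # ArgumentError
p build_error(service: 'Github', url: 42)   # NoMatchingPatternError
p Service.new(service: 'Github').children   # []
```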

data/lib/service_factory.rb ADDED

@@ -0,0 +1,181 @@
+# 'pry' is a development-only dependency and is not used in this file, so it is not required here
+require_relative 'url_checker'
+require_relative 'dictionary'
+require_relative 'service'
+# Factory for creating Service objects from different data formats
+class ServiceFactory
+  # Creates a Service object from various data formats
+  #
+  # @param data [String, Hash, Array] data for creating service
+  # @param service_type [Symbol] service type (used as fallback)
+  # @param dictionary_path [String, nil] path to the dictionary file (optional)
+  # @return [Service, nil] created service object, or nil if the data cannot be converted
+  def self.create(data, service_type = nil, dictionary_path = nil)
+    # Check for nil
+    return nil if data.nil?
+    slug = nil
+    url = nil
+    dictionary = Dictionary.new(dictionary_path)
+    case data
+    in String if UrlChecker.url_like?(data) # url
+      url = UrlChecker.normalize_url(data)
+      slug = dictionary.match(url)&.dig('name')
+      # If not found in dictionary and service_type exists, use it
+      if slug.nil? && service_type
+        slug = service_type.to_s
+      else
+        # Otherwise try to extract name from URL
+        slug = UrlChecker.extract_name(url) if slug.nil?
+      end
+      puts "url: #{slug} <- #{url}"
+    in String if !UrlChecker.url_like?(data) # slug
+      slug = data
+      url = dictionary.lookup(slug)&.dig('url')
+      puts "slug: #{slug} -> #{url}"
+    in { service: String => service_slug, url: String => service_url }
+      slug = service_slug.to_s.capitalize
+      url = service_url
+      puts "hash: #{slug} + #{url}"
+    in Hash
+      puts "hash: #{data}"
+      # Protection from nil values in key fields (check only the key that is actually present)
+      service_key = data.key?(:service) ? :service : "service"
+      return nil if data.key?(service_key) && data[service_key].nil?
+      # 1. Check if hash contains only URL-like strings (list of services)
+      if data.values.all? { |v| v.is_a?(String) && UrlChecker.url_like?(v) }
+        puts "hash with services: #{data.keys.join(', ')}"
+        # Create array of child services
+        children = []
+        data.each do |key, url_value|
+          service_name = key.to_s
+          child_service = Service.new(service: service_name.capitalize, url: url_value)
+          children << child_service
+        end
+        # Create parent service with child elements
+        if service_type && children.any?
+          return Service.new(service: service_type.to_s, children: children)
+        elsif children.size == 1
+          # If only one service and no service_type, return it directly
+          return children.first
+        end
+      end
+      # 2. If hash contains service and url (possibly with additional fields)
+      if (data.key?(:service) || data.key?("service")) &&
+         (data.key?(:url) || data.key?("url"))
+        service_key = data.key?(:service) ? :service : "service"
+        service_name = data[service_key].to_s
+        url_key = data.key?(:url) ? :url : "url"
+        url_value = data[url_key]
+        return Service.new(service: service_name.capitalize, url: url_value)
+      end
+      # 3. Process nested hashes
+      children = []
+      data.each do |key, value|
+        child = nil
+        if value.is_a?(Hash)
+          # 3.1 If value has a hash with service and url
+          if (value.key?(:service) || value.key?("service")) &&
+             (value.key?(:url) || value.key?("url"))
+            service_key = value.key?(:service) ? :service : "service"
+            service_name = value[service_key].to_s
+            url_key = value.key?(:url) ? :url : "url"
+            url_value = value[url_key]
+            child = Service.new(service: service_name.capitalize, url: url_value)
+          # 3.2 If value has hash with only URL-like values
+          elsif value.values.all? { |v| v.is_a?(String) && UrlChecker.url_like?(v) }
+            child_children = []
+            value.each do |sub_key, url_value|
+              child_children << Service.new(service: sub_key.to_s.capitalize, url: url_value)
+            end
+            child = Service.new(service: key.to_s, children: child_children)
+          # 3.3 Recursively process other cases
+          else
+            child = create(value, key, dictionary_path)
+            # If nothing worked, create an empty service with key name
+            if child.nil? && value.is_a?(Hash)
+              child_children = []
+              has_urls = false
+              value.each do |sub_key, sub_value|
+                if sub_value.is_a?(String) && UrlChecker.url_like?(sub_value)
+                  has_urls = true
+                  child_children << Service.new(service: sub_key.to_s.capitalize, url: sub_value)
+                end
+              end
+              child = Service.new(service: key.to_s, children: child_children) if has_urls
+            end
+          end
+        # 3.4 If the value is a URL string
+        elsif value.is_a?(String) && UrlChecker.url_like?(value)
+          child = Service.new(service: key.to_s.capitalize, url: value)
+        end
+        children << child if child
+      end
+      # Create parent service if there are child elements
+      if children.any? && service_type
+        return Service.new(service: service_type.to_s, children: children)
+      elsif children.size == 1 && !service_type
+        # If only one child element and no service_type, return it
+        return children.first
+      elsif children.any?
+        # If there are child elements but no service_type, create a service with unknown name
+        return Service.new(service: "Unknown", children: children)
+      end
+    in Array
+      puts "array: #{data}"
+      # Create services from array elements
+      children = data.map { |item| create(item, service_type, dictionary_path) }.compact
+      # If there are child services, create a parent service with them
+      if children.any? && service_type
+        return Service.new(service: service_type.to_s, children: children)
+      elsif children.size == 1
+        # If only one child service, return it
+        return children.first
+      end
+      # If no child services or no name for parent service,
+      # return nil
+      return nil
+    else
+      # Handle values that don't match any pattern
+      return nil
+    end
+    # Create service with collected data
+    if slug
+      Service.new(service: slug, url: url)
+    else
+      nil
+    end
+  rescue => e
+    puts "Error creating service: #{e.message}"
+    puts "Data: #{data.inspect}"
+    return nil
+  end
+end
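The factory dispatches on the shape of its input with `case`/`in`. The order of the branches matters, and the hash pattern only sees symbol keys. A reduced sketch that returns a label instead of building services; `classify` and the simplified `url_like?` are assumptions of this sketch, not the gem's code:

```ruby
# Simplified stand-in for UrlChecker.url_like? (assumption of this sketch).
def url_like?(value)
  value.match?(%r{\A(?:https?://)?[a-z0-9][-a-z0-9.]*\.[a-z]{2,}(?:/\S*)?\z}i)
end

# Returns a label for the branch that would handle the data.
def classify(data)
  case data
  in String if url_like?(data)
    :url
  in String
    :slug
  in { service: String, url: String }  # matches even when extra keys are present
    :service_hash
  in Hash
    :other_hash
  in Array
    :list
  else
    :unsupported
  end
end

p classify('https://github.com/username/repo')                           # :url
p classify('cloudflare')                                                 # :slug
p classify({ service: 'cloudflare', url: 'https://cloudflare.com' })     # :service_hash
p classify({ 'service' => 'cloudflare', 'url' => 'https://cf.example' }) # :other_hash
p classify(['cloudflare'])                                               # :list
p classify(42)                                                           # :unsupported
```

String-keyed hashes fall through to the generic `Hash` branch, which is why the factory checks both `:service` and `"service"` there.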

data/lib/sitedog_parser/version.rb ADDED

@@ -0,0 +1,3 @@
+module SitedogParser
+  VERSION = "0.1.1"
+end

data/lib/sitedog_parser.rb ADDED

@@ -0,0 +1,108 @@
+require "sitedog_parser/version"
+require 'yaml'
+require_relative "service"
+require_relative "dictionary"
+require_relative "url_checker"
+require_relative "service_factory"
+module SitedogParser
+  class Error < StandardError; end
+  # Main parser class that provides a high-level interface to the library
+  class Parser
+    # By default, fields that should not be processed as services
+    DEFAULT_SIMPLE_FIELDS = [:project, :role, :environment, :bought_at]
+    # Parse a YAML file and convert it to structured Ruby objects
+    #
+    # @param file_path [String] path to the YAML file
+    # @param symbolize_names [Boolean] whether to symbolize keys in the YAML file
+    # @param simple_fields [Array<Symbol>] fields that should remain as simple strings without service wrapping
+    # @param dictionary_path [String, nil] path to the dictionary file (optional)
+    # @return [Hash] hash containing parsed services by type and domain
+    def self.parse_file(file_path, symbolize_names: true, simple_fields: DEFAULT_SIMPLE_FIELDS, dictionary_path: nil)
+      yaml = YAML.load_file(file_path, symbolize_names: symbolize_names)
+      parse(yaml, simple_fields: simple_fields, dictionary_path: dictionary_path)
+    end
+    # Parse YAML data and convert it to structured Ruby objects
+    #
+    # @param yaml [Hash] YAML data as a hash
+    # @param simple_fields [Array<Symbol>] fields that should remain as simple strings without service wrapping
+    # @param dictionary_path [String, nil] path to the dictionary file (optional)
+    # @return [Hash] hash containing parsed services by type and domain
+    def self.parse(yaml, simple_fields: DEFAULT_SIMPLE_FIELDS, dictionary_path: nil)
+      result = {}
+      yaml.each do |domain_name, items|
+        services = {}
+        # Process each service type and its data
+        items.each do |service_type, data|
+          # Check whether this field is "simple" (not a service)
+          if simple_fields.include?(service_type)
+            # For simple fields, store the value as is, without wrapping it in a service
+            services[service_type] = data
+          else
+            # For regular fields, create a service
+            service = ServiceFactory.create(data, service_type, dictionary_path)
+            if service
+              services[service_type] ||= []
+              services[service_type] << service
+            end
+          end
+        end
+        # Create a structure with all the services
+        result[domain_name] = services
+      end
+      result
+    end
+    # Get all services of a specific type from parsed data
+    #
+    # @param parsed_data [Hash] data returned by parse or parse_file
+    # @param service_type [Symbol] type of service to extract
+    # @return [Array] array of services of the specified type
+    def self.get_services_by_type(parsed_data, service_type)
+      result = []
+      parsed_data.each do |_domain_name, services|
+        if services[service_type] && services[service_type].is_a?(Array)
+          result.concat(services[service_type])
+        end
+      end
+      result
+    end
+    # Get domain names from parsed data
+    #
+    # @param parsed_data [Hash] data returned by parse or parse_file
+    # @return [Array] array of domain names
+    def self.get_domain_names(parsed_data)
+      parsed_data.keys
+    end
+    # Get domains with a specific simple field value
+    #
+    # @param parsed_data [Hash] data returned by parse or parse_file
+    # @param field [Symbol] simple field to filter by
+    # @param value [String] value to match
+    # @return [Array] array of domain names that have the specified field value
+    def self.get_domains_by_field_value(parsed_data, field, value)
+      result = []
+      parsed_data.each do |domain_name, services|
+        if services[field] == value
+          result << domain_name
+        end
+      end
+      result
+    end
+  end
+end
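The parsed result is a hash of domains, where service types map to arrays and simple fields map to plain values. The two query helpers rely on that distinction. A sketch over hand-built data; `Svc`, `services_of`, and `domains_where` are hypothetical names, and the values are illustrative:

```ruby
# Stand-in for the gem's Service objects (assumption of this sketch).
Svc = Struct.new(:service, :url)

PARSED = {
  'example.com': { hosting: [Svc.new('Hosting', 'https://aws.amazon.com')],
                   environment: 'production' },
  'another-site.org': { hosting: [Svc.new('Digitalocean', 'https://digitalocean.com')],
                        environment: 'staging' }
}.freeze

# Collects services of one type across all domains; non-array values are skipped.
def services_of(parsed, type)
  parsed.values.flat_map { |services| services[type].is_a?(Array) ? services[type] : [] }
end

# Domains whose simple field equals the given value.
def domains_where(parsed, field, value)
  parsed.select { |_domain, services| services[field] == value }.keys
end

p services_of(PARSED, :hosting).map(&:service)          # ["Hosting", "Digitalocean"]
p services_of(PARSED, :environment)                     # []
p domains_where(PARSED, :environment, 'production')     # [:"example.com"]
```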

data/lib/url_checker.rb ADDED

@@ -0,0 +1,87 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+# Module for working with URL-like strings
+#
+# Usage:
+#   require_relative 'lib/url_checker'
+#
+#   UrlChecker.url_like?("example.com") # => true
+#   UrlChecker.url_like?("http://example.com") # => true
+#   UrlChecker.url_like?("not-a-url") # => false
+#
+#   UrlChecker.normalize_url("example.com") # => "https://example.com"
+module UrlChecker
+  # Checks if a string looks like a URL
+  #
+  # @param string [String] string to check
+  # @return [Boolean] true if the string looks like a URL, false otherwise
+  def self.url_like?(string)
+    return false unless string.is_a?(String)
+    # Regular expression for checking URL-like strings
+    # Supports various protocols and formats:
+    # - standard URLs (with http, https, ftp, etc.)
+    # - Git URLs (format git@hostname:user/repo.git)
+    if string.match?(/\Agit@[a-zA-Z0-9][-a-zA-Z0-9.]+\.[a-zA-Z]{2,}:[a-zA-Z0-9\/_.-]+\.git\z/)
+      return true
+    end
+    # Check for standard URLs (\A and \z anchor the whole string, so multi-line input is rejected)
+    pattern = /\A((?:https?|ftp|sftp|ftps|ssh|git|ws|wss):\/\/)?[a-zA-Z0-9][-a-zA-Z0-9.]+\.[a-zA-Z]{2,}(:[0-9]+)?(\/[-a-zA-Z0-9%_.~#+]*)*(\?[-a-zA-Z0-9%_&=.~#+]*)?(#[-a-zA-Z0-9%_&=.~#+\/]*)?\z/
+    string.match?(pattern)
+  end
+  # Normalizes a URL by adding a protocol if missing
+  #
+  # @param url [String] URL to normalize
+  # @param default_protocol [String] protocol to prepend if none exists (default: "https")
+  # @return [String, nil] normalized URL, or nil if input is not a valid URL
+  def self.normalize_url(url, default_protocol = "https")
+    return nil unless url_like?(url)
+    # Git URLs remain as is
+    return url if url.start_with?("git@")
+    # Return as is if already has a protocol
+    return url if url.match?(/^[a-zA-Z]+:\/\//)
+    # Add default protocol
+    "#{default_protocol}://#{url}"
+  end
+  # Extracts the service name from a URL
+  #
+  # @param url [String] URL to extract the name from
+  # @return [String, nil] name of the service or nil if could not be extracted
+  def self.extract_name(url)
+    return nil unless url_like?(url)
+    # Remove protocol and www prefix if present
+    domain = url.gsub(%r{^(?:https?://)?(?:www\.)?}, "")
+    # Extract domain from URL by removing everything after first / or : or ? or #
+    domain = domain.split(/[:\/?#]/).first
+    # Extract the service name (usually the second-level domain)
+    parts = domain.split(".")
+    # If domain has enough parts (e.g., example.com, sub.example.com)
+    if parts.size >= 2
+      # For most domains, the second-to-last part is the name
+      # e.g., example.com -> example, sub.example.com -> example
+      service_name = parts[-2]
+      # Special cases for country-specific TLDs with subdomains
+      # e.g., example.co.uk -> example
+      if parts.size >= 3 && ["co", "com", "org", "net", "ac"].include?(parts[-2])
+        service_name = parts[-3]
+      end
+      return service_name
+    end
+    nil
+  end
+end
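The name extraction is a heuristic: take the second-to-last host label, unless that label is a common second-level word such as `co`, in which case take the one before it. A compact sketch of the same idea; `service_label` is a hypothetical name and the `url_like?` guard is omitted:

```ruby
SECOND_LEVEL_LABELS = %w[co com org net ac].freeze

# Returns the host label most likely to name the service, or nil for single-label hosts.
def service_label(url)
  # Drop an http(s) scheme and a leading "www.", then cut at the first : / ? or #.
  host = url.sub(%r{\A(?:https?://)?(?:www\.)?}, '').split(%r{[:/?#]}).first
  parts = host.to_s.split('.')
  return nil if parts.size < 2

  if parts.size >= 3 && SECOND_LEVEL_LABELS.include?(parts[-2])
    parts[-3] # e.g. example.co.uk -> example
  else
    parts[-2] # e.g. sub.example.com -> example
  end
end

p service_label('https://www.example.co.uk/path')          # "example"
p service_label('https://sub.example.com')                 # "example"
p service_label('https://console.aws.amazon.com/route53')  # "amazon"
p service_label('example.com:8080/x')                      # "example"
p service_label('localhost')                               # nil
```

The rule is purely positional, so any three-label host whose middle label is one of these words is treated as a country-style domain.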

metadata ADDED

@@ -0,0 +1,144 @@
+--- !ruby/object:Gem::Specification
+name: sitedog_parser
+version: !ruby/object:Gem::Version
+  version: 0.1.1
+platform: ruby
+authors:
+- Ivan Nemytchenko
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2025-04-15 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.0'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '13.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '13.0'
+- !ruby/object:Gem::Dependency
+  name: minitest
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '5.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '5.0'
+- !ruby/object:Gem::Dependency
+  name: pry
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.14.1
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.14.1
+- !ruby/object:Gem::Dependency
+  name: bump
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.10.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.10.0
+- !ruby/object:Gem::Dependency
+  name: thor
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.2'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.2'
+description: A library for parsing and classifying web services, hosting, and domain
+  data from YAML files into structured Ruby objects
+email:
+- nemytchenko@gmail.com
+executables:
+- analyze_dictionary
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- README.md
+- bin/analyze_dictionary
+- lib/data_structures.rb
+- lib/dictionary.rb
+- lib/dictionary_analyzer.rb
+- lib/entities.rb
+- lib/service.rb
+- lib/service_factory.rb
+- lib/sitedog_parser.rb
+- lib/sitedog_parser/version.rb
+- lib/url_checker.rb
+homepage: https://github.com/inem/sitedog-parser
+licenses:
+- MIT
+metadata:
+  homepage_uri: https://github.com/inem/sitedog-parser
+  source_code_uri: https://github.com/inem/sitedog-parser
+  changelog_uri: https://github.com/inem/sitedog-parser/blob/master/CHANGELOG.md
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 3.3.7
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.5.22
+signing_key:
+specification_version: 4
+summary: Parser for converting YAML format into Ruby data structures
+test_files: []
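The `~>` constraints in this specification are pessimistic version constraints: they allow the last listed segment to grow but nothing above it. RubyGems can evaluate them directly. A small sketch, where `satisfies?` is a hypothetical wrapper and the version numbers being tested are illustrative:

```ruby
require 'rubygems'

# Hypothetical wrapper around the RubyGems requirement check.
def satisfies?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

puts satisfies?('~> 1.2', '1.9.0')      # true  (thor: >= 1.2 and < 2.0)
puts satisfies?('~> 1.2', '2.0.0')      # false
puts satisfies?('~> 0.14.1', '0.14.2')  # true  (pry: >= 0.14.1 and < 0.15)
puts satisfies?('~> 0.14.1', '0.15.0')  # false
puts satisfies?('>= 3.3.7', '3.3.6')    # false (required_ruby_version)
```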