html2rss 0.12.0 → 0.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (32)
  1. checksums.yaml +4 -4
  2. data/README.md +38 -10
  3. data/html2rss.gemspec +1 -0
  4. data/lib/html2rss/attribute_post_processors/base.rb +9 -6
  5. data/lib/html2rss/attribute_post_processors/gsub.rb +2 -2
  6. data/lib/html2rss/attribute_post_processors/html_to_markdown.rb +2 -2
  7. data/lib/html2rss/attribute_post_processors/markdown_to_html.rb +2 -2
  8. data/lib/html2rss/attribute_post_processors/parse_time.rb +2 -2
  9. data/lib/html2rss/attribute_post_processors/parse_uri.rb +2 -2
  10. data/lib/html2rss/attribute_post_processors/sanitize_html.rb +13 -2
  11. data/lib/html2rss/attribute_post_processors/substring.rb +3 -3
  12. data/lib/html2rss/attribute_post_processors/template.rb +4 -4
  13. data/lib/html2rss/auto_source/article.rb +95 -0
  14. data/lib/html2rss/auto_source/channel.rb +79 -0
  15. data/lib/html2rss/auto_source/cleanup.rb +76 -0
  16. data/lib/html2rss/auto_source/reducer.rb +48 -0
  17. data/lib/html2rss/auto_source/rss_builder.rb +68 -0
  18. data/lib/html2rss/auto_source/scraper/schema/base.rb +61 -0
  19. data/lib/html2rss/auto_source/scraper/schema.rb +122 -0
  20. data/lib/html2rss/auto_source/scraper/semantic_html/extractor.rb +123 -0
  21. data/lib/html2rss/auto_source/scraper/semantic_html/image.rb +54 -0
  22. data/lib/html2rss/auto_source/scraper/semantic_html.rb +118 -0
  23. data/lib/html2rss/auto_source/scraper.rb +33 -0
  24. data/lib/html2rss/auto_source.rb +77 -0
  25. data/lib/html2rss/cli.rb +10 -0
  26. data/lib/html2rss/config/channel.rb +4 -2
  27. data/lib/html2rss/config/selectors.rb +2 -2
  28. data/lib/html2rss/item.rb +8 -2
  29. data/lib/html2rss/utils.rb +5 -10
  30. data/lib/html2rss/version.rb +1 -1
  31. data/lib/html2rss.rb +21 -0
  32. metadata +29 -3
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: ebe536d8051a64c6e2adf9fa8e1d9d1f9fa3743541c44ca85022d0603f9032b2
4
- data.tar.gz: 7b3aaa213aaf6a37fb6e94fa72c9936ffd2391322297553b253b097edea300cc
3
+ metadata.gz: 7a2bf557dd65533533e07b4581e195f2d2b32ff906831526a4d7aed27a558d71
4
+ data.tar.gz: f42e5f03649a08219d310a2545413c371f851530c4d323fd68ef783b4b3b5e13
5
5
  SHA512:
6
- metadata.gz: '03985002d050b996c1dc315cbe8e3fc79b6619447a048ad3d2dca86f792eab5c2356716cf6198a24efc61de7e7ddceba2780da49c3e68a3c9efe895eb7cf0cf1'
7
- data.tar.gz: 8315473528f46a5ba28297af296b879a66ac00f86ba9eb117b4e6c9ec61c285e4090cfd999ff712368f5b988b1cbda460e268aa3ea8928912bcdb1960ae25a4a
6
+ metadata.gz: 724a1fa8ab15ae140278eb9b055f22e7aad12e94627795f7a2f13c78f5421607e39d6ba040821b4c47b69f963cc0180bf8e964ff0b896403cb6305ed1d67dbb5
7
+ data.tar.gz: a06c2e16b0b51c6b6d2184430efc2a4e8b2812fee413163aa2991567e7608141f1c18189fdded58c8c3383940c4790478cd631abc6a1470ad648b2030fdefaab
data/README.md CHANGED
@@ -26,26 +26,40 @@ You can also install it as a dependency in your Ruby project:
26
26
 
27
27
  ## Generating a feed on the CLI
28
28
 
29
- Create a file called `my_config_file.yml` with this example content:
29
+ ### using automatic scraping
30
+
31
+ html2rss offers an automatic scrapting feature. Try it with:
32
+
33
+ `html2rss auto https://unmatchedstyle.com/`
34
+
35
+ ### creating a feed config file and using it
36
+
37
+ If the results are not to your satisfaction, you can create a feed config file.
38
+
39
+ Create a file called `my_config_file.yml` with this sample content:
30
40
 
31
41
  ```yml
32
42
  channel:
33
- url: https://stackoverflow.com/questions
43
+ url: https://unmatchedstyle.com
34
44
  selectors:
35
45
  items:
36
- selector: "#hot-network-questions > ul > li"
46
+ selector: "article[id^='post-']"
37
47
  title:
38
- selector: a
48
+ selector: h2
39
49
  link:
40
50
  selector: a
41
51
  extractor: href
52
+ description:
53
+ selector: ".post-content"
54
+ post_process:
55
+ - name: sanitize_html
42
56
  ```
43
57
 
44
- Build the RSS with: `html2rss feed ./my_config_file.yml`.
58
+ Build the feed from this config with: `html2rss feed ./my_config_file.yml`.
45
59
 
46
60
  ## Generating a feed with Ruby
47
61
 
48
- Here's a minimal working example in Ruby:
62
+ Here's a minimal working example using Ruby:
49
63
 
50
64
  ```ruby
51
65
  require 'html2rss'
@@ -481,7 +495,7 @@ feeds:
481
495
 
482
496
  Your feed configs go below `feeds`. Everything else is part of the global config.
483
497
 
484
- Find a full example of a `feeds.yml` at [`spec/feeds.test.yml`](https://github.com/html2rss/html2rss/blob/master/spec/feeds.test.yml).
498
+ Find a full example of a `feeds.yml` at [`spec/fixtures/feeds.test.yml`](https://github.com/html2rss/html2rss/blob/master/spec/fixtures/feeds.test.yml).
485
499
 
486
500
  Now you can build your feeds like this:
487
501
 
@@ -583,8 +597,22 @@ Recommended further readings:
583
597
 
584
598
  ### Contributing
585
599
 
586
- 1. Fork it ( <https://github.com/html2rss/html2rss/fork> )
600
+ Find ideas what to contribute in:
601
+
602
+ 1. <https://github.com/orgs/html2rss/discussions>
603
+ 2. the issues tracker: <https://github.com/html2rss/html2rss/issues>
604
+
605
+ #### Development Helpers
606
+
607
+ 1. `bin/setup`: installs dependencies and sets up the development environment.
608
+ 2. `bin/guard`: automatically runs rspec, rubocop and reek when a file changes.
609
+ 3. for a modern Ruby development experience: install [`ruby-lsp`](https://github.com/Shopify/ruby-lsp) and integrate it to your IDE:
610
+ a. [Ruby in Visual Studio Code](https://code.visualstudio.com/docs/languages/ruby)
611
+
612
+ #### How to submit changes
613
+
614
+ 1. Fork this repo ( <https://github.com/html2rss/html2rss/fork> )
587
615
  2. Create your feature branch (`git checkout -b my-new-feature`)
588
- 3. Commit your changes (`git commit -am 'Add some feature'`)
616
+ 3. Implement a commit your changes (`git commit -am 'feat: add XYZ'`)
589
617
  4. Push to the branch (`git push origin my-new-feature`)
590
- 5. Create a new Pull Request
618
+ 5. Create a new Pull Request using the Github web UI
data/html2rss.gemspec CHANGED
@@ -38,6 +38,7 @@ Gem::Specification.new do |spec|
38
38
  spec.add_dependency 'kramdown'
39
39
  spec.add_dependency 'mime-types', '> 3.0'
40
40
  spec.add_dependency 'nokogiri', '>= 1.10', '< 2.0'
41
+ spec.add_dependency 'parallel'
41
42
  spec.add_dependency 'regexp_parser'
42
43
  spec.add_dependency 'reverse_markdown', '~> 2.0'
43
44
  spec.add_dependency 'rss'
data/lib/html2rss/attribute_post_processors/base.rb CHANGED
@@ -26,17 +26,20 @@ module Html2rss
26
26
  # @param value [Object] the value to check
27
27
  # @param types [Array<Class>, Class] the expected type(s)
28
28
  # @param name [String] the name of the option being checked
29
+ # @param context [Item::Context] the context
29
30
  # @raise [InvalidType] if the value is not of the expected type(s)
30
- def self.assert_type(value, types = [], name)
31
+ def self.assert_type(value, types = [], name, context:)
31
32
  types = [types] unless types.is_a?(Array)
32
33
 
33
34
  return if types.any? { |type| value.is_a?(type) }
34
35
 
35
- error_message_template = 'The type of `%s` must be %s, but is: %s'
36
- raise InvalidType, format(error_message_template, name, types.join(' or '), value.class), [], cause: nil
37
- end
36
+ options = context[:options] if context.is_a?(Hash)
37
+ options ||= { file: File.basename(caller_locations(1, 1).first.absolute_path) }
38
38
 
39
- # private_class_method :expect_options, :assert_type
39
+ raise InvalidType, format('The type of `%<name>s` must be %<types>s, but is: %<type>s in: %<options>s',
40
+ name:, types: types.join(' or '), type: value.class, options: options.inspect),
41
+ [], cause: nil
42
+ end
40
43
 
41
44
  ##
42
45
  # This method validates the arguments passed to the post processor. Must be implemented by subclasses.
@@ -51,7 +54,7 @@ module Html2rss
51
54
  def initialize(value, context)
52
55
  klass = self.class
53
56
  # TODO: get rid of Hash
54
- klass.assert_type(context, [Item::Context, Hash], 'context')
57
+ klass.assert_type(context, [Item::Context, Hash], 'context', context:)
55
58
  klass.validate_args!(value, context)
56
59
 
57
60
  @value = value
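Reviewer note: the hunks above add a required `context:` keyword to `assert_type` so that the error message can name the offending options. The following is a minimal sketch of that behaviour, not the gem's code: `TypeError` stands in for the gem's `InvalidType`, the `caller_locations` fallback is replaced by a fixed placeholder Hash, and `type_error_message` is a hypothetical helper added only to make the result easy to inspect.

```ruby
# Sketch of the 0.13.0 type check. Assumptions: TypeError replaces the gem's
# InvalidType, and the file-name fallback is a fixed placeholder.
def assert_type(value, types, name, context:)
  types = [types] unless types.is_a?(Array)
  return if types.any? { |type| value.is_a?(type) }

  # Only a Hash context is inspected for options, as in the diff.
  options = context[:options] if context.is_a?(Hash)
  options ||= { file: 'unknown' } # assumption: simplified fallback

  raise TypeError, format('The type of `%<name>s` must be %<types>s, but is: %<type>s in: %<options>s',
                          name: name, types: types.join(' or '), type: value.class, options: options.inspect)
end

# Hypothetical helper: returns the error message, or nil when the type matches.
def type_error_message(value, types, name, context)
  assert_type(value, types, name, context: context)
  nil
rescue TypeError => e
  e.message
end

puts type_error_message(42, [String, Hash], :replacement, { options: { pattern: 'a' } })
```

The exact rendering of the options Hash at the end of the message depends on the Ruby version's `Hash#inspect`, so the sketch does not state it.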
data/lib/html2rss/attribute_post_processors/gsub.rb CHANGED
@@ -27,9 +27,9 @@ module Html2rss
27
27
  # See the doc on [String#gsub](https://ruby-doc.org/core/String.html#method-i-gsub) for more info.
28
28
  class Gsub < Base
29
29
  def self.validate_args!(value, context)
30
- assert_type value, String, :value
30
+ assert_type value, String, :value, context:
31
31
  expect_options(%i[replacement pattern], context)
32
- assert_type context.dig(:options, :replacement), [String, Hash], :replacement
32
+ assert_type context.dig(:options, :replacement), [String, Hash], :replacement, context:
33
33
  end
34
34
 
35
35
  ##
data/lib/html2rss/attribute_post_processors/html_to_markdown.rb CHANGED
@@ -27,8 +27,8 @@ module Html2rss
27
27
  # Would return:
28
28
  # 'Lorem **ipsum** dolor'
29
29
  class HtmlToMarkdown < Base
30
- def self.validate_args!(value, _context)
31
- assert_type value, String, :value
30
+ def self.validate_args!(value, context)
31
+ assert_type value, String, :value, context:
32
32
  end
33
33
 
34
34
  ##
data/lib/html2rss/attribute_post_processors/markdown_to_html.rb CHANGED
@@ -33,8 +33,8 @@ module Html2rss
33
33
  #
34
34
  # <p>Price: 12.34</p>
35
35
  class MarkdownToHtml < Base
36
- def self.validate_args!(value, _context)
37
- assert_type value, String, :value
36
+ def self.validate_args!(value, context)
37
+ assert_type value, String, :value, context:
38
38
  end
39
39
 
40
40
  ##
data/lib/html2rss/attribute_post_processors/parse_time.rb CHANGED
@@ -27,8 +27,8 @@ module Html2rss
27
27
  # It uses `Time.parse`.
28
28
  class ParseTime < Base
29
29
  def self.validate_args!(value, context)
30
- assert_type value, String, :value
31
- assert_type context[:config].time_zone, String, :time_zone
30
+ assert_type(value, String, :value, context:)
31
+ assert_type(context[:config].time_zone, String, :time_zone, context:)
32
32
  end
33
33
 
34
34
  ##
data/lib/html2rss/attribute_post_processors/parse_uri.rb CHANGED
@@ -25,8 +25,8 @@ module Html2rss
25
25
  def self.validate_args!(value, context)
26
26
  url_types = [String, URI::HTTP, Addressable::URI].freeze
27
27
 
28
- assert_type(value, url_types, :value)
29
- assert_type(context.config.url, url_types, :url)
28
+ assert_type(value, url_types, :value, context:)
29
+ assert_type(context.config.url, url_types, :url, context:)
30
30
 
31
31
  raise ArgumentError, 'The `value` option is missing or empty.' if value.to_s.empty?
32
32
  end
data/lib/html2rss/attribute_post_processors/sanitize_html.rb CHANGED
@@ -39,8 +39,19 @@ module Html2rss
39
39
  # Would return:
40
40
  # '<p>Lorem <b>ipsum</b> dolor ...</p>'
41
41
  class SanitizeHtml < Base
42
- def self.validate_args!(value, _context)
43
- assert_type value, String, :value
42
+ def self.validate_args!(value, context)
43
+ assert_type value, String, :value, context:
44
+ end
45
+
46
+ ##
47
+ # Shorthand method to get the sanitized HTML.
48
+ # @param html [String]
49
+ # @param url [String, Addressable::URI]
50
+ def self.get(html, url)
51
+ raise ArgumentError, 'url must be a String or Addressable::URI' if url.to_s.empty?
52
+ return nil if html.to_s.empty?
53
+
54
+ new(html, { config: Config::Channel.new({ url: }) }).get
44
55
  end
45
56
 
46
57
  ##
data/lib/html2rss/attribute_post_processors/substring.rb CHANGED
@@ -30,13 +30,13 @@ module Html2rss
30
30
  # 'bar'
31
31
  class Substring < Base
32
32
  def self.validate_args!(value, context)
33
- assert_type value, String, :value
33
+ assert_type value, String, :value, context:
34
34
 
35
35
  options = context[:options]
36
- assert_type options[:start], Integer, :start
36
+ assert_type options[:start], Integer, :start, context:
37
37
 
38
38
  end_index = options[:end]
39
- assert_type end_index, Integer, :end if end_index
39
+ assert_type(end_index, Integer, :end, context:) if end_index
40
40
  end
41
41
 
42
42
  ##
data/lib/html2rss/attribute_post_processors/template.rb CHANGED
@@ -33,7 +33,7 @@ module Html2rss
33
33
  # 'Product (23,42€)'
34
34
  class Template < Base
35
35
  def self.validate_args!(value, context)
36
- assert_type value, String, :value
36
+ assert_type value, String, :value, context:
37
37
 
38
38
  string = context[:options]&.dig(:string).to_s
39
39
  raise InvalidType, 'The `string` template is absent.' if string.empty?
@@ -74,9 +74,9 @@ module Html2rss
74
74
  # @return [String]
75
75
  # @deprecated Use %<id>s formatting instead. Will be removed in version 1.0.0. See README / Dynamic parameters.
76
76
  def format_string_with_methods
77
- warn '[DEPRECATION] This method of using params is deprecated and \
78
- support for it will be removed in version 1.0.0.\
79
- Please use dynamic parameters (i.e. %<id>s, see README.md) instead.'
77
+ Log.warn '[DEPRECATION] This method of using params is deprecated and \
78
+ support for it will be removed in version 1.0.0.\
79
+ Please use dynamic parameters (i.e. %<id>s, see README.md) instead.'
80
80
 
81
81
  string % methods
82
82
  end
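Reviewer note: the deprecation message above points users to dynamic parameters of the form `%<id>s`. These are Ruby's named format references. The sketch below shows the underlying mechanism with `Kernel#format` only; the `render_template` helper and the `name`/`price` keys are illustrative assumptions, not the gem's configuration schema.

```ruby
# Hypothetical helper: fills named references (%<key>s) from a Hash,
# which is the stdlib mechanism behind the "%<id>s" style.
def render_template(template, values)
  format(template, values)
end

puts render_template('%<name>s (%<price>s)', { name: 'Product', price: '23,42€' })
# prints "Product (23,42€)"
```

A missing key raises `KeyError`, which makes a typo in a template fail loudly instead of producing a silently incomplete string.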
data/lib/html2rss/auto_source/article.rb ADDED
@@ -0,0 +1,95 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'zlib'
4
+ require 'sanitize'
5
+
6
+ module Html2rss
7
+ class AutoSource
8
+ ##
9
+ # Article is a simple data object representing an article extracted from a page.
10
+ # It is enumerable and responds to all keys specified in PROVIDED_KEYS.
11
+ class Article
12
+ include Enumerable
13
+ include Comparable
14
+
15
+ PROVIDED_KEYS = %i[id title description url image guid published_at scraper].freeze
16
+
17
+ # @param options [Hash<Symbol, String>]
18
+ def initialize(**options)
19
+ @to_h = {}
20
+ options.each_pair { |key, value| @to_h[key] = value.freeze if value }
21
+ @to_h.freeze
22
+
23
+ return unless (unknown_keys = options.keys - PROVIDED_KEYS).any?
24
+
25
+ Log.warn "Article: unknown keys found: #{unknown_keys.join(', ')}"
26
+ end
27
+
28
+ # Checks if the article is valid based on the presence of URL, ID, and either title or description.
29
+ # @return [Boolean] True if the article is valid, otherwise false.
30
+ def valid?
31
+ !url.to_s.empty? && (!title.to_s.empty? || !description.to_s.empty?) && !id.to_s.empty?
32
+ end
33
+
34
+ # @yield [key, value]
35
+ # @return [Enumerator] if no block is given
36
+ def each
37
+ return enum_for(:each) unless block_given?
38
+
39
+ PROVIDED_KEYS.each { |key| yield(key, public_send(key)) }
40
+ end
41
+
42
+ def id
43
+ @to_h[:id]
44
+ end
45
+
46
+ def title
47
+ @to_h[:title]
48
+ end
49
+
50
+ def description
51
+ return @description if defined?(@description)
52
+
53
+ return if url.to_s.empty? || @to_h[:description].to_s.empty?
54
+
55
+ @description ||= Html2rss::AttributePostProcessors::SanitizeHtml.get(@to_h[:description], url)
56
+ end
57
+
58
+ # @return [Addressable::URI, nil]
59
+ def url
60
+ @url ||= Html2rss::Utils.sanitize_url(@to_h[:url])
61
+ end
62
+
63
+ # @return [Addressable::URI, nil]
64
+ def image
65
+ @image ||= Html2rss::Utils.sanitize_url(@to_h[:image])
66
+ end
67
+
68
+ # Generates a unique identifier based on the URL and ID using CRC32.
69
+ # @return [String]
70
+ def guid
71
+ @guid ||= Zlib.crc32([url, id].join('#!/')).to_s(36).encode('utf-8')
72
+ end
73
+
74
+ # Parses and returns the published_at time.
75
+ # @return [Time, nil]
76
+ def published_at
77
+ return if (string = @to_h[:published_at].to_s).strip.empty?
78
+
79
+ @published_at ||= Time.parse(string)
80
+ rescue ArgumentError
81
+ nil
82
+ end
83
+
84
+ def scraper
85
+ @to_h[:scraper]
86
+ end
87
+
88
+ def <=>(other)
89
+ return nil unless other.is_a?(Article)
90
+
91
+ 0 if other.all? { |key, value| value == public_send(key) ? public_send(key) <=> value : false }
92
+ end
93
+ end
94
+ end
95
+ end
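Reviewer note: `Article#guid` above derives a short identifier from the URL and ID with CRC32. A minimal standalone sketch of that derivation follows, using plain strings instead of the `Addressable::URI` the gem passes in; `article_guid` is a hypothetical name.

```ruby
require 'zlib'

# Joins URL and ID with the same '#!/' separator as the diff, checksums the
# result with CRC32 and renders it in base 36 to keep the GUID short.
def article_guid(url, id)
  Zlib.crc32([url, id].join('#!/')).to_s(36)
end

puts article_guid('https://example.com/posts/1', 'post-1')
```

CRC32 is a checksum rather than a cryptographic hash, so collisions are possible across a large set of articles; for one feed's worth of items that trade-off seems acceptable in exchange for compact, stable GUIDs.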
data/lib/html2rss/auto_source/channel.rb ADDED
@@ -0,0 +1,79 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Html2rss
4
+ class AutoSource
5
+ ##
6
+ # Extracts channel information from
7
+ # 1. the HTML document's <head>.
8
+ # 2. the HTTP response
9
+ class Channel
10
+ ##
11
+ #
12
+ # @param parsed_body [Nokogiri::HTML::Document] The parsed HTML document.
13
+ # @param response [Faraday::Response] The URL of the HTML document.
14
+ def initialize(parsed_body, url:, response:, articles: [])
15
+ @parsed_body = parsed_body
16
+ @url = url
17
+ @response = response
18
+ @articles = articles
19
+ end
20
+
21
+ def url = extract_url
22
+ def title = extract_title
23
+ def language = extract_language
24
+ def description = extract_description
25
+ def image = extract_image
26
+ def ttl = extract_ttl
27
+ def last_build_date = response.headers['last-modified']
28
+
29
+ def generator
30
+ "html2rss V. #{::Html2rss::VERSION} (using auto_source scrapers: #{scraper_counts})"
31
+ end
32
+
33
+ private
34
+
35
+ attr_reader :parsed_body, :response
36
+
37
+ def extract_url
38
+ @url.normalize.to_s
39
+ end
40
+
41
+ def extract_title
42
+ parsed_body.at_css('head > title')&.text
43
+ end
44
+
45
+ def extract_language
46
+ return parsed_body['lang'] if parsed_body.name == 'html' && parsed_body['lang']
47
+
48
+ parsed_body.at_css('[lang]')&.[]('lang')
49
+ end
50
+
51
+ def extract_description
52
+ parsed_body.at_css('meta[name="description"]')&.[]('content') || ''
53
+ end
54
+
55
+ def extract_image
56
+ url = parsed_body.at_css('meta[property="og:image"]')&.[]('content')
57
+ Html2rss::Utils.sanitize_url(url) if url
58
+ end
59
+
60
+ def extract_ttl
61
+ ttl = response.headers['cache-control']&.match(/max-age=(\d+)/)&.[](1)
62
+ return unless ttl
63
+
64
+ ttl.to_i.fdiv(60).ceil
65
+ end
66
+
67
+ def scraper_counts
68
+ scraper_counts = +''
69
+
70
+ @articles.each_with_object(Hash.new(0)) { |article, counts| counts[article.scraper] += 1 }
71
+ .each do |klass, count|
72
+ scraper_counts.concat("[#{klass.to_s.gsub('Html2rss::AutoSource::Scraper::', '')}=#{count}]")
73
+ end
74
+
75
+ scraper_counts
76
+ end
77
+ end
78
+ end
79
+ end
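Reviewer note: `extract_ttl` above converts the `max-age` directive of the `Cache-Control` response header (seconds) into the RSS `ttl` value (minutes), rounding up. A standalone sketch of the same arithmetic, taking the raw header string instead of a Faraday response; `ttl_in_minutes` is a hypothetical name.

```ruby
# Returns the feed TTL in whole minutes, or nil when no max-age is present.
def ttl_in_minutes(cache_control)
  seconds = cache_control&.match(/max-age=(\d+)/)&.[](1)
  return unless seconds

  # fdiv keeps the fraction so that ceil rounds partial minutes up.
  seconds.to_i.fdiv(60).ceil
end

puts ttl_in_minutes('public, max-age=3600') # prints 60
puts ttl_in_minutes('max-age=90')           # prints 2
p ttl_in_minutes('no-store')                # prints nil
```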
data/lib/html2rss/auto_source/cleanup.rb ADDED
@@ -0,0 +1,76 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Html2rss
4
+ class AutoSource
5
+ ##
6
+ # Cleanup is responsible for cleaning up the extracted articles.
7
+ # :reek:MissingSafeMethod { enabled: false }
8
+ # It applies various strategies to filter and refine the article list.
9
+ class Cleanup
10
+ class << self
11
+ def call(articles, url:, keep_different_domain: false)
12
+ Log.debug "Cleanup: start with #{articles.size} articles"
13
+
14
+ articles.select!(&:valid?)
15
+
16
+ remove_short!(articles, :title)
17
+
18
+ deduplicate_by!(articles, :url)
19
+ deduplicate_by!(articles, :title)
20
+
21
+ keep_only_http_urls!(articles)
22
+ reject_different_domain!(articles, url) unless keep_different_domain
23
+
24
+ Log.debug "Cleanup: end with #{articles.size} articles"
25
+ articles
26
+ end
27
+
28
+ private
29
+
30
+ ##
31
+ # Removes articles with short values for a given key.
32
+ #
33
+ # @param articles [Array<Article>] The list of articles to process.
34
+ # @param key [Symbol] The key to check for short values.
35
+ # @param min_words [Integer] The minimum number of words required.
36
+ def remove_short!(articles, key = :title, min_words: 2)
37
+ articles.reject! do |article|
38
+ value = article.public_send(key)
39
+ value.nil? || value.to_s.split.size < min_words
40
+ end
41
+ end
42
+
43
+ ##
44
+ # Deduplicates articles by a given key.
45
+ #
46
+ # @param articles [Array<Article>] The list of articles to process.
47
+ # @param key [Symbol] The key to deduplicate by.
48
+ def deduplicate_by!(articles, key)
49
+ seen = {}
50
+ articles.reject! do |article|
51
+ value = article.public_send(key)
52
+ value.nil? || seen.key?(value).tap { seen[value] = true }
53
+ end
54
+ end
55
+
56
+ ##
57
+ # Keeps only articles with HTTP or HTTPS URLs.
58
+ #
59
+ # @param articles [Array<Article>] The list of articles to process.
60
+ def keep_only_http_urls!(articles)
61
+ articles.select! { |article| %w[http https].include?(article.url&.scheme) }
62
+ end
63
+
64
+ ##
65
+ # Rejects articles that have a URL not on the same domain as the source.
66
+ #
67
+ # @param articles [Array<Article>] The list of articles to process.
68
+ # @param base_url [Addressable::URI] The source URL to compare against.
69
+ def reject_different_domain!(articles, base_url)
70
+ base_host = base_url.host
71
+ articles.select! { |article| article.url&.host == base_host }
72
+ end
73
+ end
74
+ end
75
+ end
76
+ end
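Reviewer note: `deduplicate_by!` above keeps the first article for each distinct value and drops articles whose value is `nil`. The sketch below reproduces the idiom on plain Hashes (an assumption; the gem operates on `Article` objects via `public_send`) to show how `tap` records a value only after `key?` has been evaluated.

```ruby
# Removes entries in place whose value for `key` is nil or was seen before.
def deduplicate_by!(articles, key)
  seen = {}
  articles.reject! do |article|
    value = article[key]
    # key? is evaluated first; tap then marks the value as seen and
    # returns the original true/false result.
    value.nil? || seen.key?(value).tap { seen[value] = true }
  end
  articles
end

articles = [
  { url: 'https://example.com/a', title: 'First post' },
  { url: 'https://example.com/a', title: 'First post (teaser)' },
  { url: 'https://example.com/b', title: 'Second post' },
  { url: nil, title: 'No link' }
]
puts deduplicate_by!(articles, :url).map { |article| article[:title] }.inspect
# prints ["First post", "Second post"]
```

Unlike the version in the diff, this sketch returns the array explicitly, because `Array#reject!` returns `nil` when nothing was removed.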
data/lib/html2rss/auto_source/reducer.rb ADDED
@@ -0,0 +1,48 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Html2rss
4
+ class AutoSource
5
+ ##
6
+ # Reducer is responsible for reducing the list of articles.
7
+ # It keeps only the longest attributes of articles with the same URL.
8
+ # It also filters out invalid articles.
9
+ class Reducer
10
+ class << self
11
+ def call(articles, **_options)
12
+ Log.debug "Reducer: inited with #{articles.size} articles"
13
+
14
+ reduce_by_keeping_longest_values(articles, keep: [:scraper]) { |article| article.url&.path }
15
+ end
16
+
17
+ private
18
+
19
+ # @param articles [Array<Article>]
20
+ # @return [Array<Article>] reduced articles
21
+ def reduce_by_keeping_longest_values(articles, keep:, &)
22
+ grouped_by_block = articles.group_by(&)
23
+ grouped_by_block.each_with_object([]) do |(_key, grouped_articles), result|
24
+ memo_object = {}
25
+ grouped_articles.each do |article_hash|
26
+ keep_longest_values(memo_object, article_hash, keep:)
27
+ end
28
+
29
+ result << Article.new(**memo_object)
30
+ end
31
+ end
32
+
33
+ def keep_longest_values(memo_object, article_hash, keep:)
34
+ article_hash.each do |key, value|
35
+ next if value.eql?(memo_object[key])
36
+
37
+ if keep.include?(key)
38
+ memo_object[key] ||= []
39
+ memo_object[key] << value
40
+ elsif value && value.to_s.size > memo_object[key].to_s.size
41
+ memo_object[key] = value
42
+ end
43
+ end
44
+ end
45
+ end
46
+ end
47
+ end
48
+ end
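Reviewer note: the Reducer above merges articles that share a URL path and keeps, per attribute, the longest value any scraper found. A simplified sketch of that merge on plain Hashes follows; it assumes the records were already grouped and omits the `keep:` handling that collects scraper names into an Array. `merge_keeping_longest` is a hypothetical name.

```ruby
# Merges records attribute by attribute, preferring the value with the
# longest string representation. nil values never overwrite anything.
def merge_keeping_longest(records)
  records.each_with_object({}) do |record, memo|
    record.each do |key, value|
      memo[key] = value if value && value.to_s.size > memo[key].to_s.size
    end
  end
end

merged = merge_keeping_longest(
  [
    { title: 'Short', description: 'A longer description' },
    { title: 'A much longer title', description: 'Brief' }
  ]
)
puts merged[:title]       # prints "A much longer title"
puts merged[:description] # prints "A longer description"
```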
data/lib/html2rss/auto_source/rss_builder.rb ADDED
@@ -0,0 +1,68 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'rss'
4
+
5
+ module Html2rss
6
+ class AutoSource
7
+ ##
8
+ # Converts the autosourced channel and articles to an RSS feed.
9
+ class RssBuilder
10
+ def self.add_guid(article, maker)
11
+ maker.guid.tap do |guid|
12
+ guid.content = article.guid
13
+ guid.isPermaLink = false
14
+ end
15
+ end
16
+
17
+ def self.add_image(article, maker)
18
+ url = article.image || return
19
+
20
+ maker.enclosure.tap do |enclosure|
21
+ enclosure.url = url
22
+ enclosure.type = Html2rss::Utils.guess_content_type_from_url(url)
23
+ enclosure.length = 0
24
+ end
25
+ end
26
+
27
+ def initialize(channel:, articles:)
28
+ @channel = channel
29
+ @articles = articles
30
+ end
31
+
32
+ def call
33
+ RSS::Maker.make('2.0') do |maker|
34
+ make_channel(maker.channel)
35
+ make_items(maker)
36
+ end
37
+ end
38
+
39
+ private
40
+
41
+ attr_reader :channel, :articles
42
+
43
+ def make_channel(maker)
44
+ %i[language title description ttl].each do |key|
45
+ maker.public_send(:"#{key}=", channel.public_send(key))
46
+ end
47
+
48
+ maker.link = channel.url
49
+ maker.generator = channel.generator
50
+ maker.updated = channel.last_build_date
51
+ end
52
+
53
+ def make_items(maker)
54
+ articles.each do |article|
55
+ maker.items.new_item do |item_maker|
56
+ RssBuilder.add_guid(article, item_maker)
57
+ RssBuilder.add_image(article, item_maker)
58
+
59
+ item_maker.title = article.title
60
+ item_maker.description = article.description
61
+ item_maker.pubDate = article.published_at
62
+ item_maker.link = article.url
63
+ end
64
+ end
65
+ end
66
+ end
67
+ end
68
+ end