rss_feed_plus 0.1.0
- checksums.yaml +7 -0
- data/.rspec +3 -0
- data/.rubocop.yml +11 -0
- data/CHANGELOG.md +5 -0
- data/LICENSE.txt +21 -0
- data/README.md +82 -0
- data/Rakefile +6 -0
- data/lib/rss_feed/dynamic_object.rb +58 -0
- data/lib/rss_feed/feed/base.rb +41 -0
- data/lib/rss_feed/feed/channel.rb +47 -0
- data/lib/rss_feed/feed/item.rb +42 -0
- data/lib/rss_feed/feed/namespace.rb +75 -0
- data/lib/rss_feed/object.rb +47 -0
- data/lib/rss_feed/parser.rb +209 -0
- data/lib/rss_feed/version.rb +3 -0
- data/lib/rss_feed.rb +5 -0
- data/parser.rb +29 -0
- data/rbs_collection.lock.yaml +84 -0
- data/rbs_collection.yaml +19 -0
- data/rss_example/news.rss +1843 -0
- data/rss_example/prog.xml +84 -0
- data/rss_feed.gemspec +47 -0
- data/sig/dynamic_object.rbs +9 -0
- data/sig/rss_feed/feed/base.rbs +18 -0
- data/sig/rss_feed/feed/channel.rbs +15 -0
- data/sig/rss_feed/feed/item.rbs +15 -0
- data/sig/rss_feed/feed/namespace.rbs +21 -0
- data/sig/rss_feed/parser.rbs +51 -0
- data/sig/rss_feeds.rbs +4 -0
- metadata +83 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: ec579256585d8f4a9ddf84497d214779d61c605cbf443bd1e55e4f094204cdc5
+  data.tar.gz: 0bf2827e8bfab1aa2806578303ff04a570991c97a61673dc0edd8c9bb82c57e1
+SHA512:
+  metadata.gz: 566a7978173b0d7527a6413d80f6f0a2245633c51b7dc2950346e0b1b81ba10d465f906887f5e0b1fa5b06870c657e7426f47a3ac7e1bd989aa739daf3dfac7f
+  data.tar.gz: 1f0c46d97d326cbb2aa65dced535beee715b58b9570e7ced3f2d4cb36d5d876ac25c8d16cd17884b383171cdcd0dae34927c90d20e40cc1f1952b17c4bba3f23
data/.rspec
ADDED
data/.rubocop.yml
ADDED
data/CHANGELOG.md
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2024 talaatmagdyx
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,82 @@
+# rss_feed_plus
+
+## Introduction
+
+**rss_feed_plus** is your go-to Ruby gem for effortlessly fetching and parsing RSS feeds. Whether you're building a news aggregator, content management system, or simply want to integrate RSS feeds into your application, **rss_feed_plus** simplifies the process, allowing you to easily retrieve and process RSS feed data from various sources.
+
+## Features
+
+- **Effortless Parsing**: Fetch and parse RSS feeds with ease.
+- **Customization**: Tailor parsing to fit your needs with customizable XML and URI parsers and timeout duration.
+- **Seamless Integration**: Integrate with Ruby applications smoothly.
+
+## Installation
+
+Getting started with **rss_feed_plus** is quick and easy. Simply add the gem to your application's Gemfile:
+
+```ruby
+gem 'rss_feed_plus'
+```
+
+Then, install the gem by running:
+
+```bash
+bundle install
+```
+
+Alternatively, you can install the gem directly using RubyGems:
+
+```bash
+gem install rss_feed_plus
+```
+
+## Usage
+
+Here's a basic example of how to use **rss_feed_plus** to fetch and parse RSS feeds:
+
+```ruby
+require 'rss_feed'
+require 'nokogiri'
+
+# Define your custom options
+feed_urls = 'https://www.ruby-lang.org/en/feeds/news.rss'
+xml_parser = Nokogiri
+uri_parser = URI
+timeout = 10
+
+# Initialize the Parser class with custom options
+parser = RssFeed::Parser.new(feed_urls, xml_parser: xml_parser, uri_parser: uri_parser, timeout: timeout)
+# or
+parser = RssFeed::Parser.new(feed_urls)
+# Parse the RSS feeds
+parsed_data = parser.parse_as_object
+
+# Process the parsed data
+puts parsed_data.inspect
+```
+
+## Customization
+
+**rss_feed_plus** allows you to tailor the parsing process to fit your needs. Customize the XML parser, URI parser, and timeout duration according to your requirements.
+
+## Contributing
+
+Contributions to **rss_feed_plus** are welcome! If you encounter any issues, have feature requests, or would like to contribute enhancements, please feel free to open an issue or submit a pull request on [GitHub](https://github.com/talaatmagdyx/rss_feed_plus).
+
+Before contributing, please review the [Contributing Guidelines](https://github.com/talaatmagdyx/rss_feed_plus/blob/master/.github/CONTRIBUTING.md) and adhere to the [Code of Conduct](https://github.com/talaatmagdyx/rss_feed_plus/blob/master/.github/CODE_OF_CONDUCT.md).
+
+## Reporting Bugs / Feature Requests
+
+If you encounter any bugs or have suggestions for new features, please [open an issue on GitHub](https://github.com/talaatmagdyx/rss_feed_plus/issues). Your feedback is valuable and helps improve the quality of the gem.
+
+## License
+
+**rss_feed_plus** is released under the [MIT License](https://opensource.org/licenses/MIT). You are free to use, modify, and distribute the gem according to the terms of the license.
+
+## Code of Conduct
+
+Please review and adhere to the [Code of Conduct](https://github.com/talaatmagdyx/rss_feed_plus/blob/master/.github/CODE_OF_CONDUCT.md) when interacting with the **rss_feed_plus** project. We strive to maintain a welcoming and inclusive community for all contributors and users.
+
+---
+
+Experience the simplicity of RSS feed integration with **rss_feed_plus**. Happy coding!
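The customization options mentioned in the README (XML parser, URI parser, timeout) are resolved in `RssFeed::Parser#initialize` with `Hash#fetch` defaults. The following is a minimal sketch of that pattern only. The `resolve_options` helper, the `SKETCH_DEFAULTS` constant, and the placeholder symbols standing in for the real parser modules are assumptions made for illustration and are not part of the gem.

```ruby
# Hypothetical helper, not part of rss_feed_plus: shows how optional settings
# can fall back to defaults with Hash#fetch.
SKETCH_DEFAULTS = { xml_parser: :nokogiri, uri_parser: :uri, timeout: 10 }.freeze

def resolve_options(options = {})
  SKETCH_DEFAULTS.each_with_object({}) do |(name, default), resolved|
    # fetch returns the caller's value when the key exists, else the default
    resolved[name] = options.fetch(name, default)
  end
end

p resolve_options(timeout: 3)
p resolve_options
```

Note that `fetch` only falls back when the key is absent, so an explicit `timeout: nil` would be kept as `nil` rather than replaced by the default.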
data/Rakefile
ADDED
data/lib/rss_feed/dynamic_object.rb
ADDED
@@ -0,0 +1,58 @@
+class DynamicObject
+  ## Example usage:
+  # data = {
+  #   name: "John",
+  #   age: 30,
+  #   address: {
+  #     city: "New York",
+  #     state: "NY"
+  #   },
+  #   hobbies: ["reading", "hiking"]
+  # }
+  #
+  # dynamic_object = DynamicObject.new(data)
+  #
+  # puts dynamic_object.name # Output: John
+  # puts dynamic_object.age # Output: 30
+  # puts dynamic_object.address.city # Output: New York
+  # puts dynamic_object.address.state # Output: NY
+  # p dynamic_object.hobbies # Output: ["reading", "hiking"]
+
+  # A class for initializing objects dynamically based on given data.
+  # Initializes a new instance of DynamicObject.
+  #
+  # @param data [Hash] The data used to initialize the object.
+  # @return [DynamicObject] A new instance of DynamicObject.
+  def initialize(data)
+    data.each do |key, value|
+      key = key.to_s.tr(':', '_') # normalize once so "@dc:date" is never used as a variable name
+      set_instance_variable(key, value)
+      define_singleton_method(key.tr(':', '_')) { instance_variable_get("@#{key.tr(':', '_')}") }
+    end
+  end
+
+  private
+
+  # Sets an instance variable based on the provided key and value.
+  #
+  # @param key [String] The key for the instance variable.
+  # @param value [Object] The value to be assigned to the instance variable.
+  # @return [void]
+  def set_instance_variable(key, value)
+    if value.is_a?(Hash)
+      instance_variable_set("@#{key.tr(':', '_')}", DynamicObject.new(value))
+    elsif value.is_a?(Array)
+      instance_variable_set("@#{key}", process_array(value))
+    else
+      instance_variable_set("@#{key}", value)
+    end
+  end
+
+  # Processes an array, creating DynamicObject instances if elements are hashes.
+  #
+  # @param array [Array] The array to be processed.
+  # @return [Array] The processed array.
+  def process_array(array)
+    array.map { |v| v.is_a?(Hash) ? DynamicObject.new(v) : v }
+  end
+end
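The class above turns hash keys into reader methods. The sketch below restates that technique in a self-contained form so the effect of the `':'` to `'_'` substitution on namespaced feed tags is easy to see. `SketchObject` is a hypothetical name chosen for this illustration, and the sample data is made up.

```ruby
# Illustrative stand-in for DynamicObject (the name SketchObject is
# hypothetical). Each key becomes a reader method; ':' is replaced with '_'
# so namespaced feed tags such as "dc:date" yield valid method names.
class SketchObject
  def initialize(data)
    data.each do |key, value|
      name = key.to_s.tr(':', '_')
      instance_variable_set("@#{name}", wrap(value))
      define_singleton_method(name) { instance_variable_get("@#{name}") }
    end
  end

  private

  # Hashes become nested objects; arrays are wrapped element by element.
  def wrap(value)
    case value
    when Hash then SketchObject.new(value)
    when Array then value.map { |v| wrap(v) }
    else value
    end
  end
end

feed = SketchObject.new('title' => 'News', 'dc:date' => '2024-01-01',
                        'items' => [{ 'guid' => 'a1' }])
puts feed.title            # prints News
puts feed.dc_date          # prints 2024-01-01
puts feed.items.first.guid # prints a1
```

Because the readers are singleton methods, each instance only answers to the keys it was built from; asking for a missing key raises `NoMethodError`.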
data/lib/rss_feed/feed/base.rb
ADDED
@@ -0,0 +1,41 @@
+module RssFeed
+  module Feed
+    # The Base class serves as the base class for all feed parsers.
+    class Base
+      attr_reader :document
+
+      # Initializes a new Base instance.
+      #
+      # @param document [Nokogiri::XML::Document] The parsed XML document.
+      def initialize(document)
+        @document = document
+      end
+
+      # This method should be implemented by subclasses to define the specific parsing logic.
+      #
+      # @abstract
+      def parser
+        raise NotImplementedError
+      end
+
+      private
+
+      # Detects the type of the feed based on the root element of the XML document.
+      #
+      # @return [String] The name of the root element.
+      def detect_feed_type
+        document.root.name
+      end
+
+      # Executes the appropriate parsing method based on the detected feed type.
+      #
+      # @raise [NotImplementedError] If the parsing method for the detected feed type is not implemented.
+      def execute_method
+        method_name = detect_feed_type
+        raise NotImplementedError unless respond_to?(method_name)
+
+        send(method_name)
+      end
+    end
+  end
+end
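`execute_method` dispatches on the name of the document's root element, which is why the subclasses define `rss` and alias `feed` to `atom`. A minimal sketch of that dispatch follows. `SketchRoot`, `SketchDoc`, and `SketchSelector` are hypothetical stand-ins so the example runs without an XML library; only `root.name` is modelled.

```ruby
# Hypothetical stand-ins for a parsed document; only root.name is needed.
SketchRoot = Struct.new(:name)
SketchDoc = Struct.new(:root)

class SketchSelector
  def initialize(document)
    @document = document
  end

  # One public method per supported root element name.
  def rss
    '//item'
  end

  # Atom documents use <feed> as their root element.
  def feed
    '//entry'
  end

  # Look up the method named after the root element and call it.
  def xpath_for_root
    method_name = @document.root.name
    raise NotImplementedError, "unsupported root: #{method_name}" unless respond_to?(method_name)

    public_send(method_name)
  end
end

puts SketchSelector.new(SketchDoc.new(SketchRoot.new('rss'))).xpath_for_root  # prints //item
puts SketchSelector.new(SketchDoc.new(SketchRoot.new('feed'))).xpath_for_root # prints //entry
```

One design point worth keeping in mind: `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` will not catch it and callers have to name it explicitly.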
data/lib/rss_feed/feed/channel.rb
ADDED
@@ -0,0 +1,47 @@
+require_relative 'base'
+require_relative '../object'
+
+module RssFeed
+  module Feed
+    # The Channel class represents a channel in an RSS feed.
+    class Channel < Base
+      # List of commonly used tags in an RSS channel.
+      TAGS = %w[
+        id
+        title subtitle link
+        description
+        author webMaster managingEditor contributor
+        pubDate lastBuildDate updated dc:date
+        generator language docs cloud
+        ttl skipHours skipDays
+        image logo icon rating
+        rights copyright
+        textInput feedburner:browserFriendly
+        itunes:author itunes:category
+      ].freeze
+
+      # XPath expression for selecting the RSS channel.
+      def rss
+        return nil if document.blank?
+
+        '//channel'
+      end
+
+      # XPath expression for selecting the Atom feed.
+      def atom
+        return nil if document.blank?
+
+        '//feed'
+      end
+
+      alias feed atom
+
+      # Parses the RSS channel or Atom feed based on the detected feed type.
+      #
+      # @return [Nokogiri::XML::NodeSet, nil] The parsed channel or feed, or nil if not found.
+      def parse
+        document.xpath(execute_method)
+      end
+    end
+  end
+end
data/lib/rss_feed/feed/item.rb
ADDED
@@ -0,0 +1,42 @@
+require_relative 'base'
+
+module RssFeed
+  module Feed
+    # The Item class represents an item in an RSS feed.
+    class Item < Base
+      # List of commonly used tags in an RSS item.
+      TAGS = %w[
+        id
+        title link link+alternate link+self link+edit link+replies
+        author contributor
+        description summary content content:encoded comments
+        pubDate published updated expirationDate modified dc:date
+        category guid
+        trackback:ping trackback:about
+        dc:creator dc:title dc:subject dc:rights dc:publisher
+        feedburner:origLink media:content media:thumbnail
+        media:title
+        media:credit
+        media:category
+      ].freeze
+
+      # XPath expression for selecting the RSS item.
+      def rss
+        '//item'
+      end
+
+      # XPath expression for selecting the Atom entry.
+      def atom
+        '//entry'
+      end
+
+      alias feed atom
+      # Parses the RSS item or Atom entry based on the detected feed type.
+      #
+      # @return [Nokogiri::XML::NodeSet] The parsed item or entry.
+      def parse
+        document.xpath(execute_method)
+      end
+    end
+  end
+end
data/lib/rss_feed/feed/namespace.rb
ADDED
@@ -0,0 +1,75 @@
+require 'cgi'
+require_relative '../object'
+
+module RssFeed
+  module Feed
+    # The Namespace class provides utility methods for accessing and manipulating XML namespaces in RSS feeds.
+    class Namespace
+      # Mapping of namespace prefixes to their corresponding URIs.
+      NAMESPACES = {
+        'itunes' => 'http://www.itunes.com/dtds/podcast-1.0.dtd',
+        'dc' => 'http://purl.org/dc/elements/1.1/',
+        'feedburner' => 'http://rssnamespace.org/feedburner/ext/1.0',
+        'content' => 'http://purl.org/rss/1.0/modules/content/',
+        'trackback' => 'http://example.com/trackback',
+        'media' => 'http://search.yahoo.com/mrss/'
+      }.freeze
+
+      class << self
+        # Accesses the specified XML tag within the document with proper namespace handling.
+        #
+        # @param tag [String] The XML tag to access.
+        # @param doc [Nokogiri::XML::Document] The XML document.
+        # @return [Hash] The tag data including text, nested elements flag, nested attributes flag, and the document.
+        def access_tag(tag, doc)
+          doc = doc.xpath(tag, namespace(tag))
+          nested_elements = nested_elements?(doc)
+          { text: doc.to_s, nested_elements: nested_elements, nested_attributes: nested_attributes?(doc), docs: doc }
+        end
+
+        # Resolves the namespace for the given XML tag.
+        #
+        # @param tag [String] The XML tag.
+        # @return [Hash] The namespace declaration (empty when the tag has no known prefix).
+        def namespace(tag)
+          namespace_key = tag.split(':').first
+          { namespace_key.to_s => NAMESPACES[namespace_key] }.compact
+        end
+
+        # Removes HTML tags from the given content.
+        #
+        # @param content [String] The content containing HTML tags.
+        # @return [String] The content without HTML tags.
+        def remove_html_tags(content)
+          if %r{([^-_.!~*'()a-zA-Z\d;/?:@&=+$,\[\]]%)}.match?(content)
+            CGI.unescape(content)
+          else
+            content
+          end.gsub(/(<!\[CDATA\[|\]\]>)/, '').strip.gsub(/<[^>]+>/, '')
+        end
+
+        # Checks if the XML node has nested elements.
+        #
+        # @param node [Nokogiri::XML::NodeSet] The XML node.
+        # @return [Boolean] Whether the node has nested elements.
+        def nested_elements?(node)
+          return false if node.blank? || node.to_s == 'NaN'
+
+          return true if node.children.any?(&:element?)
+
+          false
+        end
+
+        # Checks if the XML node has nested attributes.
+        #
+        # @param node [Nokogiri::XML::NodeSet] The XML node.
+        # @return [Boolean] Whether the node has nested attributes.
+        def nested_attributes?(node)
+          return false if node.blank? || node.to_s == 'NaN'
+
+          node.any? { |thumbnail| !thumbnail.attributes.empty? }
+        end
+      end
+    end
+  end
+end
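Two of the helpers above are easy to exercise without an XML document: the prefix lookup in `namespace` and the markup stripping in `remove_html_tags`. The sketch below restates both in simplified form. `SKETCH_PREFIXES`, `binding_for`, and `strip_markup` are hypothetical names, the prefix table is a trimmed copy of `NAMESPACES`, and the percent-decoding branch is deliberately left out.

```ruby
# Hypothetical, trimmed-down prefix table; the URIs are copied from NAMESPACES.
SKETCH_PREFIXES = {
  'dc' => 'http://purl.org/dc/elements/1.1/',
  'media' => 'http://search.yahoo.com/mrss/'
}.freeze

# Returns the prefix => URI binding for a tag, or {} when the tag is
# unprefixed or the prefix is unknown (compact drops the nil entry).
def binding_for(tag)
  prefix = tag.split(':').first
  { prefix => SKETCH_PREFIXES[prefix] }.compact
end

# Drops CDATA markers first, then anything that looks like a tag.
def strip_markup(text)
  text.gsub(/<!\[CDATA\[|\]\]>/, '').strip.gsub(/<[^>]+>/, '')
end

p binding_for('dc:date')
p binding_for('title')
puts strip_markup('<![CDATA[<p>Hello <b>world</b></p>]]>') # prints Hello world
```

The empty hash for unprefixed tags matters because the result is passed straight to `xpath` as the namespace argument, where a `nil` URI would not be a usable binding.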
data/lib/rss_feed/object.rb
ADDED
@@ -0,0 +1,47 @@
+# Monkey-patching the Object class to add convenience methods for checking presence or absence of content.
+class Object
+  # An object is present if it's not blank.
+  #
+  # @return [true, false]
+  def present?
+    !blank?
+  end
+
+  # An object is blank if it's falsy, empty, or the string 'NaN'.
+  # For example, +nil+, +false+, '', [], {}, and 'NaN' are all blank.
+  #
+  # This simplifies
+  #
+  #   !address || address.empty?
+  #
+  # to
+  #
+  #   address.blank?
+  #
+  # @return [true, false]
+  def blank?
+    # If the object responds to `empty?`, checks if it's empty or equals 'NaN'.
+    # Otherwise, checks if the object is falsy (nil or false).
+    respond_to?(:empty?) ? (empty? || self == 'NaN') : !self
+  end
+
+  # Returns the receiver if it's present otherwise returns +nil+.
+  # <tt>object.presence</tt> is equivalent to
+  #
+  #   object.present? ? object : nil
+  #
+  # For example, something like
+  #
+  #   state = params[:state] if params[:state].present?
+  #   country = params[:country] if params[:country].present?
+  #   region = state || country || 'US'
+  #
+  # becomes
+  #
+  #   region = params[:state].presence || params[:country].presence || 'US'
+  #
+  # @return [Object]
+  def presence
+    self if present?
+  end
+end
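The check in `blank?` can be tried out without reopening `Object`. The sketch below applies the same expression as a plain module function. `SketchPresence` is a hypothetical name used only here, and moving the logic into a module is a substitution for illustration, not how the gem does it.

```ruby
# Hypothetical module; applies the blank?/presence logic as plain functions
# instead of reopening Object.
module SketchPresence
  module_function

  # Empty collections and strings, the string 'NaN', nil, and false are blank.
  def blank?(value)
    value.respond_to?(:empty?) ? (value.empty? || value == 'NaN') : !value
  end

  # Returns the value itself when it is not blank, otherwise nil.
  def presence(value)
    blank?(value) ? nil : value
  end
end

[nil, false, '', [], {}, 'NaN', ' ', 0].each do |value|
  puts "#{value.inspect}: #{SketchPresence.blank?(value)}"
end
```

With this expression a whitespace-only string such as `' '` is not blank, because `' '.empty?` is false. That differs from ActiveSupport's `blank?`, which treats whitespace-only strings as blank.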
data/lib/rss_feed/parser.rb
ADDED
@@ -0,0 +1,209 @@
+require 'nokogiri'
+require 'open-uri'
+require 'socket'
+
+require_relative 'feed/channel'
+require_relative 'feed/item'
+require_relative 'feed/namespace'
+require_relative '../rss_feed/object'
+require_relative '../rss_feed/dynamic_object'
+
+module RssFeed
+  # The Parser class is responsible for parsing RSS feeds.
+  class Parser
+    attr_reader :feed_urls, :xml_parser, :uri_parser
+
+    # Initialize the Parser with a feed URL.
+    #
+    # @param feed_urls [String] The URL of the RSS feed to parse.
+    def initialize(feed_urls, options = {})
+      @feed_urls = feed_urls
+      @xml_parser = options.fetch(:xml_parser, Nokogiri)
+      @uri_parser = options.fetch(:uri_parser, URI)
+      @timeout = options.fetch(:timeout, 10) # Default timeout: 10 seconds
+      @logger = options[:logger]
+    end
+
+    # Parse the RSS feeds and extract channel and item information.
+    #
+    # @return [Hash] The parsed channel and item information.
+    def parse
+      document = fetch_and_parse_xml(feed_urls)
+      channel = RssFeed::Feed::Channel.new(document)
+      channel_info = extract_channel_info(channel)
+
+      items = RssFeed::Feed::Item.new(document)
+      item_info = extract_item_info(items)
+
+      { 'channel' => channel_info, 'items' => item_info }
+    end
+
+    def parse_as_object
+      DynamicObject.new(parse)
+    end
+
+    private
+
+    # Fetch and parse XML data from the given URL.
+    #
+    # @param url [String] The URL of the XML data.
+    # @return [Nokogiri::XML::Document] The parsed XML document.
+    # def fetch_and_parse_xml(url)
+    #   rss_data = uri_parser.parse(url).open
+    #   @xml_parser::XML(rss_data)
+    # rescue StandardError => e
+    #   handle_error(e)
+    #   raise RssFetchError, "Failed to fetch or parse XML: #{e.message}"
+    # end
+    def fetch_and_parse_xml(url)
+      rss_data = uri_parser.parse(url).open(**uri_options) # honor the configured :uri_parser option
+      @xml_parser::XML(rss_data)
+    rescue SocketError, URI::InvalidURIError => e
+      raise RssFetchError, "Failed to fetch or parse XML: #{e.message}"
+    rescue Timeout::Error => e
+      raise RssFetchError, "HTTP request timed out: #{e.message}"
+    end
+
+    def uri_options
+      { open_timeout: @timeout, read_timeout: @timeout }.compact
+    end
+
+    # Extract channel information from the parsed XML document.
+    #
+    # @param channel [RssFeed::Feed::Channel] The channel object.
+    # @return [Hash] The extracted channel information.
+    def extract_channel_info(channel)
+      extract_info(channel, channel.parse)
+    end
+
+    # Extract item information from the parsed XML document.
+    #
+    # @param items [RssFeed::Feed::Item] The items object.
+    # @return [Array<Hash>] The extracted item information.
+    def extract_item_info(items)
+      items.parse.map { |item| extract_info(items, item) }
+    end
+
+    # Extract information from the XML document based on specified tags.
+    #
+    # @param feed [RssFeed::Feed::Channel, RssFeed::Feed::Item] The feed object.
+    # @param feed_parse [Hash] The parsed XML data.
+    # @return [Hash] The extracted information.
+    def extract_info(feed, feed_parse)
+      item_data = {}
+
+      feed.class::TAGS.each do |tag|
+        tag_data = extract_tag_data(tag, feed_parse)
+        next if skip_extraction?(tag_data)
+
+        items = extract_items(tag_data)
+        next if skip_items?(items, tag_data[:nested_attributes])
+
+        item_data[tag] = create_item_info(items, tag_data)
+      end
+
+      item_data
+    end
+
+    # Check if extraction of tag data should be skipped.
+    #
+    # @param tag_data [Hash] The tag data.
+    # @return [Boolean] True if extraction should be skipped, otherwise false.
+    def skip_extraction?(tag_data)
+      tag_data.values_at(:text, :nested_elements, :nested_attributes).all?(&:blank?)
+    end
+
+    # Check if extraction of items should be skipped.
+    #
+    # @param items [Object] The items to check.
+    # @param nested_attributes [Boolean] Whether the items have nested attributes.
+    # @return [Boolean] True if extraction should be skipped, otherwise false.
+    def skip_items?(items, nested_attributes)
+      items.blank? && nested_attributes.blank?
+    end
+
+    # Create item information hash.
+    #
+    # @param items [Object] The items data.
+    # @param tag_data [Hash] The tag data.
+    # @return [Hash] The item information.
+    def create_item_info(items, tag_data)
+      { 'values' => items, 'attributes' => tag_data[:attributes] }.compact
+    end
+
+    # Extract tag data from the XML document.
+    #
+    # @param tag [String] The tag to extract.
+    # @param feed_parse [Hash] The parsed XML data.
+    # @return [Hash] The extracted tag data.
+    def extract_tag_data(tag, feed_parse)
+      value = RssFeed::Feed::Namespace.access_tag(tag, feed_parse)
+      value[:attributes] = extract_attributes(value[:docs]) if value[:nested_attributes]
+      value
+    end
+
+    # Extract items from the XML document.
+    #
+    # @param tag_data [Hash] The tag data.
+    # @return [Object] The extracted items.
+    def extract_items(tag_data)
+      tag_data[:nested_elements] ? extract_nested_data(tag_data[:docs]) : extract_clean_value(tag_data[:text])
+    end
+
+    # Add attributes to the item information hash.
+    #
+    # @param tag_item [Hash] The item information hash.
+    # @param tag_data [Hash] The tag data.
+    def add_attributes(tag_item, tag_data)
+      tag_item['attributes'] = tag_data[:attributes] if tag_data[:attributes].present?
+    end
+
+    # Extract clean value from the XML document.
+    #
+    # @param docs [Object] The XML document.
+    # @return [String] The extracted clean value.
+    def extract_clean_value(docs)
+      RssFeed::Feed::Namespace.remove_html_tags(docs).presence
+    end
+
+    # Extract nested data from the XML document.
+    #
+    # @param nodes [Object] The XML nodes.
+    # @return [Hash] The extracted nested data.
+    def extract_nested_data(nodes)
+      nodes.each_with_object({}) do |node, nested_data|
+        node.children.each do |child|
+          child_value = RssFeed::Feed::Namespace.remove_html_tags(child.text)
+          nested_data[child.name.to_sym] = child_value if child_value.present?
+        end
+      end
+    end
+
+    # Extract attributes from the XML document.
+    #
+    # @param node [Object] The XML node.
+    # @return [Array<Hash>] The extracted attributes.
+    def extract_attributes(node)
+      node.map do |thumbnail|
+        attributes_hash = {}
+        thumbnail.attributes.each do |name, value|
+          attributes_hash[name.to_s] = value.to_s
+        end
+        attributes_hash
+      end
+    end
+
+    def handle_error(error)
+      error_message = "Error occurred: #{error.message}"
+      # Fallback to puts if logger is not configured
+      @logger.present? ? @logger.error(error_message) : puts(error_message)
+    end
+
+    def configure_logger
+      @logger ||= Logger.new($stdout)
+      @logger.level = Logger::INFO
+    end
+  end
+
+  class RssFetchError < StandardError; end
+end
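`Parser#parse` returns a hash with a `'channel'` entry and an `'items'` array, where each extracted tag maps to a hash that may carry `'values'` and `'attributes'` (see `create_item_info`). The sketch below walks a hash of that shape. The sample data is made up for illustration and `tag_value` is a hypothetical helper, not part of the gem.

```ruby
# Made-up sample in the shape produced by Parser#parse: each tag maps to a
# hash with optional 'values' and 'attributes' keys.
parsed = {
  'channel' => { 'title' => { 'values' => 'Example Feed' } },
  'items' => [
    { 'title' => { 'values' => 'First post' },
      'media:thumbnail' => { 'attributes' => [{ 'url' => 'https://example.com/a.png' }] } },
    { 'title' => { 'values' => 'Second post' } }
  ]
}

# Hypothetical helper: read a tag's text, returning nil for tags that were
# skipped during extraction and are therefore absent from the hash.
def tag_value(entry, tag)
  entry.dig(tag, 'values')
end

puts tag_value(parsed['channel'], 'title')
parsed['items'].each do |item|
  thumbs = item.dig('media:thumbnail', 'attributes') || []
  puts "#{tag_value(item, 'title')} (#{thumbs.length} thumbnail(s))"
end
```

Using `dig` keeps the traversal safe for tags that `extract_info` skipped, since those keys are simply missing rather than set to `nil`.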