rss_feed_plus 0.1.0
- checksums.yaml +7 -0
- data/.rspec +3 -0
- data/.rubocop.yml +11 -0
- data/CHANGELOG.md +5 -0
- data/LICENSE.txt +21 -0
- data/README.md +82 -0
- data/Rakefile +6 -0
- data/lib/rss_feed/dynamic_object.rb +58 -0
- data/lib/rss_feed/feed/base.rb +41 -0
- data/lib/rss_feed/feed/channel.rb +47 -0
- data/lib/rss_feed/feed/item.rb +42 -0
- data/lib/rss_feed/feed/namespace.rb +75 -0
- data/lib/rss_feed/object.rb +47 -0
- data/lib/rss_feed/parser.rb +209 -0
- data/lib/rss_feed/version.rb +3 -0
- data/lib/rss_feed.rb +5 -0
- data/parser.rb +29 -0
- data/rbs_collection.lock.yaml +84 -0
- data/rbs_collection.yaml +19 -0
- data/rss_example/news.rss +1843 -0
- data/rss_example/prog.xml +84 -0
- data/rss_feed.gemspec +47 -0
- data/sig/dynamic_object.rbs +9 -0
- data/sig/rss_feed/feed/base.rbs +18 -0
- data/sig/rss_feed/feed/channel.rbs +15 -0
- data/sig/rss_feed/feed/item.rbs +15 -0
- data/sig/rss_feed/feed/namespace.rbs +21 -0
- data/sig/rss_feed/parser.rbs +51 -0
- data/sig/rss_feeds.rbs +4 -0
- metadata +83 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: ec579256585d8f4a9ddf84497d214779d61c605cbf443bd1e55e4f094204cdc5
+  data.tar.gz: 0bf2827e8bfab1aa2806578303ff04a570991c97a61673dc0edd8c9bb82c57e1
+SHA512:
+  metadata.gz: 566a7978173b0d7527a6413d80f6f0a2245633c51b7dc2950346e0b1b81ba10d465f906887f5e0b1fa5b06870c657e7426f47a3ac7e1bd989aa739daf3dfac7f
+  data.tar.gz: 1f0c46d97d326cbb2aa65dced535beee715b58b9570e7ced3f2d4cb36d5d876ac25c8d16cd17884b383171cdcd0dae34927c90d20e40cc1f1952b17c4bba3f23
data/.rspec
ADDED
data/.rubocop.yml
ADDED
data/CHANGELOG.md
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2024 talaatmagdyx
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,82 @@
+# rss_feed_plus
+
+## Introduction
+
+**rss_feed_plus** is a Ruby gem for fetching and parsing RSS feeds. Whether you're building a news aggregator, a content management system, or simply want to integrate RSS feeds into your application, **rss_feed_plus** lets you retrieve and process RSS feed data from various sources.
+
+## Features
+
+- **Effortless Parsing**: Fetch and parse RSS feeds with ease.
+- **Customization**: Tailor parsing to fit your needs with customizable XML and URI parsers and timeout duration.
+- **Seamless Integration**: Integrate with Ruby applications smoothly.
+
+## Installation
+
+Getting started with **rss_feed_plus** is quick and easy. Simply add the gem to your application's Gemfile:
+
+```ruby
+gem 'rss_feed_plus'
+```
+
+Then, install the gem by running:
+
+```bash
+bundle install
+```
+
+Alternatively, you can install the gem directly using RubyGems:
+
+```bash
+gem install rss_feed_plus
+```
+
+## Usage
+
+Here's a basic example of how to use **rss_feed_plus** to fetch and parse an RSS feed:
+
+```ruby
+require 'rss_feed'
+require 'nokogiri'
+
+# Define your custom options
+feed_urls = 'https://www.ruby-lang.org/en/feeds/news.rss'
+xml_parser = Nokogiri
+uri_parser = URI
+timeout = 10
+
+# Initialize the Parser class with custom options
+parser = RssFeed::Parser.new(feed_urls, xml_parser: xml_parser, uri_parser: uri_parser, timeout: timeout)
+# or
+parser = RssFeed::Parser.new(feed_urls)
+# Parse the RSS feed
+parsed_data = parser.parse_as_object
+
+# Process the parsed data
+puts parsed_data.inspect
+```
+
+## Customization
+
+**rss_feed_plus** allows you to tailor the parsing process to fit your needs. Customize the XML parser, URI parser, and timeout duration according to your requirements.
+
+## Contributing
+
+Contributions to **rss_feed_plus** are welcome! If you encounter any issues, have feature requests, or would like to contribute enhancements, please feel free to open an issue or submit a pull request on [GitHub](https://github.com/talaatmagdyx/rss_feed_plus).
+
+Before contributing, please review the [Contributing Guidelines](https://github.com/talaatmagdyx/rss_feed_plus/blob/master/.github/CONTRIBUTING.md) and adhere to the [Code of Conduct](https://github.com/talaatmagdyx/rss_feed_plus/blob/master/.github/CODE_OF_CONDUCT.md).
+
+## Reporting Bugs / Feature Requests
+
+If you encounter any bugs or have suggestions for new features, please [open an issue on GitHub](https://github.com/talaatmagdyx/rss_feed_plus/issues). Your feedback is valuable and helps improve the quality of the gem.
+
+## License
+
+**rss_feed_plus** is released under the [MIT License](https://opensource.org/licenses/MIT). You are free to use, modify, and distribute the gem according to the terms of the license.
+
+## Code of Conduct
+
+Please review and adhere to the [Code of Conduct](https://github.com/talaatmagdyx/rss_feed_plus/blob/master/.github/CODE_OF_CONDUCT.md) when interacting with the **rss_feed_plus** project. We strive to maintain a welcoming and inclusive community for all contributors and users.
+
+---
+
+Experience the simplicity of RSS feed integration with **rss_feed_plus**. Happy coding!
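The README stops at `puts parsed_data.inspect`. As a rough sketch of what "processing the parsed data" could look like, the snippet below walks a hand-written hash shaped like the one `RssFeed::Parser#parse` appears to build in this release (`'channel'` and `'items'` keys, each tag mapping to a hash with optional `'values'` and `'attributes'`). The feed contents and the `tag_value` helper are illustrative assumptions, not part of the gem.

```ruby
# Hand-written stand-in for the hash returned by RssFeed::Parser#parse.
# The titles and URLs are made up for illustration.
parsed = {
  'channel' => {
    'title' => { 'values' => 'Example Feed' },
    'link' => { 'values' => 'https://example.com/' }
  },
  'items' => [
    { 'title' => { 'values' => 'First post' },
      'media:thumbnail' => { 'attributes' => [{ 'url' => 'https://example.com/a.png' }] } },
    { 'title' => { 'values' => 'Second post' } }
  ]
}

# Hypothetical helper: returns the 'values' entry for a tag, or nil if absent.
def tag_value(entry, tag)
  entry.dig(tag, 'values')
end

titles = parsed['items'].map { |item| tag_value(item, 'title') }

puts tag_value(parsed['channel'], 'title')
puts titles.inspect
```

Using `dig` keeps the lookup safe when a tag is missing from an item, which seems likely in practice because tags with no content are skipped during extraction.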
data/Rakefile
ADDED
data/lib/rss_feed/dynamic_object.rb
ADDED
@@ -0,0 +1,58 @@
+class DynamicObject
+  ## Example usage:
+  # data = {
+  #   name: "John",
+  #   age: 30,
+  #   address: {
+  #     city: "New York",
+  #     state: "NY"
+  #   },
+  #   hobbies: ["reading", "hiking"]
+  # }
+  #
+  # dynamic_object = DynamicObject.new(data)
+  #
+  # puts dynamic_object.name # Output: John
+  # puts dynamic_object.age # Output: 30
+  # puts dynamic_object.address.city # Output: New York
+  # puts dynamic_object.address.state # Output: NY
+  # puts dynamic_object.hobbies.inspect # Output: ["reading", "hiking"]
+
+  # A class for initializing objects dynamically based on given data.
+  # Initializes a new instance of DynamicObject.
+  #
+  # @param data [Hash] The data used to initialize the object.
+  # @return [DynamicObject] A new instance of DynamicObject.
+  def initialize(data)
+    data.each do |key, value|
+      key = key.to_s
+      set_instance_variable(key, value)
+      define_singleton_method(key.tr(':', '_')) { instance_variable_get("@#{key.tr(':', '_')}") }
+    end
+  end
+
+  private
+
+  # Sets an instance variable based on the provided key and value.
+  #
+  # @param key [String] The key for the instance variable.
+  # @param value [Object] The value to be assigned to the instance variable.
+  # @return [void]
+  def set_instance_variable(key, value)
+    if value.is_a?(Hash)
+      instance_variable_set("@#{key.tr(':', '_')}", DynamicObject.new(value))
+    elsif value.is_a?(Array)
+      instance_variable_set("@#{key.tr(':', '_')}", process_array(value))
+    else
+      instance_variable_set("@#{key.tr(':', '_')}", value)
+    end
+  end
+
+  # Processes an array, creating DynamicObject instances if elements are hashes.
+  #
+  # @param array [Array] The array to be processed.
+  # @return [Array] The processed array.
+  def process_array(array)
+    array.map { |v| v.is_a?(Hash) ? DynamicObject.new(v) : v }
+  end
+end
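The hunk above turns hash keys into reader methods and replaces `:` with `_` so that namespaced tags such as `dc:date` become valid Ruby names. A minimal, self-contained sketch of that idea follows. `HashObject` and its `wrap` helper are hypothetical names used only for illustration. The sketch normalizes the key once and reuses it for both the instance variable and the reader, which is one way to keep the two in sync.

```ruby
# Hypothetical stand-in for the hash-to-object idea; not the gem's class.
class HashObject
  def initialize(data)
    data.each do |key, value|
      # Normalize once: "dc:date" becomes "dc_date".
      name = key.to_s.tr(':', '_')
      instance_variable_set("@#{name}", wrap(value))
      define_singleton_method(name) { instance_variable_get("@#{name}") }
    end
  end

  private

  # Recursively wraps hashes (including hashes inside arrays).
  def wrap(value)
    case value
    when Hash then HashObject.new(value)
    when Array then value.map { |v| wrap(v) }
    else value
    end
  end
end

feed = HashObject.new('title' => 'News', 'dc:date' => '2024-03-01',
                      'items' => [{ 'guid' => 'a1' }])
puts feed.dc_date
puts feed.items.first.guid
```

Keys containing characters other than `:` that are not valid in identifiers would still raise a `NameError` here; the sketch only handles the namespace separator.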
data/lib/rss_feed/feed/base.rb
ADDED
@@ -0,0 +1,41 @@
+module RssFeed
+  module Feed
+    # The Base class serves as the base class for all feed parsers.
+    class Base
+      attr_reader :document
+
+      # Initializes a new Base instance.
+      #
+      # @param document [Nokogiri::XML::Document] The parsed XML document.
+      def initialize(document)
+        @document = document
+      end
+
+      # This method should be implemented by subclasses to define the specific parsing logic.
+      #
+      # @abstract
+      def parser
+        raise NotImplementedError
+      end
+
+      private
+
+      # Detects the type of the feed based on the root element of the XML document.
+      #
+      # @return [String] The name of the root element.
+      def detect_feed_type
+        document.root.name
+      end
+
+      # Executes the appropriate parsing method based on the detected feed type.
+      #
+      # @raise [NotImplementedError] If the parsing method for the detected feed type is not implemented.
+      def execute_method
+        method_name = detect_feed_type
+        raise NotImplementedError unless respond_to?(method_name)
+
+        send(method_name)
+      end
+    end
+  end
+end
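`Base#execute_method` picks a method by the name of the document's root element (`rss`, `feed`, and so on). The following sketch reproduces that dispatch with stub objects instead of a Nokogiri document, so it runs on its own. `Root`, `Doc`, `ItemSelector`, and `xpath_for_feed` are hypothetical names, and the use of `public_send` in place of `send` is a choice made for this sketch rather than what the gem does.

```ruby
# Stubs standing in for a parsed XML document and its root node.
Root = Struct.new(:name)
Doc = Struct.new(:root)

# Hypothetical selector that maps a root element name to an XPath string.
class ItemSelector
  def initialize(document)
    @document = document
  end

  def rss
    '//item'
  end

  def atom
    '//entry'
  end

  # Atom documents have <feed> as their root element.
  alias feed atom

  def xpath_for_feed
    method_name = @document.root.name
    unless respond_to?(method_name)
      raise NotImplementedError, "unsupported root element <#{method_name}>"
    end

    # public_send refuses private methods, unlike send.
    public_send(method_name)
  end
end

puts ItemSelector.new(Doc.new(Root.new('rss'))).xpath_for_feed
puts ItemSelector.new(Doc.new(Root.new('feed'))).xpath_for_feed
```

Because the method name comes from the feed itself, any public method whose name matches a root element would be called. An explicit allow-list of root names may be a safer design if untrusted feeds are expected.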
data/lib/rss_feed/feed/channel.rb
ADDED
@@ -0,0 +1,47 @@
+require_relative 'base'
+require_relative '../object'
+
+module RssFeed
+  module Feed
+    # The Channel class represents a channel in an RSS feed.
+    class Channel < Base
+      # List of commonly used tags in an RSS channel.
+      TAGS = %w[
+        id
+        title subtitle link
+        description
+        author webMaster managingEditor contributor
+        pubDate lastBuildDate updated dc:date
+        generator language docs cloud
+        ttl skipHours skipDays
+        image logo icon rating
+        rights copyright
+        textInput feedburner:browserFriendly
+        itunes:author itunes:category
+      ].freeze
+
+      # XPath expression for selecting the RSS channel.
+      def rss
+        return nil if document.blank?
+
+        '//channel'
+      end
+
+      # XPath expression for selecting the Atom feed.
+      def atom
+        return nil if document.blank?
+
+        '//feed'
+      end
+
+      alias feed atom
+
+      # Parses the RSS channel or Atom feed based on the detected feed type.
+      #
+      # @return [Nokogiri::XML::NodeSet, nil] The parsed channel or feed, or nil if not found.
+      def parse
+        document.xpath(execute_method)
+      end
+    end
+  end
+end
data/lib/rss_feed/feed/item.rb
ADDED
@@ -0,0 +1,42 @@
+require_relative 'base'
+
+module RssFeed
+  module Feed
+    # The Item class represents an item in an RSS feed.
+    class Item < Base
+      # List of commonly used tags in an RSS item.
+      TAGS = %w[
+        id
+        title link link+alternate link+self link+edit link+replies
+        author contributor
+        description summary content content:encoded comments
+        pubDate published updated expirationDate modified dc:date
+        category guid
+        trackback:ping trackback:about
+        dc:creator dc:title dc:subject dc:rights dc:publisher
+        feedburner:origLink media:content media:thumbnail
+        media:title
+        media:credit
+        media:category
+      ].freeze
+
+      # XPath expression for selecting the RSS item.
+      def rss
+        '//item'
+      end
+
+      # XPath expression for selecting the Atom entry.
+      def atom
+        '//entry'
+      end
+
+      alias feed atom
+      # Parses the RSS item or Atom entry based on the detected feed type.
+      #
+      # @return [Nokogiri::XML::NodeSet] The parsed item or entry.
+      def parse
+        document.xpath(execute_method)
+      end
+    end
+  end
+end
data/lib/rss_feed/feed/namespace.rb
ADDED
@@ -0,0 +1,75 @@
+require 'cgi'
+require_relative '../object'
+
+module RssFeed
+  module Feed
+    # The Namespace class provides utility methods for accessing and manipulating XML namespaces in RSS feeds.
+    class Namespace
+      # Mapping of namespace prefixes to their corresponding URIs.
+      NAMESPACES = {
+        'itunes' => 'http://www.itunes.com/dtds/podcast-1.0.dtd',
+        'dc' => 'http://purl.org/dc/elements/1.1/',
+        'feedburner' => 'http://rssnamespace.org/feedburner/ext/1.0',
+        'content' => 'http://purl.org/rss/1.0/modules/content/',
+        'trackback' => 'http://example.com/trackback',
+        'media' => 'http://search.yahoo.com/mrss/'
+      }.freeze
+
+      class << self
+        # Accesses the specified XML tag within the document with proper namespace handling.
+        #
+        # @param tag [String] The XML tag to access.
+        # @param doc [Nokogiri::XML::Document] The XML document.
+        # @return [Hash] The tag data including text, nested elements flag, nested attributes flag, and the document.
+        def access_tag(tag, doc)
+          doc = doc.xpath(tag, namespace(tag))
+          nested_elements = nested_elements?(doc)
+          { text: doc.to_s, nested_elements: nested_elements, nested_attributes: nested_attributes?(doc), docs: doc }
+        end
+
+        # Resolves the namespace for the given XML tag.
+        #
+        # @param tag [String] The XML tag.
+        # @return [Hash] The namespace declaration (empty if the prefix is unknown).
+        def namespace(tag)
+          namespace_key = tag.split(':').first
+          { namespace_key.to_s => NAMESPACES[namespace_key] }.compact
+        end
+
+        # Removes CDATA markers and HTML tags from the given content, URL-decoding it first if it looks percent-encoded.
+        #
+        # @param content [String] The content containing HTML tags.
+        # @return [String] The content without HTML tags.
+        def remove_html_tags(content)
+          if %r{([^-_.!~*'()a-zA-Z\d;/?:@&=+$,\[\]]%)}.match?(content)
+            CGI.unescape(content)
+          else
+            content
+          end.gsub(/(<!\[CDATA\[|\]\]>)/, '').strip.gsub(/<[^>]+>/, '')
+        end
+
+        # Checks if the XML node has nested elements.
+        #
+        # @param node [Nokogiri::XML::NodeSet] The XML node.
+        # @return [Boolean] Whether the node has nested elements.
+        def nested_elements?(node)
+          return false if node.blank? || node.to_s == 'NaN'
+
+          return true if node.children.any?(&:element?)
+
+          false
+        end
+
+        # Checks if the XML node has nested attributes.
+        #
+        # @param node [Nokogiri::XML::NodeSet] The XML node.
+        # @return [Boolean] Whether the node has nested attributes.
+        def nested_attributes?(node)
+          return false if node.blank? || node.to_s == 'NaN'
+
+          node.any? { |thumbnail| !thumbnail.attributes.empty? }
+        end
+      end
+    end
+  end
+end
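Two helpers in this file are plain string and hash manipulation and can be tried without Nokogiri: resolving the namespace hash for a prefixed tag, and stripping CDATA markers and tags from text. The sketch below mirrors that logic in standalone form. `namespace_for` and `strip_markup` are hypothetical names, the prefix table is trimmed to two entries, and the percent-decoding branch is left out.

```ruby
# Trimmed copy of the prefix table, for illustration only.
NAMESPACES = {
  'dc' => 'http://purl.org/dc/elements/1.1/',
  'media' => 'http://search.yahoo.com/mrss/'
}.freeze

# Returns { prefix => uri } for a known prefix, or {} otherwise.
def namespace_for(tag)
  prefix = tag.split(':').first
  { prefix => NAMESPACES[prefix] }.compact
end

# Drops CDATA markers first, then anything that looks like a tag.
def strip_markup(content)
  content.gsub(/<!\[CDATA\[|\]\]>/, '').strip.gsub(/<[^>]+>/, '')
end

p namespace_for('dc:date')
p namespace_for('title')
puts strip_markup('<![CDATA[<b>Hi</b> there]]>')
```

For an unprefixed tag such as `title`, the whole tag is used as the lookup key, finds nothing, and `compact` leaves an empty hash. Regex-based tag stripping is a coarse approach: text that legitimately contains `<` and `>` would also be removed.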
data/lib/rss_feed/object.rb
ADDED
@@ -0,0 +1,47 @@
+# Monkey-patching the Object class to add convenience methods for checking presence or absence of content.
+class Object
+  # An object is present if it's not blank.
+  #
+  # @return [true, false]
+  def present?
+    !blank?
+  end
+
+  # An object is blank if it's false, nil, empty, or the string 'NaN'.
+  # For example, +nil+, '', [], {}, 'NaN', and +false+ are all blank.
+  #
+  # This simplifies
+  #
+  #   !address || address.empty?
+  #
+  # to
+  #
+  #   address.blank?
+  #
+  # @return [true, false]
+  def blank?
+    # If the object responds to `empty?`, checks if it's empty or equals 'NaN'.
+    # Otherwise, checks if the object is falsy (nil or false).
+    respond_to?(:empty?) ? (empty? || self == 'NaN') : !self
+  end
+
+  # Returns the receiver if it's present otherwise returns +nil+.
+  # <tt>object.presence</tt> is equivalent to
+  #
+  #   object.present? ? object : nil
+  #
+  # For example, something like
+  #
+  #   state = params[:state] if params[:state].present?
+  #   country = params[:country] if params[:country].present?
+  #   region = state || country || 'US'
+  #
+  # becomes
+  #
+  #   region = params[:state].presence || params[:country].presence || 'US'
+  #
+  # @return [Object]
+  def presence
+    self if present?
+  end
+end
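The `blank?` rule in this file differs slightly from the ActiveSupport method of the same name: it also treats the string `'NaN'` as blank, and it does not treat a whitespace-only string as blank. The sketch below restates the rule as standalone functions so the edge cases can be checked without patching `Object`. The function form is an assumption made for this illustration.

```ruby
# Same rule as the patched Object#blank?, written as a plain function.
def blank?(obj)
  obj.respond_to?(:empty?) ? (obj.empty? || obj == 'NaN') : !obj
end

# Returns the object itself unless it is blank.
def presence(obj)
  obj unless blank?(obj)
end

[nil, false, '', [], {}, 'NaN', ' ', 0, 'rss'].each do |value|
  puts "#{value.inspect}: blank? = #{blank?(value)}"
end
```

The `'NaN'` case is presumably there because XPath evaluates arithmetic-looking expressions such as `link+alternate` to `NaN` rather than to a node set.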
data/lib/rss_feed/parser.rb
ADDED
@@ -0,0 +1,209 @@
+require 'nokogiri'
+require 'open-uri'
+require 'socket'
+
+require_relative 'feed/channel'
+require_relative 'feed/item'
+require_relative 'feed/namespace'
+require_relative '../rss_feed/object'
+require_relative '../rss_feed/dynamic_object'
+
+module RssFeed
+  # The Parser class is responsible for parsing RSS feeds.
+  class Parser
+    attr_reader :feed_urls, :xml_parser, :uri_parser
+
+    # Initialize the Parser with a feed URL and optional settings.
+    #
+    # @param feed_urls [String] The URL of the RSS feed to parse.
+    def initialize(feed_urls, options = {})
+      @feed_urls = feed_urls
+      @xml_parser = options.fetch(:xml_parser, Nokogiri)
+      @uri_parser = options.fetch(:uri_parser, URI)
+      @timeout = options.fetch(:timeout, 10) # Default timeout: 10 seconds
+      @logger = options[:logger]
+    end
+
+    # Parse the RSS feed and extract channel and item information.
+    #
+    # @return [Hash] The parsed channel and item information.
+    def parse
+      document = fetch_and_parse_xml(feed_urls)
+      channel = RssFeed::Feed::Channel.new(document)
+      channel_info = extract_channel_info(channel)
+
+      items = RssFeed::Feed::Item.new(document)
+      item_info = extract_item_info(items)
+
+      { 'channel' => channel_info, 'items' => item_info }
+    end
+
+    def parse_as_object
+      DynamicObject.new(parse)
+    end
+
+    private
+
+    # Fetch and parse XML data from the given URL.
+    #
+    # @param url [String] The URL of the XML data.
+    # @return [Nokogiri::XML::Document] The parsed XML document.
+    # def fetch_and_parse_xml(url)
+    #   rss_data = uri_parser.parse(url).open
+    #   @xml_parser::XML(rss_data)
+    # rescue StandardError => e
+    #   handle_error(e)
+    #   raise RssFetchError, "Failed to fetch or parse XML: #{e.message}"
+    # end
+    def fetch_and_parse_xml(url)
+      rss_data = uri_parser.parse(url).open(**uri_options)
+      @xml_parser::XML(rss_data)
+    rescue SocketError, URI::InvalidURIError => e
+      raise RssFetchError, "Failed to fetch or parse XML: #{e.message}"
+    rescue Timeout::Error => e
+      raise RssFetchError, "HTTP request timed out: #{e.message}"
+    end
+
+    def uri_options
+      { open_timeout: @timeout, read_timeout: @timeout }.compact
+    end
+
+    # Extract channel information from the parsed XML document.
+    #
+    # @param channel [RssFeed::Feed::Channel] The channel object.
+    # @return [Hash] The extracted channel information.
+    def extract_channel_info(channel)
+      extract_info(channel, channel.parse)
+    end
+
+    # Extract item information from the parsed XML document.
+    #
+    # @param items [RssFeed::Feed::Item] The items object.
+    # @return [Array<Hash>] The extracted item information.
+    def extract_item_info(items)
+      items.parse.map { |item| extract_info(items, item) }
+    end
+
+    # Extract information from the XML document based on specified tags.
+    #
+    # @param feed [RssFeed::Feed::Channel, RssFeed::Feed::Item] The feed object.
+    # @param feed_parse [Nokogiri::XML::NodeSet, Nokogiri::XML::Element] The parsed XML data.
+    # @return [Hash] The extracted information.
+    def extract_info(feed, feed_parse)
+      item_data = {}
+
+      feed.class::TAGS.each do |tag|
+        tag_data = extract_tag_data(tag, feed_parse)
+        next if skip_extraction?(tag_data)
+
+        items = extract_items(tag_data)
+        next if skip_items?(items, tag_data[:nested_attributes])
+
+        item_data[tag] = create_item_info(items, tag_data)
+      end
+
+      item_data
+    end
+
+    # Check if extraction of tag data should be skipped.
+    #
+    # @param tag_data [Hash] The tag data.
+    # @return [Boolean] True if extraction should be skipped, otherwise false.
+    def skip_extraction?(tag_data)
+      tag_data.values_at(:text, :nested_elements, :nested_attributes).all?(&:blank?)
+    end
+
+    # Check if extraction of items should be skipped.
+    #
+    # @param items [Object] The items to check.
+    # @param nested_attributes [Boolean] Whether the items have nested attributes.
+    # @return [Boolean] True if extraction should be skipped, otherwise false.
+    def skip_items?(items, nested_attributes)
+      items.blank? && nested_attributes.blank?
+    end
+
+    # Create item information hash.
+    #
+    # @param items [Object] The items data.
+    # @param tag_data [Hash] The tag data.
+    # @return [Hash] The item information.
+    def create_item_info(items, tag_data)
+      { 'values' => items, 'attributes' => tag_data[:attributes] }.compact
+    end
+
+    # Extract tag data from the XML document.
+    #
+    # @param tag [String] The tag to extract.
+    # @param feed_parse [Nokogiri::XML::NodeSet, Nokogiri::XML::Element] The parsed XML data.
+    # @return [Hash] The extracted tag data.
+    def extract_tag_data(tag, feed_parse)
+      value = RssFeed::Feed::Namespace.access_tag(tag, feed_parse)
+      value[:attributes] = extract_attributes(value[:docs]) if value[:nested_attributes]
+      value
+    end
+
+    # Extract items from the XML document.
+    #
+    # @param tag_data [Hash] The tag data.
+    # @return [Object] The extracted items.
+    def extract_items(tag_data)
+      tag_data[:nested_elements] ? extract_nested_data(tag_data[:docs]) : extract_clean_value(tag_data[:text])
+    end
+
+    # Add attributes to the item information hash.
+    #
+    # @param tag_item [Hash] The item information hash.
+    # @param tag_data [Hash] The tag data.
+    def add_attributes(tag_item, tag_data)
+      tag_item['attributes'] = tag_data[:attributes] if tag_data[:attributes].present?
+    end
+
+    # Extract clean value from the XML document.
+    #
+    # @param docs [String] The serialized XML text of the tag.
+    # @return [String, nil] The extracted clean value, or nil if nothing is left.
+    def extract_clean_value(docs)
+      RssFeed::Feed::Namespace.remove_html_tags(docs).presence
+    end
+
+    # Extract nested data from the XML document.
+    #
+    # @param nodes [Object] The XML nodes.
+    # @return [Hash] The extracted nested data.
+    def extract_nested_data(nodes)
+      nodes.each_with_object({}) do |node, nested_data|
+        node.children.each do |child|
+          child_value = RssFeed::Feed::Namespace.remove_html_tags(child.text)
+          nested_data[child.name.to_sym] = child_value if child_value.present?
+        end
+      end
+    end
+
+    # Extract attributes from the XML document.
+    #
+    # @param node [Object] The XML node.
+    # @return [Array<Hash>] The extracted attributes.
+    def extract_attributes(node)
+      node.map do |thumbnail|
+        attributes_hash = {}
+        thumbnail.attributes.each do |name, value|
+          attributes_hash[name.to_s] = value.to_s
+        end
+        attributes_hash
+      end
+    end
+
+    def handle_error(error)
+      error_message = "Error occurred: #{error.message}"
+      # Fallback to puts if logger is not configured
+      @logger.present? ? @logger.error(error_message) : puts(error_message)
+    end
+
+    def configure_logger
+      @logger ||= Logger.new($stdout)
+      @logger.level = Logger::INFO
+    end
+  end
+
+  class RssFetchError < StandardError; end
+end
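`fetch_and_parse_xml` converts low-level network and URI failures into a single `RssFetchError`, so callers have one exception class to rescue. The sketch below isolates that pattern with the standard library only. `FetchError` and `fetch_with_wrapped_errors` are hypothetical names, and the block stands in for the real `open` call (with open-uri loaded, that would be something like `uri.open(open_timeout: 10, read_timeout: 10)`), so no network access is needed to run it.

```ruby
require 'socket'
require 'timeout'
require 'uri'

# Hypothetical error class mirroring the role of RssFetchError.
class FetchError < StandardError; end

# Parses the URL, yields the URI to the caller's fetch block, and
# re-raises known low-level failures as FetchError.
def fetch_with_wrapped_errors(url)
  uri = URI.parse(url)
  yield uri
rescue SocketError, URI::InvalidURIError => e
  raise FetchError, "Failed to fetch or parse XML: #{e.message}"
rescue Timeout::Error => e
  raise FetchError, "HTTP request timed out: #{e.message}"
end

begin
  # Simulated timeout instead of a real HTTP request.
  fetch_with_wrapped_errors('https://example.com/feed.xml') do |_uri|
    raise Timeout::Error, 'execution expired'
  end
rescue FetchError => e
  puts e.message
end
```

HTTP status failures such as a 404 are raised by open-uri as `OpenURI::HTTPError`, which this rescue list does not cover, so callers may still need to handle that class separately.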