simple-rss 2.0.0 → 2.1.0
- checksums.yaml +4 -4
- data/README.md +321 -0
- data/lib/simple-rss.rb +269 -1
- data/simple-rss.gemspec +5 -5
- data/test/base/enumerable_test.rb +101 -0
- data/test/base/fetch_test.rb +117 -0
- data/test/base/hash_xml_serialization_test.rb +142 -0
- data/test/base/json_serialization_test.rb +81 -0
- metadata +9 -5
- data/README.markdown +0 -47
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: f7e2a8e829ef0776194df8fd028266dbcecaeb0a5b7b1025aa75b739acbd6ad9
|
|
4
|
+
data.tar.gz: 664c6319db58dab8bf81e9f540afb6f1251997fa669b3e433a30748eb599ac16
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 3885af60f0b54553af1e3545a781f6f1179825386cc3edcf1575ed75bb79240eefc83998295d21e842287f1c949a0554918d39d1df1df924701d27f20b26c192
|
|
7
|
+
data.tar.gz: 81394817f11b09d433eb7b07031120661a7de18d678deff66fdbf4f331f8ec6483a155069f36539e4ba6122210ec3c58cf08b67f3a44d9afc582c0952861d073
|
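The `checksums.yaml` entries above are hex digests of the two archives packed inside the `.gem` file. As a minimal sketch of how such a digest could be checked with Ruby's standard library, the snippet below compares a file's SHA-256 against an expected value. The helper name `sha256_matches?` and the temporary stand-in file are assumptions made for illustration; they are not part of simple-rss or RubyGems.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when the file's SHA-256 hex digest equals the
# expected value (for example, one copied from checksums.yaml).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end

# A temporary file stands in for data.tar.gz so the sketch runs on its own.
Tempfile.create("data") do |file|
  file.write("hello")
  file.flush
  expected = Digest::SHA256.hexdigest("hello")
  puts sha256_matches?(file.path, expected)
end
```

For a real check, the path would point at an extracted `data.tar.gz` and the expected value would be the SHA256 entry listed for it.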
data/README.md
ADDED
|
@@ -0,0 +1,321 @@
|
|
|
1
|
+
# SimpleRSS
|
|
2
|
+
|
|
3
|
+
[](https://badge.fury.io/rb/simple-rss)
|
|
4
|
+
[](https://github.com/cardmagic/simple-rss/actions/workflows/ruby.yml)
|
|
5
|
+
[](https://opensource.org/licenses/LGPL-3.0)
|
|
6
|
+
|
|
7
|
+
A simple, flexible, extensible, and liberal RSS and Atom reader for Ruby. Designed to be backwards compatible with Ruby's standard RSS parser while handling malformed feeds gracefully.
|
|
8
|
+
|
|
9
|
+
## Features
|
|
10
|
+
|
|
11
|
+
- Parses both RSS and Atom feeds
|
|
12
|
+
- Tolerant of malformed XML (regex-based parsing)
|
|
13
|
+
- Built-in URL fetching with conditional GET support (ETags, Last-Modified)
|
|
14
|
+
- JSON and XML serialization
|
|
15
|
+
- Extensible tag definitions
|
|
16
|
+
- Zero runtime dependencies
|
|
17
|
+
|
|
18
|
+
## What's New in 2.0
|
|
19
|
+
|
|
20
|
+
Version 2.0 is a major update with powerful new capabilities:
|
|
21
|
+
|
|
22
|
+
- **URL Fetching** - One-liner feed fetching with `SimpleRSS.fetch(url)`. Supports timeouts, custom headers, and automatic redirect following.
|
|
23
|
+
|
|
24
|
+
- **Conditional GET** - Bandwidth-efficient polling with ETag and Last-Modified support. Returns `nil` when feeds haven't changed (304 Not Modified).
|
|
25
|
+
|
|
26
|
+
- **JSON Serialization** - Export feeds with `to_json`, `to_hash`, and Rails-compatible `as_json`. Time objects serialize to ISO 8601.
|
|
27
|
+
|
|
28
|
+
- **XML Serialization** - Convert any parsed feed to clean RSS 2.0 or Atom XML with `to_xml(format: :rss2)` or `to_xml(format: :atom)`.
|
|
29
|
+
|
|
30
|
+
- **Array Tags** - Collect all occurrences of a tag (like multiple categories) with the `array_tags:` option.
|
|
31
|
+
|
|
32
|
+
- **Attribute Parsing** - Extract attributes from feed, item, and media tags using the `tag#attr` syntax.
|
|
33
|
+
|
|
34
|
+
- **UTF-8 Normalization** - All parsed content is automatically normalized to UTF-8 encoding.
|
|
35
|
+
|
|
36
|
+
- **Modern Ruby** - Full compatibility with Ruby 3.1 through 4.0, with RBS type annotations and Steep type checking.
|
|
37
|
+
|
|
38
|
+
- **Enumerable Support** - Iterate feeds naturally with `each`, `map`, `select`, and all Enumerable methods. Access items by index with `rss[0]` and get the latest items sorted by date with `latest(n)`.
|
|
39
|
+
|
|
40
|
+
## Installation
|
|
41
|
+
|
|
42
|
+
Add to your Gemfile:
|
|
43
|
+
|
|
44
|
+
```ruby
|
|
45
|
+
gem "simple-rss"
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
Or install directly:
|
|
49
|
+
|
|
50
|
+
```bash
|
|
51
|
+
gem install simple-rss
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
## Quick Start
|
|
55
|
+
|
|
56
|
+
```ruby
|
|
57
|
+
require "simple-rss"
|
|
58
|
+
require "uri"
|
|
59
|
+
require "net/http"
|
|
60
|
+
|
|
61
|
+
# Parse from a string or IO object
|
|
62
|
+
xml = Net::HTTP.get(URI("https://example.com/feed.xml"))
|
|
63
|
+
rss = SimpleRSS.parse(xml)
|
|
64
|
+
|
|
65
|
+
rss.channel.title # => "Example Feed"
|
|
66
|
+
rss.items.first.title # => "First Post"
|
|
67
|
+
rss.items.first.pubDate # => 2024-01-15 12:00:00 -0500 (Time object)
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
## Usage
|
|
71
|
+
|
|
72
|
+
### Fetching Feeds
|
|
73
|
+
|
|
74
|
+
SimpleRSS includes a built-in fetcher with conditional GET support for efficient polling:
|
|
75
|
+
|
|
76
|
+
```ruby
|
|
77
|
+
# Simple fetch
|
|
78
|
+
feed = SimpleRSS.fetch("https://example.com/feed.xml")
|
|
79
|
+
|
|
80
|
+
# With timeout
|
|
81
|
+
feed = SimpleRSS.fetch("https://example.com/feed.xml", timeout: 10)
|
|
82
|
+
|
|
83
|
+
# Conditional GET - only download if modified
|
|
84
|
+
feed = SimpleRSS.fetch("https://example.com/feed.xml")
|
|
85
|
+
# Store these for next request
|
|
86
|
+
etag = feed.etag
|
|
87
|
+
last_modified = feed.last_modified
|
|
88
|
+
|
|
89
|
+
# On subsequent requests, pass the stored values
|
|
90
|
+
feed = SimpleRSS.fetch(
|
|
91
|
+
"https://example.com/feed.xml",
|
|
92
|
+
etag:,
|
|
93
|
+
last_modified:
|
|
94
|
+
)
|
|
95
|
+
# Returns nil if feed hasn't changed (304 Not Modified)
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
### Accessing Feed Data
|
|
99
|
+
|
|
100
|
+
SimpleRSS provides both RSS and Atom style accessors:
|
|
101
|
+
|
|
102
|
+
```ruby
|
|
103
|
+
feed = SimpleRSS.parse(xml)
|
|
104
|
+
|
|
105
|
+
# RSS style
|
|
106
|
+
feed.channel.title
|
|
107
|
+
feed.channel.link
|
|
108
|
+
feed.channel.description
|
|
109
|
+
feed.items
|
|
110
|
+
|
|
111
|
+
# Atom style (aliases)
|
|
112
|
+
feed.feed.title
|
|
113
|
+
feed.entries
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
### Item Attributes
|
|
117
|
+
|
|
118
|
+
Items support both hash and method access:
|
|
119
|
+
|
|
120
|
+
```ruby
|
|
121
|
+
item = feed.items.first
|
|
122
|
+
|
|
123
|
+
# Hash access
|
|
124
|
+
item[:title]
|
|
125
|
+
item[:link]
|
|
126
|
+
item[:pubDate]
|
|
127
|
+
|
|
128
|
+
# Method access
|
|
129
|
+
item.title
|
|
130
|
+
item.link
|
|
131
|
+
item.pubDate
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
Date fields are automatically parsed into `Time` objects:
|
|
135
|
+
|
|
136
|
+
```ruby
|
|
137
|
+
item.pubDate.class # => Time
|
|
138
|
+
item.pubDate.year # => 2024
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
### Iterating with Enumerable
|
|
142
|
+
|
|
143
|
+
SimpleRSS includes `Enumerable`, so you can iterate feeds naturally:
|
|
144
|
+
|
|
145
|
+
```ruby
|
|
146
|
+
feed = SimpleRSS.parse(xml)
|
|
147
|
+
|
|
148
|
+
# Iterate over items
|
|
149
|
+
feed.each { |item| puts item.title }
|
|
150
|
+
|
|
151
|
+
# Use any Enumerable method
|
|
152
|
+
titles = feed.map { |item| item.title }
|
|
153
|
+
tech_posts = feed.select { |item| item.category == "tech" }
|
|
154
|
+
first_five = feed.first(5)
|
|
155
|
+
total = feed.count
|
|
156
|
+
|
|
157
|
+
# Access items by index
|
|
158
|
+
feed[0].title # first item
|
|
159
|
+
feed[-1].title # last item
|
|
160
|
+
|
|
161
|
+
# Get the n most recent items (sorted by pubDate or updated)
|
|
162
|
+
feed.latest(10)
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
### JSON Serialization
|
|
166
|
+
|
|
167
|
+
```ruby
|
|
168
|
+
feed = SimpleRSS.parse(xml)
|
|
169
|
+
|
|
170
|
+
# Get as hash
|
|
171
|
+
feed.to_hash
|
|
172
|
+
# => { title: "Feed Title", link: "...", items: [...] }
|
|
173
|
+
|
|
174
|
+
# Get as JSON string
|
|
175
|
+
feed.to_json
|
|
176
|
+
# => '{"title":"Feed Title","link":"...","items":[...]}'
|
|
177
|
+
|
|
178
|
+
# Works with Rails/ActiveSupport
|
|
179
|
+
feed.as_json
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
### XML Serialization
|
|
183
|
+
|
|
184
|
+
Convert parsed feeds to standard RSS 2.0 or Atom format:
|
|
185
|
+
|
|
186
|
+
```ruby
|
|
187
|
+
feed = SimpleRSS.parse(xml)
|
|
188
|
+
|
|
189
|
+
# Convert to RSS 2.0
|
|
190
|
+
feed.to_xml(format: :rss2)
|
|
191
|
+
|
|
192
|
+
# Convert to Atom
|
|
193
|
+
feed.to_xml(format: :atom)
|
|
194
|
+
```
|
|
195
|
+
|
|
196
|
+
### Extending Tag Support
|
|
197
|
+
|
|
198
|
+
Add support for custom or non-standard tags:
|
|
199
|
+
|
|
200
|
+
```ruby
|
|
201
|
+
# Add a new feed-level tag
|
|
202
|
+
SimpleRSS.feed_tags << :custom_tag
|
|
203
|
+
|
|
204
|
+
# Add item-level tags
|
|
205
|
+
SimpleRSS.item_tags << :custom_item_tag
|
|
206
|
+
|
|
207
|
+
# Parse tags with specific rel attributes (common in Atom)
|
|
208
|
+
SimpleRSS.item_tags << :"link+enclosure"
|
|
209
|
+
# Accessible as: item.link_enclosure
|
|
210
|
+
|
|
211
|
+
# Parse tag attributes
|
|
212
|
+
SimpleRSS.item_tags << :"media:content#url"
|
|
213
|
+
# Accessible as: item.media_content_url
|
|
214
|
+
|
|
215
|
+
# Parse item/entry attributes
|
|
216
|
+
SimpleRSS.item_tags << :"entry#xml:lang"
|
|
217
|
+
# Accessible as: item.entry_xml_lang
|
|
218
|
+
```
|
|
219
|
+
|
|
220
|
+
#### Tag Syntax Reference
|
|
221
|
+
|
|
222
|
+
| Syntax | Example | Accessor | Description |
|
|
223
|
+
|--------|---------|----------|-------------|
|
|
224
|
+
| `tag` | `:title` | `.title` | Simple element content |
|
|
225
|
+
| `tag#attr` | `:"media:content#url"` | `.media_content_url` | Attribute value |
|
|
226
|
+
| `tag+rel` | `:"link+alternate"` | `.link_alternate` | Element with specific `rel` attribute |
|
|
227
|
+
|
|
228
|
+
### Collecting Multiple Values
|
|
229
|
+
|
|
230
|
+
By default, SimpleRSS returns only the first occurrence of each tag. To collect all values:
|
|
231
|
+
|
|
232
|
+
```ruby
|
|
233
|
+
# Collect all categories for each item
|
|
234
|
+
feed = SimpleRSS.parse(xml, array_tags: [:category])
|
|
235
|
+
|
|
236
|
+
item.category # => ["tech", "programming", "ruby"]
|
|
237
|
+
```
|
|
238
|
+
|
|
239
|
+
## API Reference
|
|
240
|
+
|
|
241
|
+
### `SimpleRSS.parse(source, options = {})`
|
|
242
|
+
|
|
243
|
+
Parse RSS/Atom content from a string or IO object.
|
|
244
|
+
|
|
245
|
+
**Parameters:**
|
|
246
|
+
- `source` - String or IO object containing feed XML
|
|
247
|
+
- `options` - Hash of options
|
|
248
|
+
- `:array_tags` - Array of tag symbols to collect as arrays
|
|
249
|
+
|
|
250
|
+
**Returns:** `SimpleRSS` instance
|
|
251
|
+
|
|
252
|
+
### `SimpleRSS.fetch(url, options = {})`
|
|
253
|
+
|
|
254
|
+
Fetch and parse a feed from a URL.
|
|
255
|
+
|
|
256
|
+
**Parameters:**
|
|
257
|
+
- `url` - Feed URL string
|
|
258
|
+
- `options` - Hash of options
|
|
259
|
+
- `:timeout` - Request timeout in seconds
|
|
260
|
+
- `:etag` - ETag from previous request (for conditional GET)
|
|
261
|
+
- `:last_modified` - Last-Modified header from previous request
|
|
262
|
+
- `:follow_redirects` - Follow redirects (default: true)
|
|
263
|
+
- `:headers` - Hash of additional HTTP headers
|
|
264
|
+
|
|
265
|
+
**Returns:** `SimpleRSS` instance, or `nil` if 304 Not Modified
|
|
266
|
+
|
|
267
|
+
### Instance Methods
|
|
268
|
+
|
|
269
|
+
| Method | Description |
|
|
270
|
+
|--------|-------------|
|
|
271
|
+
| `#channel` / `#feed` | Returns self (for RSS/Atom style access) |
|
|
272
|
+
| `#items` / `#entries` | Array of parsed items |
|
|
273
|
+
| `#each` | Iterate over items (includes `Enumerable`) |
|
|
274
|
+
| `#[](index)` | Access item by index |
|
|
275
|
+
| `#latest(n = 10)` | Get n most recent items by date |
|
|
276
|
+
| `#to_json` | JSON string representation |
|
|
277
|
+
| `#to_hash` / `#as_json` | Hash representation |
|
|
278
|
+
| `#to_xml(format:)` | XML string (`:rss2` or `:atom`) |
|
|
279
|
+
| `#etag` | ETag header from fetch (if applicable) |
|
|
280
|
+
| `#last_modified` | Last-Modified header from fetch (if applicable) |
|
|
281
|
+
| `#source` | Original source XML string |
|
|
282
|
+
|
|
283
|
+
## Compatibility
|
|
284
|
+
|
|
285
|
+
- Ruby 3.1+
|
|
286
|
+
- No runtime dependencies
|
|
287
|
+
|
|
288
|
+
## Development
|
|
289
|
+
|
|
290
|
+
```bash
|
|
291
|
+
# Run tests
|
|
292
|
+
bundle exec rake test
|
|
293
|
+
|
|
294
|
+
# Run linter
|
|
295
|
+
bundle exec rubocop
|
|
296
|
+
|
|
297
|
+
# Type checking
|
|
298
|
+
bundle exec steep check
|
|
299
|
+
|
|
300
|
+
# Interactive console
|
|
301
|
+
bundle exec rake console
|
|
302
|
+
```
|
|
303
|
+
|
|
304
|
+
## Contributing
|
|
305
|
+
|
|
306
|
+
1. Fork the repository
|
|
307
|
+
2. Create a feature branch (`git checkout -b feature/my-feature`)
|
|
308
|
+
3. Make your changes with tests
|
|
309
|
+
4. Ensure tests pass (`bundle exec rake test`)
|
|
310
|
+
5. Submit a pull request
|
|
311
|
+
|
|
312
|
+
## Authors
|
|
313
|
+
|
|
314
|
+
- [Lucas Carlson](mailto:lucas@rufy.com)
|
|
315
|
+
- [Herval Freire](mailto:hervalfreire@gmail.com)
|
|
316
|
+
|
|
317
|
+
Inspired by [Blagg](http://www.raelity.org/lang/perl/blagg) by Rael Dornfest.
|
|
318
|
+
|
|
319
|
+
## License
|
|
320
|
+
|
|
321
|
+
This library is released under the terms of the [GNU LGPL](LICENSE).
|
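The conditional GET flow described in the README above relies on two request headers, `If-None-Match` and `If-Modified-Since`. The sketch below shows how those headers could be attached using only `net/http`, without sending anything over the network. `conditional_get_request` is a hypothetical helper written for this illustration, not a simple-rss method, and the URL and validator values are placeholders.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of simple-rss): builds a GET request that
# carries the validators saved from a previous response, when present.
def conditional_get_request(url, etag: nil, last_modified: nil)
  request = Net::HTTP::Get.new(URI.parse(url))
  request["If-None-Match"] = etag if etag
  request["If-Modified-Since"] = last_modified if last_modified
  request
end

request = conditional_get_request(
  "https://example.com/feed.xml",
  etag: '"abc123"',
  last_modified: "Wed, 21 Oct 2015 07:28:00 GMT"
)

# Print the headers that a server would use to decide on a 304 response.
puts request["If-None-Match"]
puts request["If-Modified-Since"]
```

A server that recognizes either validator may answer 304 Not Modified with an empty body, which is the case the README says `SimpleRSS.fetch` reports as `nil`.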
data/lib/simple-rss.rb
CHANGED
|
@@ -4,14 +4,24 @@ require "cgi"
|
|
|
4
4
|
require "time"
|
|
5
5
|
|
|
6
6
|
class SimpleRSS
|
|
7
|
-
|
|
7
|
+
# @rbs skip
|
|
8
|
+
include Enumerable
|
|
9
|
+
|
|
10
|
+
# @rbs!
|
|
11
|
+
# include Enumerable[Hash[Symbol, untyped]]
|
|
12
|
+
|
|
13
|
+
VERSION = "2.1.0".freeze
|
|
8
14
|
|
|
9
15
|
# @rbs @items: Array[Hash[Symbol, untyped]]
|
|
10
16
|
# @rbs @source: String
|
|
11
17
|
# @rbs @options: Hash[Symbol, untyped]
|
|
18
|
+
# @rbs @etag: String?
|
|
19
|
+
# @rbs @last_modified: String?
|
|
12
20
|
|
|
13
21
|
attr_reader :items #: Array[Hash[Symbol, untyped]]
|
|
14
22
|
attr_reader :source #: String
|
|
23
|
+
attr_reader :etag #: String?
|
|
24
|
+
attr_reader :last_modified #: String?
|
|
15
25
|
alias entries items #: Array[Hash[Symbol, untyped]]
|
|
16
26
|
|
|
17
27
|
@@feed_tags = %i[
|
|
@@ -60,6 +70,65 @@ class SimpleRSS
|
|
|
60
70
|
end
|
|
61
71
|
alias feed channel
|
|
62
72
|
|
|
73
|
+
# Iterate over all items in the feed
|
|
74
|
+
#
|
|
75
|
+
# @rbs () { (Hash[Symbol, untyped]) -> void } -> self
|
|
76
|
+
# | () -> Enumerator[Hash[Symbol, untyped], self]
|
|
77
|
+
def each(&block)
|
|
78
|
+
return enum_for(:each) unless block
|
|
79
|
+
|
|
80
|
+
items.each(&block)
|
|
81
|
+
self
|
|
82
|
+
end
|
|
83
|
+
|
|
84
|
+
# Access an item by index
|
|
85
|
+
#
|
|
86
|
+
# @rbs (Integer) -> Hash[Symbol, untyped]?
|
|
87
|
+
def [](index)
|
|
88
|
+
items[index]
|
|
89
|
+
end
|
|
90
|
+
|
|
91
|
+
# Get the n most recent items, sorted by date
|
|
92
|
+
#
|
|
93
|
+
# @rbs (?Integer) -> Array[Hash[Symbol, untyped]]
|
|
94
|
+
def latest(count = 10)
|
|
95
|
+
items.sort_by { |item| item[:pubDate] || item[:updated] || Time.at(0) }.reverse.first(count)
|
|
96
|
+
end
|
|
97
|
+
|
|
98
|
+
# @rbs (?Hash[Symbol, untyped]) -> Hash[Symbol, untyped]
|
|
99
|
+
def as_json(_options = {})
|
|
100
|
+
hash = {} #: Hash[Symbol, untyped]
|
|
101
|
+
|
|
102
|
+
@@feed_tags.each do |tag|
|
|
103
|
+
tag_cleaned = clean_tag(tag)
|
|
104
|
+
value = instance_variable_get("@#{tag_cleaned}")
|
|
105
|
+
hash[tag_cleaned] = serialize_value(value) if value
|
|
106
|
+
end
|
|
107
|
+
|
|
108
|
+
hash[:items] = items.map do |item|
|
|
109
|
+
item.transform_values { |v| serialize_value(v) }
|
|
110
|
+
end
|
|
111
|
+
|
|
112
|
+
hash
|
|
113
|
+
end
|
|
114
|
+
|
|
115
|
+
# @rbs (*untyped) -> String
|
|
116
|
+
def to_json(*)
|
|
117
|
+
require "json"
|
|
118
|
+
JSON.generate(as_json)
|
|
119
|
+
end
|
|
120
|
+
|
|
121
|
+
alias to_hash as_json
|
|
122
|
+
|
|
123
|
+
# @rbs (?format: Symbol) -> String
|
|
124
|
+
def to_xml(format: :rss2)
|
|
125
|
+
case format
|
|
126
|
+
when :rss2 then to_rss2_xml
|
|
127
|
+
when :atom then to_atom_xml
|
|
128
|
+
else raise ArgumentError, "Unknown format: #{format}. Supported: :rss2, :atom"
|
|
129
|
+
end
|
|
130
|
+
end
|
|
131
|
+
|
|
63
132
|
class << self
|
|
64
133
|
# @rbs () -> Array[Symbol]
|
|
65
134
|
def feed_tags
|
|
@@ -87,6 +156,84 @@ class SimpleRSS
|
|
|
87
156
|
def parse(source, options = {})
|
|
88
157
|
new source, options
|
|
89
158
|
end
|
|
159
|
+
|
|
160
|
+
# Fetch and parse a feed from a URL
|
|
161
|
+
# Returns nil if conditional GET returns 304 Not Modified
|
|
162
|
+
#
|
|
163
|
+
# @rbs (String, ?Hash[Symbol, untyped]) -> SimpleRSS?
|
|
164
|
+
def fetch(url, options = {})
|
|
165
|
+
require "net/http"
|
|
166
|
+
require "uri"
|
|
167
|
+
|
|
168
|
+
uri = URI.parse(url)
|
|
169
|
+
response = perform_fetch(uri, options)
|
|
170
|
+
|
|
171
|
+
return nil if response.is_a?(Net::HTTPNotModified)
|
|
172
|
+
|
|
173
|
+
raise SimpleRSSError, "HTTP #{response.code}: #{response.message}" unless response.is_a?(Net::HTTPSuccess)
|
|
174
|
+
|
|
175
|
+
body = response.body.force_encoding(Encoding::UTF_8)
|
|
176
|
+
feed = parse(body, options)
|
|
177
|
+
feed.instance_variable_set(:@etag, response["ETag"])
|
|
178
|
+
feed.instance_variable_set(:@last_modified, response["Last-Modified"])
|
|
179
|
+
feed
|
|
180
|
+
end
|
|
181
|
+
|
|
182
|
+
private
|
|
183
|
+
|
|
184
|
+
# @rbs (untyped, Hash[Symbol, untyped]) -> untyped
|
|
185
|
+
def perform_fetch(uri, options)
|
|
186
|
+
http = build_http(uri, options)
|
|
187
|
+
request = build_request(uri, options)
|
|
188
|
+
|
|
189
|
+
response = http.request(request)
|
|
190
|
+
handle_redirect(response, options) || response
|
|
191
|
+
end
|
|
192
|
+
|
|
193
|
+
# @rbs (untyped, Hash[Symbol, untyped]) -> untyped
|
|
194
|
+
def build_http(uri, options)
|
|
195
|
+
host = uri.host || raise(SimpleRSSError, "Invalid URL: missing host")
|
|
196
|
+
http = Net::HTTP.new(host, uri.port)
|
|
197
|
+
http.use_ssl = uri.scheme == "https"
|
|
198
|
+
|
|
199
|
+
timeout = options[:timeout]
|
|
200
|
+
if timeout
|
|
201
|
+
http.open_timeout = timeout
|
|
202
|
+
http.read_timeout = timeout
|
|
203
|
+
end
|
|
204
|
+
|
|
205
|
+
http
|
|
206
|
+
end
|
|
207
|
+
|
|
208
|
+
# @rbs (untyped, Hash[Symbol, untyped]) -> untyped
|
|
209
|
+
def build_request(uri, options)
|
|
210
|
+
request = Net::HTTP::Get.new(uri)
|
|
211
|
+
request["User-Agent"] = "SimpleRSS/#{VERSION}"
|
|
212
|
+
|
|
213
|
+
# Conditional GET headers
|
|
214
|
+
request["If-None-Match"] = options[:etag] if options[:etag]
|
|
215
|
+
request["If-Modified-Since"] = options[:last_modified] if options[:last_modified]
|
|
216
|
+
|
|
217
|
+
# Custom headers
|
|
218
|
+
options[:headers]&.each { |key, value| request[key] = value }
|
|
219
|
+
|
|
220
|
+
request
|
|
221
|
+
end
|
|
222
|
+
|
|
223
|
+
# @rbs (untyped, Hash[Symbol, untyped]) -> untyped
|
|
224
|
+
def handle_redirect(response, options)
|
|
225
|
+
return nil unless response.is_a?(Net::HTTPRedirection)
|
|
226
|
+
return nil if options[:follow_redirects] == false
|
|
227
|
+
|
|
228
|
+
location = response["Location"]
|
|
229
|
+
return nil unless location
|
|
230
|
+
|
|
231
|
+
redirects = (options[:_redirects] || 0) + 1
|
|
232
|
+
raise SimpleRSSError, "Too many redirects" if redirects > 5
|
|
233
|
+
|
|
234
|
+
new_options = options.merge(_redirects: redirects)
|
|
235
|
+
perform_fetch(URI.parse(location), new_options)
|
|
236
|
+
end
|
|
90
237
|
end
|
|
91
238
|
|
|
92
239
|
DATE_TAGS = %i[pubDate lastBuildDate published updated expirationDate modified dc:date].freeze
|
|
@@ -265,6 +412,127 @@ class SimpleRSS
|
|
|
265
412
|
tag.to_s.tr(":", "_").intern
|
|
266
413
|
end
|
|
267
414
|
|
|
415
|
+
# @rbs (untyped) -> untyped
|
|
416
|
+
def serialize_value(value)
|
|
417
|
+
case value
|
|
418
|
+
when Time then value.iso8601
|
|
419
|
+
else value
|
|
420
|
+
end
|
|
421
|
+
end
|
|
422
|
+
|
|
423
|
+
# @rbs (String?) -> String
|
|
424
|
+
def escape_xml(text)
|
|
425
|
+
return "" if text.nil?
|
|
426
|
+
|
|
427
|
+
text.to_s
|
|
428
|
+
.gsub("&", "&amp;")
|
|
429
|
+
.gsub("<", "&lt;")
|
|
430
|
+
.gsub(">", "&gt;")
|
|
431
|
+
.gsub("'", "&apos;")
|
|
432
|
+
.gsub('"', "&quot;")
|
|
433
|
+
end
|
|
434
|
+
|
|
435
|
+
# @rbs (Array[String], String, untyped) -> void
|
|
436
|
+
def add_xml_element(elements, tag, value)
|
|
437
|
+
elements << "<#{tag}>#{escape_xml(value)}</#{tag}>" if value
|
|
438
|
+
end
|
|
439
|
+
|
|
440
|
+
# @rbs (Array[String], String, untyped, Symbol) -> void
|
|
441
|
+
def add_xml_time_element(elements, tag, value, format)
|
|
442
|
+
return unless value.is_a?(Time)
|
|
443
|
+
|
|
444
|
+
formatted = format == :rfc2822 ? value.rfc2822 : value.iso8601
|
|
445
|
+
elements << "<#{tag}>#{formatted}</#{tag}>"
|
|
446
|
+
end
|
|
447
|
+
|
|
448
|
+
# @rbs () -> String
|
|
449
|
+
def to_rss2_xml
|
|
450
|
+
xml = ['<?xml version="1.0" encoding="UTF-8"?>', '<rss version="2.0">', "<channel>"]
|
|
451
|
+
xml.concat(rss2_channel_elements)
|
|
452
|
+
items.each { |item| xml.concat(rss2_item_elements(item)) }
|
|
453
|
+
xml << "</channel>"
|
|
454
|
+
xml << "</rss>"
|
|
455
|
+
xml.join("\n")
|
|
456
|
+
end
|
|
457
|
+
|
|
458
|
+
# @rbs () -> Array[String]
|
|
459
|
+
def rss2_channel_elements
|
|
460
|
+
elements = [] #: Array[String]
|
|
461
|
+
add_xml_element(elements, "title", instance_variable_get(:@title))
|
|
462
|
+
add_xml_element(elements, "link", instance_variable_get(:@link))
|
|
463
|
+
add_xml_element(elements, "description", instance_variable_get(:@description))
|
|
464
|
+
add_xml_element(elements, "language", instance_variable_get(:@language))
|
|
465
|
+
add_xml_time_element(elements, "pubDate", instance_variable_get(:@pubDate), :rfc2822)
|
|
466
|
+
add_xml_time_element(elements, "lastBuildDate", instance_variable_get(:@lastBuildDate), :rfc2822)
|
|
467
|
+
add_xml_element(elements, "generator", instance_variable_get(:@generator))
|
|
468
|
+
elements
|
|
469
|
+
end
|
|
470
|
+
|
|
471
|
+
# @rbs (Hash[Symbol, untyped]) -> Array[String]
|
|
472
|
+
def rss2_item_elements(item)
|
|
473
|
+
elements = ["<item>"] #: Array[String]
|
|
474
|
+
elements << "<title>#{escape_xml(item[:title])}</title>" if item[:title]
|
|
475
|
+
elements << "<link>#{escape_xml(item[:link])}</link>" if item[:link]
|
|
476
|
+
elements << "<description><![CDATA[#{item[:description]}]]></description>" if item[:description]
|
|
477
|
+
elements << "<pubDate>#{item[:pubDate].rfc2822}</pubDate>" if item[:pubDate].is_a?(Time)
|
|
478
|
+
elements << "<guid>#{escape_xml(item[:guid])}</guid>" if item[:guid]
|
|
479
|
+
elements << "<author>#{escape_xml(item[:author])}</author>" if item[:author]
|
|
480
|
+
elements << "<category>#{escape_xml(item[:category])}</category>" if item[:category]
|
|
481
|
+
elements << "</item>"
|
|
482
|
+
elements
|
|
483
|
+
end
|
|
484
|
+
|
|
485
|
+
# @rbs () -> String
|
|
486
|
+
def to_atom_xml
|
|
487
|
+
xml = ['<?xml version="1.0" encoding="UTF-8"?>', '<feed xmlns="http://www.w3.org/2005/Atom">']
|
|
488
|
+
xml.concat(atom_feed_elements)
|
|
489
|
+
items.each { |item| xml.concat(atom_entry_elements(item)) }
|
|
490
|
+
xml << "</feed>"
|
|
491
|
+
xml.join("\n")
|
|
492
|
+
end
|
|
493
|
+
|
|
494
|
+
# @rbs () -> Array[String]
|
|
495
|
+
def atom_feed_elements
|
|
496
|
+
elements = [] #: Array[String]
|
|
497
|
+
title_val = instance_variable_get(:@title)
|
|
498
|
+
link_val = instance_variable_get(:@link)
|
|
499
|
+
id_val = instance_variable_get(:@id)
|
|
500
|
+
add_xml_element(elements, "title", title_val)
|
|
501
|
+
elements << "<link href=\"#{escape_xml(link_val)}\" rel=\"alternate\"/>" if link_val
|
|
502
|
+
elements << "<id>#{escape_xml(id_val || link_val)}</id>" if link_val
|
|
503
|
+
add_xml_time_element(elements, "updated", instance_variable_get(:@updated), :iso8601)
|
|
504
|
+
add_xml_element(elements, "subtitle", instance_variable_get(:@subtitle))
|
|
505
|
+
author_val = instance_variable_get(:@author)
|
|
506
|
+
elements << "<author><name>#{escape_xml(author_val)}</name></author>" if author_val
|
|
507
|
+
add_xml_element(elements, "generator", instance_variable_get(:@generator))
|
|
508
|
+
elements
|
|
509
|
+
end
|
|
510
|
+
|
|
511
|
+
# @rbs (Hash[Symbol, untyped]) -> Array[String]
|
|
512
|
+
def atom_entry_elements(item)
|
|
513
|
+
elements = ["<entry>"] #: Array[String]
|
|
514
|
+
elements << "<title>#{escape_xml(item[:title])}</title>" if item[:title]
|
|
515
|
+
elements << "<link href=\"#{escape_xml(item[:link])}\" rel=\"alternate\"/>" if item[:link]
|
|
516
|
+
elements << "<id>#{escape_xml(item[:id] || item[:guid] || item[:link])}</id>" if item[:id] || item[:guid] || item[:link]
|
|
517
|
+
elements << "<updated>#{item[:updated].iso8601}</updated>" if item[:updated].is_a?(Time)
|
|
518
|
+
atom_entry_published(elements, item)
|
|
519
|
+
elements << "<summary><![CDATA[#{item[:summary] || item[:description]}]]></summary>" if item[:summary] || item[:description]
|
|
520
|
+
elements << "<content><![CDATA[#{item[:content]}]]></content>" if item[:content]
|
|
521
|
+
elements << "<author><name>#{escape_xml(item[:author])}</name></author>" if item[:author]
|
|
522
|
+
elements << "<category term=\"#{escape_xml(item[:category])}\"/>" if item[:category]
|
|
523
|
+
elements << "</entry>"
|
|
524
|
+
elements
|
|
525
|
+
end
|
|
526
|
+
|
|
527
|
+
# @rbs (Array[String], Hash[Symbol, untyped]) -> void
|
|
528
|
+
def atom_entry_published(elements, item)
|
|
529
|
+
if item[:published].is_a?(Time)
|
|
530
|
+
elements << "<published>#{item[:published].iso8601}</published>"
|
|
531
|
+
elsif item[:pubDate].is_a?(Time)
|
|
532
|
+
elements << "<published>#{item[:pubDate].iso8601}</published>"
|
|
533
|
+
end
|
|
534
|
+
end
|
|
535
|
+
|
|
268
536
|
# @rbs (String) -> String
|
|
269
537
|
def unescape(content)
|
|
270
538
|
result = if content =~ %r{([^-_.!~*'()a-zA-Z\d;/?:@&=+$,\[\]]%)}
|
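The Enumerable support added in this file follows a common Ruby pattern: define `each`, include `Enumerable`, and the remaining iteration methods come for free. The sketch below reproduces that pattern, together with the `latest` fallback to `Time.at(0)` for undated items, on a stand-alone class. `TinyFeed` and its sample items are assumptions made for illustration; it is not the gem's `SimpleRSS` class.

```ruby
# Hypothetical stand-in for a parsed feed; items are plain hashes.
class TinyFeed
  include Enumerable

  attr_reader :items

  def initialize(items)
    @items = items
  end

  # Yield each item, or return an Enumerator when no block is given.
  def each(&block)
    return enum_for(:each) unless block

    items.each(&block)
    self
  end

  # Index access delegates to the underlying array.
  def [](index)
    items[index]
  end

  # Newest first; items without a date sort as if dated at the Unix epoch.
  def latest(count = 10)
    items.sort_by { |item| item[:pubDate] || item[:updated] || Time.at(0) }
         .reverse
         .first(count)
  end
end

feed = TinyFeed.new([
  { title: "No Date" },
  { title: "Older", pubDate: Time.utc(2005, 8, 24) },
  { title: "Newer", updated: Time.utc(2006, 1, 1) }
])

puts feed.map { |item| item[:title] }.inspect
puts feed.latest(2).map { |item| item[:title] }.inspect
```

Because undated items fall back to the epoch, they end up last in `latest`, which matches the behavior exercised by `test_latest_handles_missing_dates` in the test file of this diff.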
data/simple-rss.gemspec
CHANGED
|
@@ -1,14 +1,14 @@
|
|
|
1
1
|
Gem::Specification.new do |s|
|
|
2
2
|
s.name = "simple-rss"
|
|
3
|
-
s.version = "2.
|
|
4
|
-
s.date = "2025-12-
|
|
3
|
+
s.version = "2.1.0"
|
|
4
|
+
s.date = "2025-12-29"
|
|
5
5
|
s.summary = "A simple, flexible, extensible, and liberal RSS and Atom reader for Ruby. It is designed to be backwards compatible with the standard RSS parser, but will never do RSS generation."
|
|
6
6
|
s.email = "lucas@rufy.com"
|
|
7
|
-
s.homepage = "
|
|
7
|
+
s.homepage = "https://github.com/cardmagic/simple-rss"
|
|
8
8
|
s.description = "A simple, flexible, extensible, and liberal RSS and Atom reader for Ruby. It is designed to be backwards compatible with the standard RSS parser, but will never do RSS generation."
|
|
9
9
|
s.authors = ["Lucas Carlson"]
|
|
10
|
-
s.files = Dir["lib/**/*", "test/**/*", "LICENSE", "README.
|
|
11
|
-
s.
|
|
10
|
+
s.files = Dir["lib/**/*", "test/**/*", "LICENSE", "README.md", "Rakefile", "simple-rss.gemspec"]
|
|
11
|
+
s.required_ruby_version = ">= 3.1"
|
|
12
12
|
s.add_development_dependency "rake"
|
|
13
13
|
s.add_development_dependency "rdoc"
|
|
14
14
|
s.add_development_dependency "test-unit"
|
data/test/base/enumerable_test.rb
ADDED
|
@@ -0,0 +1,101 @@
|
|
|
1
|
+
require "test_helper"
|
|
2
|
+
|
|
3
|
+
class EnumerableTest < Test::Unit::TestCase
|
|
4
|
+
def setup
|
|
5
|
+
@rss20 = SimpleRSS.parse open(File.dirname(__FILE__) + "/../data/rss20.xml")
|
|
6
|
+
@atom = SimpleRSS.parse open(File.dirname(__FILE__) + "/../data/atom.xml")
|
|
7
|
+
end
|
|
8
|
+
|
|
9
|
+
def test_includes_enumerable
|
|
10
|
+
assert_includes SimpleRSS.included_modules, Enumerable
|
|
11
|
+
end
|
|
12
|
+
|
|
13
|
+
def test_each_iterates_over_items
|
|
14
|
+
titles = @rss20.map { |item| item[:title] }
|
|
15
|
+
assert_equal @rss20.items.map { |i| i[:title] }, titles
|
|
16
|
+
end
|
|
17
|
+
|
|
18
|
+
def test_each_returns_enumerator_without_block
|
|
19
|
+
enumerator = @rss20.each
|
|
20
|
+
assert_kind_of Enumerator, enumerator
|
|
21
|
+
assert_equal @rss20.items.size, enumerator.count
|
|
22
|
+
end
|
|
23
|
+
|
|
24
|
+
def test_each_returns_self_with_block
|
|
25
|
+
count = 0
|
|
26
|
+
result = @rss20.each { |_item| count += 1 }
|
|
27
|
+
assert_equal @rss20, result
|
|
28
|
+
assert_equal @rss20.items.size, count
|
|
29
|
+
end
|
|
30
|
+
|
|
31
|
+
def test_enumerable_map
|
|
32
|
+
titles = @rss20.map { |item| item[:title] }
|
|
33
|
+
assert_equal @rss20.items.map { |i| i[:title] }, titles
|
|
34
|
+
end
|
|
35
|
+
|
|
36
|
+
def test_enumerable_select
|
|
37
|
+
items_with_link = @rss20.select { |item| item[:link] }
|
|
38
|
+
assert_equal @rss20.items.select { |i| i[:link] }, items_with_link
|
|
39
|
+
end
|
|
40
|
+
|
|
41
|
+
def test_enumerable_first
|
|
42
|
+
assert_equal @rss20.items.first, @rss20.first
|
|
43
|
+
assert_equal @rss20.items.first(3), @rss20.first(3)
|
|
44
|
+
end
|
|
45
|
+
|
|
46
|
+
def test_enumerable_count
|
|
47
|
+
assert_equal @rss20.items.size, @rss20.count
|
|
48
|
+
end
|
|
49
|
+
|
|
50
|
+
def test_index_accessor
|
|
51
|
+
assert_equal @rss20.items[0], @rss20[0]
|
|
52
|
+
assert_equal @rss20.items[5], @rss20[5]
|
|
53
|
+
assert_equal @rss20.items[-1], @rss20[-1]
|
|
54
|
+
end
|
|
55
|
+
|
|
56
|
+
def test_index_accessor_out_of_bounds
|
|
57
|
+
assert_nil @rss20[100]
|
|
58
|
+
end
|
|
59
|
+
|
|
60
|
+
def test_latest_returns_sorted_items
|
|
61
|
+
latest = @rss20.latest(3)
|
|
62
|
+
assert_equal 3, latest.size
|
|
63
|
+
|
|
64
|
+
dates = latest.map { |item| item[:pubDate] }
|
|
65
|
+
assert_equal dates, dates.sort.reverse
|
|
66
|
+
end
|
|
67
|
+
|
|
68
|
+
def test_latest_default_count
|
|
69
|
+
latest = @rss20.latest
|
|
70
|
+
assert latest.size <= 10
|
|
71
|
+
end
|
|
72
|
+
|
|
73
|
+
def test_latest_with_atom_uses_updated
|
|
74
|
+
latest = @atom.latest(1)
|
|
75
|
+
assert_equal 1, latest.size
|
|
76
|
+
end
|
|
77
|
+
|
|
78
|
+
def test_latest_handles_missing_dates
|
|
79
|
+
rss_with_missing_dates = SimpleRSS.parse <<~RSS
|
|
80
|
+
<?xml version="1.0" encoding="UTF-8"?>
|
|
81
|
+
<rss version="2.0">
|
|
82
|
+
<channel>
|
|
83
|
+
<title>Test Feed</title>
|
|
84
|
+
<link>http://example.com</link>
|
|
85
|
+
<item>
|
|
86
|
+
<title>No Date</title>
|
|
87
|
+
</item>
|
|
88
|
+
<item>
|
|
89
|
+
<title>Has Date</title>
|
|
90
|
+
<pubDate>Wed, 24 Aug 2005 13:33:34 GMT</pubDate>
|
|
91
|
+
</item>
|
|
92
|
+
</channel>
|
|
93
|
+
</rss>
|
|
94
|
+
RSS
|
|
95
|
+
|
|
96
|
+
latest = rss_with_missing_dates.latest(2)
|
|
97
|
+
assert_equal 2, latest.size
|
|
98
|
+
assert_equal "Has Date", latest.first[:title]
|
|
99
|
+
assert_equal "No Date", latest.last[:title]
|
|
100
|
+
end
|
|
101
|
+
end
|
data/test/base/fetch_test.rb
ADDED
|
@@ -0,0 +1,117 @@
|
|
|
1
|
+
require "test_helper"
|
|
2
|
+
require "net/http"
|
|
3
|
+
|
|
4
|
+
class FetchTest < Test::Unit::TestCase
|
|
5
|
+
def setup
|
|
6
|
+
@sample_feed = File.read(File.dirname(__FILE__) + "/../data/rss20.xml")
|
|
7
|
+
end
|
|
8
|
+
|
|
9
|
+
# Test attr_readers exist and default to nil for parsed feeds
|
|
10
|
+
|
|
11
|
+
def test_etag_attr_reader_exists
|
|
12
|
+
rss = SimpleRSS.parse(@sample_feed)
|
|
13
|
+
assert_respond_to rss, :etag
|
|
14
|
+
end
|
|
15
|
+
|
|
16
|
+
def test_last_modified_attr_reader_exists
|
|
17
|
+
rss = SimpleRSS.parse(@sample_feed)
|
|
18
|
+
assert_respond_to rss, :last_modified
|
|
19
|
+
end
|
|
20
|
+
|
|
21
|
+
def test_etag_nil_for_parsed_feed
|
|
22
|
+
rss = SimpleRSS.parse(@sample_feed)
|
|
23
|
+
assert_nil rss.etag
|
|
24
|
+
end
|
|
25
|
+
|
|
26
|
+
def test_last_modified_nil_for_parsed_feed
|
|
27
|
+
rss = SimpleRSS.parse(@sample_feed)
|
|
28
|
+
assert_nil rss.last_modified
|
|
29
|
+
end
|
|
30
|
+
|
|
31
|
+
# Test fetch class method exists
|
|
32
|
+
|
|
33
|
+
def test_fetch_class_method_exists
|
|
34
|
+
assert_respond_to SimpleRSS, :fetch
|
|
35
|
+
end
|
|
36
|
+
|
|
37
|
+
# Test fetch with invalid URL raises error
|
|
38
|
+
|
|
39
|
+
def test_fetch_raises_on_invalid_host
|
|
40
|
+
# Socket::ResolutionError was added in Ruby 3.3, use SocketError for older versions
|
|
41
|
+
expected_errors = [SocketError, Errno::ECONNREFUSED, SimpleRSSError]
|
|
42
|
+
expected_errors << Socket::ResolutionError if defined?(Socket::ResolutionError)
|
|
43
|
+
assert_raise(*expected_errors) do
|
|
44
|
+
SimpleRSS.fetch("http://this-host-does-not-exist-12345.invalid/feed.xml", timeout: 1)
|
|
45
|
+
end
|
|
46
|
+
end
|
|
47
|
+
|
|
48
|
+
# Test fetch options are accepted
|
|
49
|
+
|
|
50
|
+
def test_fetch_accepts_etag_option
|
|
51
|
+
# Just verify it doesn't raise an ArgumentError
|
|
52
|
+
assert_nothing_raised do
|
|
53
|
+
SimpleRSS.fetch("http://localhost:1/feed.xml", etag: '"abc123"', timeout: 0.1)
|
|
54
|
+
rescue StandardError
|
|
55
|
+
# Expected - connection will fail
|
|
56
|
+
end
|
|
57
|
+
end
|
|
58
|
+
|
|
59
|
+
def test_fetch_accepts_last_modified_option
|
|
60
|
+
assert_nothing_raised do
|
|
61
|
+
SimpleRSS.fetch("http://localhost:1/feed.xml", last_modified: "Wed, 21 Oct 2015 07:28:00 GMT", timeout: 0.1)
|
|
62
|
+
rescue StandardError
|
|
63
|
+
# Expected - connection will fail
|
|
64
|
+
end
|
|
65
|
+
end
|
|
66
|
+
|
|
67
|
+
def test_fetch_accepts_headers_option
|
|
68
|
+
assert_nothing_raised do
|
|
69
|
+
SimpleRSS.fetch("http://localhost:1/feed.xml", headers: { "X-Custom" => "test" }, timeout: 0.1)
|
|
70
|
+
rescue StandardError
|
|
71
|
+
# Expected - connection will fail
|
|
72
|
+
end
|
|
73
|
+
end
|
|
74
|
+
|
|
75
|
+
def test_fetch_accepts_timeout_option
|
|
76
|
+
assert_nothing_raised do
|
|
77
|
+
SimpleRSS.fetch("http://localhost:1/feed.xml", timeout: 0.1)
|
|
78
|
+
rescue StandardError
|
|
79
|
+
# Expected - connection will fail
|
|
80
|
+
end
|
|
81
|
+
end
|
|
82
|
+
|
|
83
|
+
def test_fetch_accepts_follow_redirects_option
|
|
84
|
+
assert_nothing_raised do
|
|
85
|
+
SimpleRSS.fetch("http://localhost:1/feed.xml", follow_redirects: false, timeout: 0.1)
|
|
86
|
+
rescue StandardError
|
|
87
|
+
# Expected - connection will fail
|
|
88
|
+
end
|
|
89
|
+
end
|
|
90
|
+
end
|
|
91
|
+
|
|
92
|
+
# Integration tests that require network access
|
|
93
|
+
# These are skipped by default, run with NETWORK_TESTS=1
|
|
94
|
+
class FetchIntegrationTest < Test::Unit::TestCase
|
|
95
|
+
def test_fetch_real_feed
|
|
96
|
+
omit unless ENV["NETWORK_TESTS"]
|
|
97
|
+
# Use a reliable, long-lived RSS feed
|
|
98
|
+
rss = SimpleRSS.fetch("https://feeds.bbci.co.uk/news/rss.xml", timeout: 10)
|
|
99
|
+
assert_kind_of SimpleRSS, rss
|
|
100
|
+
assert rss.title
|
|
101
|
+
assert rss.items.any?
|
|
102
|
+
end
|
|
103
|
+
|
|
104
|
+
def test_fetch_stores_caching_headers
|
|
105
|
+
omit unless ENV["NETWORK_TESTS"]
|
|
106
|
+
rss = SimpleRSS.fetch("https://feeds.bbci.co.uk/news/rss.xml", timeout: 10)
|
|
107
|
+
# At least one of these should be present for most feeds
|
|
108
|
+
assert(rss.etag || rss.last_modified, "Expected ETag or Last-Modified header")
|
|
109
|
+
end
|
|
110
|
+
|
|
111
|
+
def test_fetch_follows_redirect
|
|
112
|
+
omit unless ENV["NETWORK_TESTS"]
|
|
113
|
+
# GitHub raw URLs often redirect
|
|
114
|
+
rss = SimpleRSS.fetch("https://github.com/cardmagic/simple-rss/commits/master.atom", timeout: 10)
|
|
115
|
+
assert_kind_of SimpleRSS, rss
|
|
116
|
+
end
|
|
117
|
+
end
|
|
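Note on the fetch tests above: they pass `etag:`, `last_modified:` and `headers:` options, which correspond to an HTTP conditional GET. A minimal sketch of how such options could map onto request headers, using only Ruby's standard `net/http`. `conditional_get_request` is a hypothetical helper and the mapping is an assumption; the gem's own `fetch` internals are not shown in this part of the diff. The request is built but never sent, so no network access is needed.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) a GET request that
# carries the cache validators saved from an earlier response.
def conditional_get_request(url, etag: nil, last_modified: nil, headers: {})
  request = Net::HTTP::Get.new(URI(url))
  # Caller-supplied headers first, so the validators below take precedence.
  headers.each { |name, value| request[name] = value }
  request["If-None-Match"] = etag if etag
  request["If-Modified-Since"] = last_modified if last_modified
  request
end

request = conditional_get_request(
  "http://localhost:1/feed.xml",
  etag: '"abc123"',
  last_modified: "Wed, 21 Oct 2015 07:28:00 GMT",
  headers: { "X-Custom" => "test" }
)

puts request["If-None-Match"]
puts request["If-Modified-Since"]
```

A server that still has the same version would typically answer such a request with 304 Not Modified and no body.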
@@ -0,0 +1,142 @@
|
|
|
1
|
+
require "test_helper"
|
|
2
|
+
|
|
3
|
+
class HashXmlSerializationTest < Test::Unit::TestCase
|
|
4
|
+
def setup
|
|
5
|
+
@rss20 = SimpleRSS.parse open(File.dirname(__FILE__) + "/../data/rss20.xml")
|
|
6
|
+
@atom = SimpleRSS.parse open(File.dirname(__FILE__) + "/../data/atom.xml")
|
|
7
|
+
end
|
|
8
|
+
|
|
9
|
+
# to_hash tests
|
|
10
|
+
|
|
11
|
+
def test_to_hash_returns_hash
|
|
12
|
+
result = @rss20.to_hash
|
|
13
|
+
assert_kind_of Hash, result
|
|
14
|
+
end
|
|
15
|
+
|
|
16
|
+
def test_to_hash_includes_feed_title
|
|
17
|
+
result = @rss20.to_hash
|
|
18
|
+
assert_equal "Technoblog", result[:title]
|
|
19
|
+
end
|
|
20
|
+
|
|
21
|
+
def test_to_hash_includes_items
|
|
22
|
+
result = @rss20.to_hash
|
|
23
|
+
assert_kind_of Array, result[:items]
|
|
24
|
+
assert_equal 10, result[:items].size
|
|
25
|
+
end
|
|
26
|
+
|
|
27
|
+
def test_to_hash_is_alias_for_as_json
|
|
28
|
+
assert_equal @rss20.as_json, @rss20.to_hash
|
|
29
|
+
end
|
|
30
|
+
|
|
31
|
+
# to_xml RSS 2.0 tests
|
|
32
|
+
|
|
33
|
+
def test_to_xml_returns_string
|
|
34
|
+
result = @rss20.to_xml
|
|
35
|
+
assert_kind_of String, result
|
|
36
|
+
end
|
|
37
|
+
|
|
38
|
+
def test_to_xml_default_format_is_rss2
|
|
39
|
+
result = @rss20.to_xml
|
|
40
|
+
assert_match(/<rss version="2.0">/, result)
|
|
41
|
+
end
|
|
42
|
+
|
|
43
|
+
def test_to_xml_rss2_has_xml_declaration
|
|
44
|
+
result = @rss20.to_xml(format: :rss2)
|
|
45
|
+
assert_match(/^<\?xml version="1.0" encoding="UTF-8"\?>/, result)
|
|
46
|
+
end
|
|
47
|
+
|
|
48
|
+
def test_to_xml_rss2_has_channel
|
|
49
|
+
result = @rss20.to_xml(format: :rss2)
|
|
50
|
+
assert_match(/<channel>/, result)
|
|
51
|
+
assert_match(%r{</channel>}, result)
|
|
52
|
+
end
|
|
53
|
+
|
|
54
|
+
def test_to_xml_rss2_has_title
|
|
55
|
+
result = @rss20.to_xml(format: :rss2)
|
|
56
|
+
assert_match(%r{<title>Technoblog</title>}, result)
|
|
57
|
+
end
|
|
58
|
+
|
|
59
|
+
def test_to_xml_rss2_has_link
|
|
60
|
+
result = @rss20.to_xml(format: :rss2)
|
|
61
|
+
assert_match(%r{<link>http://tech.rufy.com</link>}, result)
|
|
62
|
+
end
|
|
63
|
+
|
|
64
|
+
def test_to_xml_rss2_has_items
|
|
65
|
+
result = @rss20.to_xml(format: :rss2)
|
|
66
|
+
assert_match(/<item>/, result)
|
|
67
|
+
assert_match(%r{</item>}, result)
|
|
68
|
+
end
|
|
69
|
+
|
|
70
|
+
def test_to_xml_rss2_item_has_title
|
|
71
|
+
result = @rss20.to_xml(format: :rss2)
|
|
72
|
+
assert_match(/<item>\n<title>/, result)
|
|
73
|
+
end
|
|
74
|
+
|
|
75
|
+
def test_to_xml_rss2_item_has_guid
|
|
76
|
+
result = @rss20.to_xml(format: :rss2)
|
|
77
|
+
assert_match(%r{<guid>http://tech.rufy.com/entry/\d+</guid>}, result)
|
|
78
|
+
end
|
|
79
|
+
|
|
80
|
+
def test_to_xml_rss2_escapes_special_characters
|
|
81
|
+
rss = SimpleRSS.parse('<rss version="2.0"><channel><title>Test &amp; Title</title><item><title>Item &lt;1&gt;</title></item></channel></rss>')
|
|
82
|
+
result = rss.to_xml(format: :rss2)
|
|
83
|
+
assert_match(/&amp;/, result)
|
|
84
|
+
end
|
|
85
|
+
|
|
86
|
+
# to_xml Atom tests
|
|
87
|
+
|
|
88
|
+
def test_to_xml_atom_format
|
|
89
|
+
result = @atom.to_xml(format: :atom)
|
|
90
|
+
assert_match(%r{<feed xmlns="http://www.w3.org/2005/Atom">}, result)
|
|
91
|
+
end
|
|
92
|
+
|
|
93
|
+
def test_to_xml_atom_has_title
|
|
94
|
+
result = @atom.to_xml(format: :atom)
|
|
95
|
+
assert_match(%r{<title>dive into mark</title>}, result)
|
|
96
|
+
end
|
|
97
|
+
|
|
98
|
+
def test_to_xml_atom_has_link
|
|
99
|
+
result = @atom.to_xml(format: :atom)
|
|
100
|
+
assert_match(%r{<link href="http://example.org/" rel="alternate"/>}, result)
|
|
101
|
+
end
|
|
102
|
+
|
|
103
|
+
def test_to_xml_atom_has_entries
|
|
104
|
+
result = @atom.to_xml(format: :atom)
|
|
105
|
+
assert_match(/<entry>/, result)
|
|
106
|
+
assert_match(%r{</entry>}, result)
|
|
107
|
+
end
|
|
108
|
+
|
|
109
|
+
def test_to_xml_atom_entry_has_title
|
|
110
|
+
result = @atom.to_xml(format: :atom)
|
|
111
|
+
assert_match(%r{<entry>\n<title>Atom draft-07 snapshot</title>}, result)
|
|
112
|
+
end
|
|
113
|
+
|
|
114
|
+
# Error handling
|
|
115
|
+
|
|
116
|
+
def test_to_xml_raises_on_unknown_format
|
|
117
|
+
assert_raise(ArgumentError) { @rss20.to_xml(format: :unknown) }
|
|
118
|
+
end
|
|
119
|
+
|
|
120
|
+
def test_to_xml_error_message_includes_supported_formats
|
|
121
|
+
error = assert_raise(ArgumentError) { @rss20.to_xml(format: :foo) }
|
|
122
|
+
assert_match(/Supported: :rss2, :atom/, error.message)
|
|
123
|
+
end
|
|
124
|
+
|
|
125
|
+
# Round-trip tests
|
|
126
|
+
|
|
127
|
+
def test_to_xml_rss2_can_be_reparsed
|
|
128
|
+
xml = @rss20.to_xml(format: :rss2)
|
|
129
|
+
reparsed = SimpleRSS.parse(xml)
|
|
130
|
+
|
|
131
|
+
assert_equal "Technoblog", reparsed.title
|
|
132
|
+
assert_equal 10, reparsed.items.size
|
|
133
|
+
end
|
|
134
|
+
|
|
135
|
+
def test_to_xml_atom_can_be_reparsed
|
|
136
|
+
xml = @atom.to_xml(format: :atom)
|
|
137
|
+
reparsed = SimpleRSS.parse(xml)
|
|
138
|
+
|
|
139
|
+
assert_equal "dive into mark", reparsed.title
|
|
140
|
+
assert_equal 1, reparsed.items.size
|
|
141
|
+
end
|
|
142
|
+
end
|
|
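Note on the escaping test above: it checks that special characters in a title are emitted as entities when the feed is serialized. A minimal sketch of that kind of escaping, assuming a simple character-to-entity table; `escape_xml` and `title_element` are hypothetical names, not methods of the gem.

```ruby
# Characters that must not appear raw in XML text, with their entities.
XML_ESCAPES = {
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;",
  '"' => "&quot;"
}.freeze

# Hypothetical helper: replaces each special character in a single pass,
# so an ampersand introduced by one replacement is never escaped twice.
def escape_xml(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Hypothetical helper: wraps escaped text in a <title> element.
def title_element(text)
  "<title>#{escape_xml(text)}</title>"
end

puts title_element("Test & Title")
puts title_element("Item <1>")
```

Using one `gsub` with a hash, rather than chained replacements, keeps the order of substitutions from mattering.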
@@ -0,0 +1,81 @@
|
|
|
1
|
+
require "test_helper"
|
|
2
|
+
require "json"
|
|
3
|
+
|
|
4
|
+
class JsonSerializationTest < Test::Unit::TestCase
|
|
5
|
+
def setup
|
|
6
|
+
@rss20 = SimpleRSS.parse open(File.dirname(__FILE__) + "/../data/rss20.xml")
|
|
7
|
+
@atom = SimpleRSS.parse open(File.dirname(__FILE__) + "/../data/atom.xml")
|
|
8
|
+
end
|
|
9
|
+
|
|
10
|
+
def test_as_json_returns_hash
|
|
11
|
+
result = @rss20.as_json
|
|
12
|
+
assert_kind_of Hash, result
|
|
13
|
+
end
|
|
14
|
+
|
|
15
|
+
def test_as_json_includes_feed_title
|
|
16
|
+
result = @rss20.as_json
|
|
17
|
+
assert_equal "Technoblog", result[:title]
|
|
18
|
+
end
|
|
19
|
+
|
|
20
|
+
def test_as_json_includes_feed_link
|
|
21
|
+
result = @rss20.as_json
|
|
22
|
+
assert_equal "http://tech.rufy.com", result[:link]
|
|
23
|
+
end
|
|
24
|
+
|
|
25
|
+
def test_as_json_includes_items
|
|
26
|
+
result = @rss20.as_json
|
|
27
|
+
assert_kind_of Array, result[:items]
|
|
28
|
+
assert_equal 10, result[:items].size
|
|
29
|
+
end
|
|
30
|
+
|
|
31
|
+
def test_as_json_items_have_title
|
|
32
|
+
result = @rss20.as_json
|
|
33
|
+
assert result[:items].first[:title].include?("some_string.starts_with?")
|
|
34
|
+
end
|
|
35
|
+
|
|
36
|
+
def test_as_json_converts_time_to_iso8601
|
|
37
|
+
result = @rss20.as_json
|
|
38
|
+
pub_date = result[:items].first[:pubDate]
|
|
39
|
+
assert_kind_of String, pub_date
|
|
40
|
+
assert_match(/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/, pub_date)
|
|
41
|
+
end
|
|
42
|
+
|
|
43
|
+
def test_as_json_works_with_atom
|
|
44
|
+
result = @atom.as_json
|
|
45
|
+
assert_equal "dive into mark", result[:title]
|
|
46
|
+
assert_equal 1, result[:items].size
|
|
47
|
+
end
|
|
48
|
+
|
|
49
|
+
def test_to_json_returns_string
|
|
50
|
+
result = @rss20.to_json
|
|
51
|
+
assert_kind_of String, result
|
|
52
|
+
end
|
|
53
|
+
|
|
54
|
+
def test_to_json_is_valid_json
|
|
55
|
+
result = @rss20.to_json
|
|
56
|
+
parsed = JSON.parse(result)
|
|
57
|
+
assert_kind_of Hash, parsed
|
|
58
|
+
end
|
|
59
|
+
|
|
60
|
+
def test_to_json_roundtrip
|
|
61
|
+
json_string = @rss20.to_json
|
|
62
|
+
parsed = JSON.parse(json_string, symbolize_names: true)
|
|
63
|
+
|
|
64
|
+
assert_equal "Technoblog", parsed[:title]
|
|
65
|
+
assert_equal 10, parsed[:items].size
|
|
66
|
+
assert parsed[:items].first[:title].include?("some_string.starts_with?")
|
|
67
|
+
end
|
|
68
|
+
|
|
69
|
+
def test_as_json_excludes_nil_feed_tags
|
|
70
|
+
result = @rss20.as_json
|
|
71
|
+
# Feed tags that weren't in the source shouldn't appear
|
|
72
|
+
refute result.key?(:subtitle)
|
|
73
|
+
refute result.key?(:id)
|
|
74
|
+
end
|
|
75
|
+
|
|
76
|
+
def test_as_json_accepts_options_parameter
|
|
77
|
+
# Should not raise, even if options aren't used yet
|
|
78
|
+
result = @rss20.as_json(only: [:title])
|
|
79
|
+
assert_kind_of Hash, result
|
|
80
|
+
end
|
|
81
|
+
end
|
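Note on the JSON tests above: they expect `as_json` to return a plain hash in which time values appear as ISO 8601 strings and absent feed tags are left out. A minimal sketch of such a conversion, assuming the feed is already available as nested hashes and arrays; `jsonable` is a hypothetical helper and not the gem's `as_json`.

```ruby
require "json"
require "time"

# Hypothetical helper: recursively copies a structure, turning Time values
# into ISO 8601 strings and dropping hash entries whose value is nil.
def jsonable(value)
  case value
  when Time
    value.iso8601
  when Array
    value.map { |element| jsonable(element) }
  when Hash
    value.each_with_object({}) do |(key, inner), result|
      result[key] = jsonable(inner) unless inner.nil?
    end
  else
    value
  end
end

feed = {
  title: "Technoblog",
  subtitle: nil,
  items: [{ title: "Example", pubDate: Time.utc(2005, 8, 24, 13, 33, 34) }]
}

json = JSON.generate(jsonable(feed))
puts json
# → {"title":"Technoblog","items":[{"title":"Example","pubDate":"2005-08-24T13:33:34Z"}]}
```

Parsing the string back with `JSON.parse(json, symbolize_names: true)` gives symbol keys again, which is how the round-trip test above reads the result.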
metadata
CHANGED
|
@@ -1,13 +1,13 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: simple-rss
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 2.
|
|
4
|
+
version: 2.1.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Lucas Carlson
|
|
8
8
|
bindir: bin
|
|
9
9
|
cert_chain: []
|
|
10
|
-
date: 2025-12-
|
|
10
|
+
date: 2025-12-29 00:00:00.000000000 Z
|
|
11
11
|
dependencies:
|
|
12
12
|
- !ruby/object:Gem::Dependency
|
|
13
13
|
name: rake
|
|
@@ -60,7 +60,7 @@ extensions: []
|
|
|
60
60
|
extra_rdoc_files: []
|
|
61
61
|
files:
|
|
62
62
|
- LICENSE
|
|
63
|
-
- README.
|
|
63
|
+
- README.md
|
|
64
64
|
- Rakefile
|
|
65
65
|
- lib/simple-rss.rb
|
|
66
66
|
- simple-rss.gemspec
|
|
@@ -68,8 +68,12 @@ files:
|
|
|
68
68
|
- test/base/base_test.rb
|
|
69
69
|
- test/base/empty_tag_test.rb
|
|
70
70
|
- test/base/encoding_test.rb
|
|
71
|
+
- test/base/enumerable_test.rb
|
|
71
72
|
- test/base/feed_attributes_test.rb
|
|
73
|
+
- test/base/fetch_test.rb
|
|
74
|
+
- test/base/hash_xml_serialization_test.rb
|
|
72
75
|
- test/base/item_attributes_test.rb
|
|
76
|
+
- test/base/json_serialization_test.rb
|
|
73
77
|
- test/data/atom.xml
|
|
74
78
|
- test/data/atom_with_entry_attrs.xml
|
|
75
79
|
- test/data/atom_with_feed_attrs.xml
|
|
@@ -81,7 +85,7 @@ files:
|
|
|
81
85
|
- test/data/rss20_with_channel_attrs.xml
|
|
82
86
|
- test/data/rss20_with_item_attrs.xml
|
|
83
87
|
- test/test_helper.rb
|
|
84
|
-
homepage:
|
|
88
|
+
homepage: https://github.com/cardmagic/simple-rss
|
|
85
89
|
licenses: []
|
|
86
90
|
metadata: {}
|
|
87
91
|
rdoc_options: []
|
|
@@ -91,7 +95,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
|
91
95
|
requirements:
|
|
92
96
|
- - ">="
|
|
93
97
|
- !ruby/object:Gem::Version
|
|
94
|
-
version: '
|
|
98
|
+
version: '3.1'
|
|
95
99
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
|
96
100
|
requirements:
|
|
97
101
|
- - ">="
|
data/README.markdown
DELETED
|
@@ -1,47 +0,0 @@
|
|
|
1
|
-
## Welcome to Simple RSS
|
|
2
|
-
|
|
3
|
-
Simple RSS is a simple, flexible, extensible, and liberal RSS and Atom reader
|
|
4
|
-
for Ruby. It is designed to be backwards compatible with the standard RSS
|
|
5
|
-
parser, but will never do RSS generation.
|
|
6
|
-
|
|
7
|
-
## Download
|
|
8
|
-
|
|
9
|
-
* gem install simple-rss
|
|
10
|
-
* https://github.com/cardmagic/simple-rss
|
|
11
|
-
* git clone git@github.com:cardmagic/simple-rss.git
|
|
12
|
-
|
|
13
|
-
### Usage
|
|
14
|
-
The API is similar to Ruby's standard RSS parser:
|
|
15
|
-
|
|
16
|
-
require 'rubygems'
|
|
17
|
-
require 'simple-rss'
|
|
18
|
-
require 'open-uri'
|
|
19
|
-
|
|
20
|
-
rss = SimpleRSS.parse open('http://rss.slashdot.org/Slashdot/slashdot/to')
|
|
21
|
-
|
|
22
|
-
rss.channel.title # => "Slashdot"
|
|
23
|
-
rss.channel.link # => "http://slashdot.org/"
|
|
24
|
-
rss.items.first.link # => "http://books.slashdot.org/article.pl?sid=05/08/29/1319236&from=rss"
|
|
25
|
-
|
|
26
|
-
But since the parser can read Atom feeds as easily as RSS feeds, there are optional aliases that allow more atom like reading:
|
|
27
|
-
|
|
28
|
-
rss.feed.title # => "Slashdot"
|
|
29
|
-
rss.feed.link # => "http://slashdot.org/"
|
|
30
|
-
rss.entries.first.link # => "http://books.slashdot.org/article.pl?sid=05/08/29/1319236&from=rss"
|
|
31
|
-
|
|
32
|
-
The parser does not care about the correctness of the XML as it does not use an XML library to read the information. Thus it is flexible and allows for easy extending via:
|
|
33
|
-
|
|
34
|
-
SimpleRSS.feed_tags << :some_new_tag
|
|
35
|
-
SimpleRSS.item_tags << :"item+myrel" # this will extend SimpleRSS to be able to parse RSS items or ATOM entries that have a rel specified, common in many blogger feeds
|
|
36
|
-
SimpleRSS.item_tags << :"feedburner:origLink" # this will extend SimpleRSS to be able to parse RSS items or ATOM entries that have a specific pre-tag specified, common in many feedburner feeds
|
|
37
|
-
SimpleRSS.item_tags << :"media:content#url" # this will grab the url attribute of the media:content tag
|
|
38
|
-
|
|
39
|
-
## Authors
|
|
40
|
-
|
|
41
|
-
* Lucas Carlson (mailto:lucas@rufy.com)
|
|
42
|
-
* Herval Freire (mailto:hervalfreire@gmail.com)
|
|
43
|
-
|
|
44
|
-
Inspired by [Blagg](http://www.raelity.org/lang/perl/blagg) from Rael Dornfest.
|
|
45
|
-
|
|
46
|
-
This library is released under the terms of the GNU LGPL.
|
|
47
|
-
|