simple-rss 2.0.0 → 2.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +321 -0
- data/lib/simple-rss.rb +490 -4
- data/simple-rss.gemspec +4 -4
- data/test/base/enumerable_test.rb +101 -0
- data/test/base/feed_merging_and_diffing_test.rb +140 -0
- data/test/base/fetch_integration_test.rb +25 -0
- data/test/base/fetch_test.rb +90 -0
- data/test/base/filtering_and_validation_test.rb +187 -0
- data/test/base/hash_xml_serialization_test.rb +142 -0
- data/test/base/json_serialization_test.rb +81 -0
- data/test/base/media_and_enclosure_helpers_test.rb +84 -0
- metadata +13 -5
- data/README.markdown +0 -47
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0b914acfc63bfc4e787b6a0373e6c4fbbcec1b953a6517a2709a70b96e5993a6
+  data.tar.gz: 26e9dddcca6e05b34e8ef6da52a3654fadb6932ac9558f9ee8e70117e99f1e9f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0bbb1967e261cec7c2fdb1bb00fd511562a4647043c40f5b7b6e56aedc5d8d8f003727988ffd2fb77fb7291b397d41008e40eb3a14d156889168423d90f934c5
+  data.tar.gz: c1428ee431c4bfd718a573d2d89e5121e9cef46cbd1ce5ade80b21b9bc98b939f35dcdd133287fb70de83c3a00b7a5099d97a2f12699c6224a194c874a42e215
data/README.md
ADDED
@@ -0,0 +1,321 @@

# SimpleRSS

[Gem Version](https://badge.fury.io/rb/simple-rss)
[CI](https://github.com/cardmagic/simple-rss/actions/workflows/ruby.yml)
[License: LGPL-3.0](https://opensource.org/licenses/LGPL-3.0)

A simple, flexible, extensible, and liberal RSS and Atom reader for Ruby. It is designed to be backwards compatible with Ruby's standard RSS parser while handling malformed feeds gracefully.

## Features

- Parses both RSS and Atom feeds
- Tolerant of malformed XML (regex-based parsing)
- Built-in URL fetching with conditional GET support (ETags, Last-Modified)
- JSON and XML serialization
- Extensible tag definitions
- Zero runtime dependencies
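
Regex-based parsing is what lets a lenient reader accept feeds that a strict XML parser would reject. The sketch below illustrates the idea only and is not SimpleRSS's actual implementation; `extract_first` is a hypothetical helper that returns the text of the first matching element using a non-greedy regex, so an unescaped `&` or an unclosed `<br>` elsewhere in the feed does not abort the parse.

```ruby
# Hypothetical helper illustrating regex-based extraction (not SimpleRSS's own code).
def extract_first(xml, tag)
  name = Regexp.escape(tag)
  # Non-greedy match between opening and closing tags; /m lets `.` span newlines.
  match = xml.match(%r{<#{name}(?:\s[^>]*)?>(.*?)</#{name}>}m)
  match && match[1].strip
end

# Malformed input: the bare "&" and unclosed <br> would make a strict parser fail.
malformed = <<~XML
  <rss><channel>
    <title>News & Notes</title>
    <description>Line one<br>Line two</description>
  </channel></rss>
XML

puts extract_first(malformed, "title")       # prints "News & Notes"
puts extract_first(malformed, "description") # prints "Line one<br>Line two"
```

A plain regex like this one does not understand nesting, comments, or CDATA sections, so the leniency comes at the cost of strictness.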

## What's New in 2.0

Version 2.0 is a major update that adds the following capabilities:

- **URL Fetching** - One-liner feed fetching with `SimpleRSS.fetch(url)`. Supports timeouts, custom headers, and automatic redirect following.

- **Conditional GET** - Bandwidth-efficient polling with ETag and Last-Modified support. Returns `nil` when the feed hasn't changed (304 Not Modified).

- **JSON Serialization** - Export feeds with `to_json`, `to_hash`, and Rails-compatible `as_json`. Time objects serialize to ISO 8601.

- **XML Serialization** - Convert any parsed feed to clean RSS 2.0 or Atom XML with `to_xml(format: :rss2)` or `to_xml(format: :atom)`.

- **Array Tags** - Collect all occurrences of a tag (like multiple categories) with the `array_tags:` option.

- **Attribute Parsing** - Extract attributes from feed, item, and media tags using the `tag#attr` syntax.

- **UTF-8 Normalization** - All parsed content is automatically normalized to UTF-8 encoding.

- **Modern Ruby** - Full compatibility with Ruby 3.1 through 4.0, with RBS type annotations and Steep type checking.

- **Enumerable Support** - Iterate feeds naturally with `each`, `map`, `select`, and all Enumerable methods. Access items by index with `rss[0]` and get the latest items sorted by date with `latest(n)`.
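
This README does not describe how the UTF-8 normalization is done. As a rough sketch of what it typically involves in plain Ruby, `normalize_utf8` (a hypothetical helper, not SimpleRSS API) transcodes from a known source encoding and scrubs invalid byte sequences. Note that `String#encode` from UTF-8 to UTF-8 is a no-op, which is why the UTF-8 case uses `scrub` instead.

```ruby
# Hypothetical helper: coerce raw feed bytes into valid UTF-8.
def normalize_utf8(raw, source_encoding = Encoding::UTF_8)
  str = raw.dup.force_encoding(source_encoding)
  if str.encoding == Encoding::UTF_8
    # Same-encoding #encode would not touch invalid bytes, so replace them explicitly.
    str.scrub("\uFFFD")
  else
    # Transcode, replacing invalid or unmappable bytes with U+FFFD.
    str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "\uFFFD")
  end
end

normalize_utf8("caf\xE9".b, Encoding::ISO_8859_1) # => "café"
normalize_utf8("ok\xFF".b)                       # => "ok\uFFFD" (0xFF is never valid UTF-8)
```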

## Installation

Add to your Gemfile:

```ruby
gem "simple-rss"
```

Or install directly:

```bash
gem install simple-rss
```

## Quick Start

```ruby
require "simple-rss"
require "uri"
require "net/http"

# Parse from a string or IO object
xml = Net::HTTP.get(URI("https://example.com/feed.xml"))
rss = SimpleRSS.parse(xml)

rss.channel.title # => "Example Feed"
rss.items.first.title # => "First Post"
rss.items.first.pubDate # => 2024-01-15 12:00:00 -0500 (Time object)
```

## Usage

### Fetching Feeds

SimpleRSS includes a built-in fetcher with conditional GET support for efficient polling:

```ruby
# Simple fetch
feed = SimpleRSS.fetch("https://example.com/feed.xml")

# With timeout
feed = SimpleRSS.fetch("https://example.com/feed.xml", timeout: 10)

# Conditional GET - only download if modified
feed = SimpleRSS.fetch("https://example.com/feed.xml")
# Store these for the next request
etag = feed.etag
last_modified = feed.last_modified

# On subsequent requests, pass the stored values
# (Ruby 3.1+ shorthand for etag: etag, last_modified: last_modified)
feed = SimpleRSS.fetch(
  "https://example.com/feed.xml",
  etag:,
  last_modified:
)
# Returns nil if the feed hasn't changed (304 Not Modified)
```
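
Conditional GET is standard HTTP: the client echoes the stored ETag in `If-None-Match` and the stored Last-Modified value in `If-Modified-Since`, and the server answers 304 Not Modified with no body when nothing changed. The sketch below builds such a request with the standard library's `Net::HTTP` to show which headers are involved; `conditional_get_request` is a hypothetical helper, and this README does not document how `SimpleRSS.fetch` constructs its requests internally.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build (but do not send) a conditional GET request.
def conditional_get_request(url, etag: nil, last_modified: nil)
  request = Net::HTTP::Get.new(URI(url))
  # Server compares this against the resource's current ETag.
  request["If-None-Match"] = etag if etag
  # Server compares this against the resource's Last-Modified date.
  request["If-Modified-Since"] = last_modified if last_modified
  request
end

req = conditional_get_request(
  "https://example.com/feed.xml",
  etag: '"abc123"',
  last_modified: "Mon, 15 Jan 2024 17:00:00 GMT"
)
req["If-None-Match"]     # => "\"abc123\""
req["If-Modified-Since"] # => "Mon, 15 Jan 2024 17:00:00 GMT"
```

Sending `req` with `Net::HTTP.start(...) { |http| http.request(req) }` would return a `Net::HTTPNotModified` response when the server's copy is unchanged.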

### Accessing Feed Data

SimpleRSS provides both RSS and Atom style accessors:

```ruby
feed = SimpleRSS.parse(xml)

# RSS style
feed.channel.title
feed.channel.link
feed.channel.description
feed.items

# Atom style (aliases)
feed.feed.title
feed.entries
```

### Item Attributes

Items support both hash and method access:

```ruby
item = feed.items.first

# Hash access
item[:title]
item[:link]
item[:pubDate]

# Method access
item.title
item.link
item.pubDate
```
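
One common Ruby pattern for offering both `item[:title]` and `item.title` is a `Hash` subclass that forwards unknown method calls to matching keys. The sketch below assumes that pattern for illustration; `FeedItem` is a hypothetical class, and SimpleRSS's real item type may be implemented differently.

```ruby
# Hypothetical class illustrating dual hash/method access.
class FeedItem < Hash
  # Forward unknown zero-argument calls to the matching symbol key.
  def method_missing(name, *args)
    return self[name] if args.empty? && key?(name)
    super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    key?(name) || super
  end
end

item = FeedItem.new
item[:title] = "First Post"
item[:link] = "https://example.com/first-post"

item[:title] # => "First Post"
item.title   # => "First Post"
item.link    # => "https://example.com/first-post"
```

Subclassing `Hash` keeps `item[:title]` native; the trade-off is that keys colliding with built-in `Hash` methods (for example `:count`) still dispatch to the method rather than the key.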

Date fields are automatically parsed into `Time` objects:

```ruby
item.pubDate.class # => Time
item.pubDate.year # => 2024
```
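
RSS `pubDate` values use RFC 822/2822 dates, while Atom `updated`/`published` values use RFC 3339 (ISO 8601). The README only states that dates become `Time` objects; as a sketch of one way to handle both formats with Ruby's standard `time` library, `parse_feed_date` below is a hypothetical helper, and the library's own parsing rules may be more lenient.

```ruby
require "time"

# Hypothetical helper: try the RSS date format first, then Atom's ISO 8601.
# Raises ArgumentError if neither format matches.
def parse_feed_date(value)
  Time.rfc2822(value)
rescue ArgumentError
  Time.iso8601(value)
end

rss_date = parse_feed_date("Mon, 15 Jan 2024 12:00:00 -0500")
atom_date = parse_feed_date("2024-01-15T12:00:00-05:00")

rss_date.class        # => Time
rss_date.year         # => 2024
rss_date == atom_date # => true (same instant)
```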

### Iterating with Enumerable

SimpleRSS includes `Enumerable`, so you can iterate feeds naturally:

```ruby
feed = SimpleRSS.parse(xml)

# Iterate over items
feed.each { |item| puts item.title }

# Use any Enumerable method
titles = feed.map { |item| item.title }
tech_posts = feed.select { |item| item.category == "tech" }
first_five = feed.first(5)
total = feed.count

# Access items by index
feed[0].title # first item
feed[-1].title # last item

# Get the n most recent items (sorted by pubDate or updated)
feed.latest(10)
```
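
Including `Enumerable` only requires defining `each`; methods like `map`, `select`, `first`, and `count` are then built on top of it. The sketch below mirrors the behavior documented above with hypothetical names (`SketchItem`, `MiniFeed`) and is not SimpleRSS's implementation; for brevity its `latest` sorts on `pubDate` only, whereas the README says the real method also considers `updated`.

```ruby
# Hypothetical item type for the sketch.
SketchItem = Struct.new(:title, :pubDate, keyword_init: true)

# Hypothetical minimal feed class.
class MiniFeed
  include Enumerable

  attr_reader :items

  def initialize(items)
    @items = items
  end

  # Enumerable derives map, select, first, count, etc. from #each.
  def each(&block)
    return enum_for(:each) unless block
    items.each(&block)
    self
  end

  def [](index)
    items[index]
  end

  # Newest first; items without a date sort last.
  def latest(n = 10)
    items.sort_by { |item| item.pubDate || Time.at(0) }.reverse.first(n)
  end
end

feed = MiniFeed.new([
  SketchItem.new(title: "Old", pubDate: Time.utc(2024, 1, 1)),
  SketchItem.new(title: "New", pubDate: Time.utc(2024, 3, 1)),
  SketchItem.new(title: "Mid", pubDate: Time.utc(2024, 2, 1))
])

feed.map(&:title)           # => ["Old", "New", "Mid"]
feed[-1].title              # => "Mid"
feed.latest(2).map(&:title) # => ["New", "Mid"]
feed.count                  # => 3
```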

### JSON Serialization

```ruby
feed = SimpleRSS.parse(xml)

# Get as hash
feed.to_hash
# => { title: "Feed Title", link: "...", items: [...] }

# Get as JSON string
feed.to_json
# => '{"title":"Feed Title","link":"...","items":[...]}'

# Works with Rails/ActiveSupport
feed.as_json
```
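
The README says `Time` values serialize to ISO 8601. Without that step, the `json` library would fall back to `Time#to_s` (e.g. `"2024-01-15 12:00:00 -0500"`). A minimal sketch of the conversion, assuming a plain nested hash as input; `jsonable` is a hypothetical helper, not SimpleRSS API.

```ruby
require "json"
require "time"

# Hypothetical helper: recursively replace Time values with ISO 8601 strings.
def jsonable(value)
  case value
  when Time then value.iso8601
  when Hash then value.transform_values { |v| jsonable(v) }
  when Array then value.map { |v| jsonable(v) }
  else value
  end
end

feed_hash = {
  title: "Example Feed",
  items: [{ title: "First Post", pubDate: Time.new(2024, 1, 15, 12, 0, 0, "-05:00") }]
}

json = JSON.generate(jsonable(feed_hash))
# => {"title":"Example Feed","items":[{"title":"First Post","pubDate":"2024-01-15T12:00:00-05:00"}]}
```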

### XML Serialization

Convert parsed feeds to standard RSS 2.0 or Atom format:

```ruby
feed = SimpleRSS.parse(xml)

# Convert to RSS 2.0
feed.to_xml(format: :rss2)

# Convert to Atom
feed.to_xml(format: :atom)
```
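
The key correctness concern when emitting XML from parsed data is escaping: a title like `News & Notes` must become `News &amp; Notes`. The dependency-free sketch below builds a minimal RSS 2.0 document (the channel's required `title`, `link`, and `description`, plus per-item elements). `xml_escape` and `to_rss2` are hypothetical helpers, and SimpleRSS's `to_xml(format: :rss2)` may emit more elements.

```ruby
require "time"

# Characters that must be escaped in XML text and attribute values.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;" }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: minimal RSS 2.0 output; pubDate uses RFC 2822 as RSS expects.
def to_rss2(title:, link:, description:, items:)
  item_xml = items.map do |item|
    date = item[:pubDate] ? "<pubDate>#{item[:pubDate].rfc2822}</pubDate>" : ""
    "<item><title>#{xml_escape(item[:title])}</title>" \
      "<link>#{xml_escape(item[:link])}</link>#{date}</item>"
  end.join("\n")

  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0"><channel>
    <title>#{xml_escape(title)}</title>
    <link>#{xml_escape(link)}</link>
    <description>#{xml_escape(description)}</description>
    #{item_xml}
    </channel></rss>
  XML
end

xml = to_rss2(
  title: "News & Notes",
  link: "https://example.com/",
  description: "Example feed",
  items: [{ title: "First <Post>", link: "https://example.com/1",
            pubDate: Time.utc(2024, 1, 15, 17, 0, 0) }]
)
puts xml
```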

### Extending Tag Support

Add support for custom or non-standard tags:

```ruby
# Add a new feed-level tag
SimpleRSS.feed_tags << :custom_tag

# Add item-level tags
SimpleRSS.item_tags << :custom_item_tag

# Parse tags with specific rel attributes (common in Atom)
SimpleRSS.item_tags << :"link+enclosure"
# Accessible as: item.link_enclosure

# Parse tag attributes
SimpleRSS.item_tags << :"media:content#url"
# Accessible as: item.media_content_url

# Parse item/entry attributes
SimpleRSS.item_tags << :"entry#xml:lang"
# Accessible as: item.entry_xml_lang
```
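
To make the `tag#attr` and `tag+rel` forms concrete, the sketch below approximates both lookups with regexes over a single Atom entry. `attr_value` and `link_with_rel` are hypothetical helpers; SimpleRSS's own patterns are not documented here and may differ.

```ruby
# Hypothetical: value of `attr` on the first <tag ...> element (the `tag#attr` form).
def attr_value(xml, tag, attr)
  element = xml[/<#{Regexp.escape(tag)}\b[^>]*>/m] or return nil
  element[/\s#{Regexp.escape(attr)}\s*=\s*["']([^"']*)["']/, 1]
end

# Hypothetical: href of the first <link> whose rel matches (the `tag+rel` form).
def link_with_rel(xml, rel)
  xml.scan(/<link\b[^>]*>/m).each do |element|
    next unless element =~ /\srel\s*=\s*["']#{Regexp.escape(rel)}["']/
    return element[/\shref\s*=\s*["']([^"']*)["']/, 1]
  end
  nil
end

entry = <<~XML
  <entry xml:lang="en">
    <link rel="alternate" href="https://example.com/post"/>
    <link rel="enclosure" href="https://example.com/ep1.mp3" type="audio/mpeg"/>
    <media:content url="https://example.com/photo.jpg" medium="image"/>
  </entry>
XML

attr_value(entry, "media:content", "url") # => "https://example.com/photo.jpg"
attr_value(entry, "entry", "xml:lang")    # => "en"
link_with_rel(entry, "enclosure")         # => "https://example.com/ep1.mp3"
```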

#### Tag Syntax Reference

| Syntax | Example | Accessor | Description |
|--------|---------|----------|-------------|
| `tag` | `:title` | `.title` | Simple element content |
| `tag#attr` | `:"media:content#url"` | `.media_content_url` | Attribute value |
| `tag+rel` | `:"link+alternate"` | `.link_alternate` | Element with specific `rel` attribute |
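
The accessor names in this table are consistent with replacing `:`, `#`, and `+` with `_`. As a hypothetical sketch of that grammar (`TagSpec` and `parse_tag_spec` are illustrative names, not SimpleRSS API), a spec can be split into its element, attribute, and `rel` parts:

```ruby
# Hypothetical value object describing one tag spec.
TagSpec = Struct.new(:tag, :attr, :rel, :accessor)

# Hypothetical parser for the `tag`, `tag#attr`, and `tag+rel` forms.
def parse_tag_spec(spec)
  str = spec.to_s
  element, attr = str.split("#", 2) # "media:content#url" -> "media:content", "url"
  tag, rel = element.split("+", 2)  # "link+alternate" -> "link", "alternate"
  # ":", "#" and "+" all become "_" in the accessor name.
  TagSpec.new(tag, attr, rel, str.tr(":#+", "_").to_sym)
end

parse_tag_spec(:title).accessor               # => :title
parse_tag_spec(:"media:content#url").accessor # => :media_content_url
parse_tag_spec(:"link+alternate").rel         # => "alternate"
parse_tag_spec(:"entry#xml:lang").attr        # => "xml:lang"
```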

### Collecting Multiple Values

By default, SimpleRSS returns only the first occurrence of each tag. To collect every occurrence as an array, list the tag in the `array_tags:` option:

```ruby
# Collect all categories for each item
feed = SimpleRSS.parse(xml, array_tags: [:category])
item = feed.items.first

item.category # => ["tech", "programming", "ruby"]
```
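
The difference between the default and `array_tags:` comes down to first-match versus collect-all extraction. A minimal regex sketch of the two behaviors, using a hypothetical `tag_values` helper (not SimpleRSS API):

```ruby
# Hypothetical helper: first match by default, every match with all: true.
def tag_values(xml, tag, all: false)
  name = Regexp.escape(tag)
  pattern = %r{<#{name}(?:\s[^>]*)?>(.*?)</#{name}>}m
  # String#scan returns one array per match (one capture each), so flatten it.
  all ? xml.scan(pattern).flatten : xml[pattern, 1]
end

item_xml = <<~XML
  <item>
    <category>tech</category>
    <category>programming</category>
    <category>ruby</category>
  </item>
XML

tag_values(item_xml, "category")            # => "tech"
tag_values(item_xml, "category", all: true) # => ["tech", "programming", "ruby"]
```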

## API Reference

### `SimpleRSS.parse(source, options = {})`

Parse RSS/Atom content from a string or IO object.

**Parameters:**
- `source` - String or IO object containing feed XML
- `options` - Hash of options
  - `:array_tags` - Array of tag symbols to collect as arrays

**Returns:** `SimpleRSS` instance

### `SimpleRSS.fetch(url, options = {})`

Fetch and parse a feed from a URL.

**Parameters:**
- `url` - Feed URL string
- `options` - Hash of options
  - `:timeout` - Request timeout in seconds
  - `:etag` - ETag from previous request (for conditional GET)
  - `:last_modified` - Last-Modified header from previous request
  - `:follow_redirects` - Follow redirects (default: true)
  - `:headers` - Hash of additional HTTP headers

**Returns:** `SimpleRSS` instance, or `nil` if the server responds with 304 Not Modified

### Instance Methods

| Method | Description |
|--------|-------------|
| `#channel` / `#feed` | Returns self (for RSS/Atom style access) |
| `#items` / `#entries` | Array of parsed items |
| `#each` | Iterate over items (includes `Enumerable`) |
| `#[](index)` | Access item by index |
| `#latest(n = 10)` | Get n most recent items by date |
| `#to_json` | JSON string representation |
| `#to_hash` / `#as_json` | Hash representation |
| `#to_xml(format:)` | XML string (`:rss2` or `:atom`) |
| `#etag` | ETag header from fetch (if applicable) |
| `#last_modified` | Last-Modified header from fetch (if applicable) |
| `#source` | Original source XML string |

## Compatibility

- Ruby 3.1+
- No runtime dependencies

## Development

```bash
# Run tests
bundle exec rake test

# Run linter
bundle exec rubocop

# Type checking
bundle exec steep check

# Interactive console
bundle exec rake console
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/my-feature`)
3. Make your changes with tests
4. Ensure tests pass (`bundle exec rake test`)
5. Submit a pull request

## Authors

- [Lucas Carlson](mailto:lucas@rufy.com)
- [Herval Freire](mailto:hervalfreire@gmail.com)

Inspired by [Blagg](http://www.raelity.org/lang/perl/blagg) by Rael Dornfest.

## License

This library is released under the terms of the [GNU LGPL](LICENSE).