html2rss 0.9.0 → 0.11.0

Files changed (48)
  1. checksums.yaml +4 -4
  2. data/README.md +323 -270
  3. data/exe/html2rss +6 -0
  4. data/html2rss.gemspec +18 -23
  5. data/lib/html2rss/attribute_post_processors/gsub.rb +30 -8
  6. data/lib/html2rss/attribute_post_processors/html_to_markdown.rb +7 -2
  7. data/lib/html2rss/attribute_post_processors/html_transformers/transform_urls_to_absolute_ones.rb +27 -0
  8. data/lib/html2rss/attribute_post_processors/html_transformers/wrap_img_in_a.rb +41 -0
  9. data/lib/html2rss/attribute_post_processors/markdown_to_html.rb +11 -2
  10. data/lib/html2rss/attribute_post_processors/parse_time.rb +11 -4
  11. data/lib/html2rss/attribute_post_processors/parse_uri.rb +12 -2
  12. data/lib/html2rss/attribute_post_processors/sanitize_html.rb +40 -44
  13. data/lib/html2rss/attribute_post_processors/substring.rb +14 -4
  14. data/lib/html2rss/attribute_post_processors/template.rb +36 -12
  15. data/lib/html2rss/attribute_post_processors.rb +28 -5
  16. data/lib/html2rss/cli.rb +29 -0
  17. data/lib/html2rss/config/channel.rb +117 -0
  18. data/lib/html2rss/config/selectors.rb +91 -0
  19. data/lib/html2rss/config.rb +71 -82
  20. data/lib/html2rss/item.rb +122 -46
  21. data/lib/html2rss/item_extractors/attribute.rb +20 -7
  22. data/lib/html2rss/item_extractors/href.rb +20 -4
  23. data/lib/html2rss/item_extractors/html.rb +18 -6
  24. data/lib/html2rss/item_extractors/static.rb +18 -7
  25. data/lib/html2rss/item_extractors/text.rb +17 -5
  26. data/lib/html2rss/item_extractors.rb +75 -10
  27. data/lib/html2rss/object_to_xml_converter.rb +56 -0
  28. data/lib/html2rss/rss_builder/channel.rb +21 -0
  29. data/lib/html2rss/rss_builder/item.rb +83 -0
  30. data/lib/html2rss/rss_builder/stylesheet.rb +37 -0
  31. data/lib/html2rss/rss_builder.rb +96 -0
  32. data/lib/html2rss/utils.rb +94 -19
  33. data/lib/html2rss/version.rb +5 -1
  34. data/lib/html2rss.rb +57 -20
  35. metadata +53 -165
  36. data/.gitignore +0 -12
  37. data/.rspec +0 -4
  38. data/.rubocop.yml +0 -164
  39. data/.travis.yml +0 -25
  40. data/.yardopts +0 -6
  41. data/CHANGELOG.md +0 -221
  42. data/Gemfile +0 -8
  43. data/Gemfile.lock +0 -139
  44. data/bin/console +0 -15
  45. data/bin/setup +0 -8
  46. data/lib/html2rss/feed_builder.rb +0 -81
  47. data/lib/html2rss/item_extractors/current_time.rb +0 -21
  48. data/support/logo.png +0 -0
data/README.md CHANGED
@@ -1,38 +1,51 @@
1
- ![html2rss logo](https://github.com/gildesmarais/html2rss/raw/master/support/logo.png)
1
+ ![html2rss logo](https://github.com/html2rss/html2rss/raw/master/support/logo.png)
2
2
 
3
- [![Build Status](https://travis-ci.org/gildesmarais/html2rss.svg?branch=master)](https://travis-ci.org/gildesmarais/html2rss)
4
- [![Gem Version](https://badge.fury.io/rb/html2rss.svg)](http://rubygems.org/gems/html2rss/)
5
- [![Coverage Status](https://coveralls.io/repos/github/gildesmarais/html2rss/badge.svg?branch=master)](https://coveralls.io/github/gildesmarais/html2rss?branch=master)
6
- [![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg)](https://www.rubydoc.info/gems/html2rss)
7
- ![Retro Badge: valid RSS](https://validator.w3.org/feed/images/valid-rss-rogers.png)
8
- [![](http://img.shields.io/liberapay/goal/gildesmarais.svg?logo=liberapa)](https://liberapay.com/gildesmarais/donate)
3
+ [![Gem Version](https://badge.fury.io/rb/html2rss.svg)](http://rubygems.org/gems/html2rss/) [![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg)](https://www.rubydoc.info/gems/html2rss) ![Retro Badge: valid RSS](https://validator.w3.org/feed/images/valid-rss-rogers.png)
9
4
 
10
- **Searching for a ready to use app which serves generated feeds via HTTP?**
11
- [Head over to `html2rss-web`!](https://github.com/gildesmarais/html2rss-web)
5
+ `html2rss` is a Ruby gem that generates RSS 2.0 feeds from a _feed config_.
12
6
 
13
- This Ruby gem builds RSS 2.0 feeds from a _feed config_.
7
+ With the _feed config_, you provide a URL to scrape and CSS selectors for extracting information (like title, URL, etc.). The gem builds the RSS feed accordingly. [Extractors](#using-extractors) and chainable [post processors](#using-post-processors) make information extraction, processing, and sanitizing a breeze. The gem also supports [scraping JSON](#scraping-and-handling-json-responses) responses and [setting HTTP request headers](#set-any-http-header-in-the-request).
14
8
 
15
- With the _feed config_ containing the URL to scrape and
16
- CSS selectors for information extraction (like title, URL, ...) your RSS builds.
17
- [Extractors](#using-extractors) and chain-able [post processors](#using-post-processors)
18
- make information extraction, processing and sanitizing a breeze.
19
- [Scraping JSON](#scraping-and-handling-json-responses) responses and
20
- [setting HTTP request headers](#set-any-http-header-in-the-request) is
21
- supported, too.
9
+ **Looking for a ready-to-use app to serve generated feeds via HTTP?** [Check out `html2rss-web`](https://github.com/html2rss/html2rss-web)!
10
+
11
+ Support the development by sponsoring this project on GitHub. Thank you! 💓
22
12
 
23
13
  ## Installation
24
14
 
25
- | 🤩 Like it? | Star it! ⭐️ |
26
- | ---------------------------------------------: | -------------------- |
27
- | Add this line to your application's `Gemfile`: | `gem 'html2rss'` |
28
- | Then execute: | `bundle` |
29
- | In your code: | `require 'html2rss'` |
15
+ | Install | `gem install html2rss` |
16
+ | ------- | ---------------------- |
17
+ | Usage | `html2rss help` |
18
+
19
+ You can also install it as a dependency in your Ruby project:
30
20
 
31
- 😍 Love it? Feel free [to donate](https://liberapay.com/gildesmarais/donate). Thank you! 💓
21
+ | 🤩 Like it? | Star it! ⭐️ |
22
+ | -------------------------------: | -------------------- |
23
+ | Add this line to your `Gemfile`: | `gem 'html2rss'` |
24
+ | Then execute: | `bundle` |
25
+ | In your code: | `require 'html2rss'` |
32
26
 
33
- ## Building a feed config
27
+ ## Generating a feed on the CLI
34
28
 
35
- Here's a minimal working example:
29
+ Create a file called `my_config_file.yml` with this example content:
30
+
31
+ ```yml
32
+ channel:
33
+ url: https://stackoverflow.com/questions
34
+ selectors:
35
+ items:
36
+ selector: "#hot-network-questions > ul > li"
37
+ title:
38
+ selector: a
39
+ link:
40
+ selector: a
41
+ extractor: href
42
+ ```
43
+
44
+ Build the RSS with: `html2rss feed ./my_config_file.yml`.
45
+
46
+ ## Generating a feed with Ruby
47
+
48
+ Here's a minimal working example in Ruby:
36
49
 
37
50
  ```ruby
38
51
  require 'html2rss'
@@ -50,54 +63,86 @@ rss =
50
63
  puts rss
51
64
  ```
52
65
 
53
- A _feed config_ consists of a `channel` and a `selectors` Hash.
54
- The contents of both hashes are explained below.
66
+ ## The _feed config_ and its options
67
+
68
+ A _feed config_ consists of a `channel` and a `selectors` hash. The contents of both hashes are explained below.
69
+
70
+ Good to know:
71
+
72
+ - You'll find extensive example feed configs at [`spec/*.test.yml`](https://github.com/html2rss/html2rss/tree/master/spec).
73
+ - See [`html2rss-configs`](https://github.com/html2rss/html2rss-configs) for ready-made feed configs!
74
+ - If you've created feed configs, you're invited to send a PR to [`html2rss-configs`](https://github.com/html2rss/html2rss-configs) to make your config available to the public.
55
75
 
56
- **Looks too complicated?** See [`html2rss-configs`](https://github.com/gildesmarais/html2rss-configs) for ready-made feed configs!
76
+ Alright, let's move on.
57
77
 
58
78
  ### The `channel`
59
79
 
60
- | attribute | | type | default | remark |
61
- | ------------- | -------- | ------- | -------------: | ------------------------------------------ |
62
- | `url` | required | String | | |
63
- | `title` | optional | String | auto-generated | |
64
- | `description` | optional | String | auto-generated | |
65
- | `ttl` | optional | Integer | `360` | TTL in _minutes_ |
66
- | `time_zone` | optional | String | `'UTC'` | TimeZone name |
67
- | `language` | optional | String | `'en'` | Language code |
68
- | `author` | optional | String | | Format: `email (Name)'` |
69
- | `headers` | optional | Hash | `{}` | Set HTTP request headers. See notes below. |
70
- | `json` | optional | Boolean | `false` | Handle JSON response. See notes below. |
80
+ | attribute | | type | default | remark |
81
+ | ------------- | ------------ | ------- | -------------- | ------------------------------------------ |
82
+ | `url` | **required** | String | | |
83
+ | `title` | optional | String | auto-generated | |
84
+ | `description` | optional | String | auto-generated | |
85
+ | `ttl` | optional | Integer | `360` | TTL in _minutes_ |
86
+ | `time_zone` | optional | String | `'UTC'` | TimeZone name |
87
+ | `language` | optional | String | `'en'` | Language code |
88
+ | `author` | optional | String | | Format: `email (Name)` |
89
+ | `headers` | optional | Hash | `{}` | Set HTTP request headers. See notes below. |
90
+ | `json` | optional | Boolean | `false` | Handle JSON response. See notes below. |
91
+
92
+ #### Dynamic parameters in `channel` attributes
93
+
94
+ Sometimes there are structurally similar pages with different URLs. In such cases, you can add _dynamic parameters_ to the channel's attributes.
95
+
96
+ Example of a dynamic `id` parameter in the channel URLs:
97
+
98
+ ```yml
99
+ channel:
100
+ url: "http://domainname.tld/whatever/%<id>s.html"
101
+ ```
102
+
103
+ Command line usage example:
104
+
105
+ ```sh
106
+ bundle exec html2rss feed the_feed_config.yml id=42
107
+ ```
108
+
109
+ <details><summary>See a Ruby example</summary>
110
+
111
+ ```ruby
112
+ config = Html2rss::Config.new({ channel: { url: 'http://domainname.tld/whatever/%<id>s.html' } }, {}, { id: 42 })
113
+ Html2rss.feed(config)
114
+ ```
115
+
116
+ </details>
117
+
118
+ See the more complex formatting options of the [`sprintf` method](https://ruby-doc.org/core/Kernel.html#method-i-sprintf).
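As a rough illustration of how such a placeholder resolves, Ruby's `Kernel#format` (an alias of `sprintf`) fills named references from a hash. This sketch only demonstrates the string formatting itself; how html2rss applies the parameters internally is not shown here.

```ruby
# Named references such as %<id>s are filled from a hash of parameters.
url_template = 'http://domainname.tld/whatever/%<id>s.html'
params = { id: 42 }

# Kernel#format raises a KeyError if a referenced key is missing.
url = format(url_template, params)
puts url
```

Because the template follows `sprintf` rules, width and type flags (for example `%<id>05d`) should behave the same way they do in plain Ruby.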
71
119
 
72
120
  ### The `selectors`
73
121
 
74
- You must provide an `items` selector hash which contains the CSS selector.
75
- `items` needs to return a collection of HTML tags.
76
- The other selectors are scoped to the tags of the items' collection.
77
-
78
- To build a
79
- [valid RSS 2.0 item](http://www.rssboard.org/rss-profile#element-channel-item)
80
- each item has to have at least a `title` or a `description`.
81
-
82
- Your `selectors` can contain arbitrary selector names, but only these
83
- will make it into the RSS feed:
84
-
85
- | RSS 2.0 tag | name in `html2rss` | remark |
86
- | ------------- | ------------------ | --------------------------- |
87
- | `title` | `title` | |
88
- | `description` | `description` | Supports HTML. |
89
- | `link` | `link` | A URL. |
90
- | `author` | `author` | |
91
- | `category` | `categories` | See notes below. |
92
- | `enclosure` | `enclosure` | See notes below. |
93
- | `pubDate` | `update` | An instance of `Time`. |
94
- | `guid` | `guid` | Generated from the `title`. |
95
- | `comments` | `comments` | A URL. |
96
- | `source` | ~~source~~ | Not yet supported. |
122
+ First, you must provide an **`items`** selector hash, which contains a CSS selector. That selector picks the collection of HTML tags from which the RSS feed items are built. Except for the `items` selector, all other keys are scoped to each item of the collection.
123
+
124
+ To build a [valid RSS 2.0 item](http://www.rssboard.org/rss-profile#element-channel-item), you need at least a `title` **or** a `description`. You can have both.
125
+
126
+ Having an `items` and a `title` selector is enough to build a simple feed.
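The rule above can be sketched as a small check on a `selectors` hash. The helper name `minimal_selectors?` is hypothetical and not part of html2rss; it only restates the requirement of an `items` selector plus a `title` or a `description`.

```ruby
# Hypothetical helper, not part of html2rss: returns true when the hash
# has an `items` selector and at least a `title` or a `description`.
def minimal_selectors?(selectors)
  selectors.key?(:items) && (selectors.key?(:title) || selectors.key?(:description))
end

selectors = {
  items: { selector: '#hot-network-questions > ul > li' },
  title: { selector: 'a' }
}

puts minimal_selectors?(selectors)
```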
127
+
128
+ Your `selectors` hash can contain arbitrary named selectors, but only a few will make it into the RSS feed (due to the RSS 2.0 specification):
129
+
130
+ | RSS 2.0 tag | name in `html2rss` | remark |
131
+ | ------------- | ------------------ | ------------------------------------------- |
132
+ | `title` | `title` | |
133
+ | `description` | `description` | Supports HTML. |
134
+ | `link` | `link` | A URL. |
135
+ | `author` | `author` | |
136
+ | `category` | `categories` | See notes below. |
137
+ | `guid` | `guid` | Default title/description. See notes below. |
138
+ | `enclosure` | `enclosure` | See notes below. |
139
+ | `pubDate` | `updated` | An instance of `Time`. |
140
+ | `comments` | `comments` | A URL. |
141
+ | `source` | ~~source~~ | Not yet supported. |
97
142
 
98
143
  ### The `selector` hash
99
144
 
100
- Your selector hash can have these attributes:
145
+ Every named selector in your `selectors` hash can have these attributes:
101
146
 
102
147
  | name | value |
103
148
  | -------------- | -------------------------------------------------------- |
@@ -105,26 +150,6 @@ Your selector hash can have these attributes:
105
150
  | `extractor` | Name of the extractor. See notes below. |
106
151
  | `post_process` | A hash or array of hashes. See notes below. |
107
152
 
108
- #### Reverse ordering of items
109
-
110
- The `items` selector hash can have an `order` attribute.
111
- If the value is `reverse` the order of items in the RSS will be reversed.
112
-
113
- <details>
114
- <summary>See a YAML feed config example</summary>
115
-
116
- ```yml
117
- channel:
118
-   # ... omitted
119
- selectors:
120
- items:
121
- selector: 'ul > li'
122
- order: 'reverse'
123
-   # ... omitted
124
- ```
125
-
126
- </details>
127
-
128
153
  ## Using extractors
129
154
 
130
155
  Extractors help with extracting the information from the selected HTML tag.
@@ -134,13 +159,11 @@ Extractors help with extracting the information from the selected HTML tag.
134
159
  - The `href` extractor returns a URL from the tag's `href` attribute and corrects relative ones to absolute ones.
135
160
  - The `attribute` extractor returns the value of that tag's attribute.
136
161
  - The `static` extractor returns the configured static value (it doesn't extract anything).
137
- - [See file list of extractors](https://github.com/gildesmarais/html2rss/tree/master/lib/html2rss/item_extractors).
162
+ - [See file list of extractors](https://github.com/html2rss/html2rss/tree/master/lib/html2rss/item_extractors).
138
163
 
139
- Extractors can require additional attributes on the selector hash.
140
- 👉 [Read their docs for usage examples](https://www.rubydoc.info/gems/html2rss/Html2rss/ItemExtractors).
164
+ Extractors might need extra attributes on the selector hash. 👉 [Read their docs for usage examples](https://www.rubydoc.info/gems/html2rss/Html2rss/ItemExtractors).
141
165
 
142
- <details>
143
- <summary>See a Ruby example</summary>
166
+ <details><summary>See a Ruby example</summary>
144
167
 
145
168
  ```ruby
146
169
  Html2rss.feed(
@@ -150,17 +173,16 @@ Html2rss.feed(
150
173
 
151
174
  </details>
152
175
 
153
- <details>
154
- <summary>See a YAML feed config example</summary>
176
+ <details><summary>See a YAML feed config example</summary>
155
177
 
156
178
  ```yml
157
179
  channel:
158
-   # ... omitted
180
+ # ... omitted
159
181
  selectors:
160
-   # ... omitted
182
+ # ... omitted
161
183
  link:
162
- selector: 'a'
163
- extractor: 'href'
184
+ selector: "a"
185
+ extractor: "href"
164
186
  ```
165
187
 
166
188
  </details>
@@ -182,48 +204,11 @@ Extracted information can be further manipulated with post processors.
182
204
 
183
205
  ⚠️ Always make use of the `sanitize_html` post processor for HTML content. _Never trust the internet!_ ⚠️
184
206
 
185
- - [See file list of post processors](https://github.com/gildesmarais/html2rss/tree/master/lib/html2rss/attribute_post_processors).
186
-
187
- 👉 [Read their docs for usage examples.](https://www.rubydoc.info/gems/html2rss/Html2rss/AttributePostProcessors)
188
-
189
- <details>
190
- <summary>See a Ruby example</summary>
191
-
192
- ```ruby
193
- Html2rss.feed(
194
- channel: {},
195
- selectors: {
196
- description: {
197
- selector: '.content', post_process: { name: 'sanitize_html' }
198
- }
199
- }
200
- )
201
- ```
202
-
203
- </details>
204
-
205
- <details>
206
- <summary>See a YAML feed config example</summary>
207
-
208
- ```yml
209
- channel:
210
-   # ... omitted
211
- selectors:
212
-   # ... omitted
213
- description:
214
- selector: '.content'
215
- post_process:
216
- - name: sanitize_html
217
- ```
218
-
219
- </details>
220
-
221
207
  ### Chaining post processors
222
208
 
223
209
  Pass an array to `post_process` to chain the post processors.
224
210
 
225
- <details>
226
- <summary>YAML example: build the description from a template String (in Markdown) and convert that Markdown to HTML</summary>
211
+ <details><summary>YAML example: build the description from a template String (in Markdown) and convert that Markdown to HTML</summary>
227
212
 
228
213
  ```yml
229
214
  channel:
@@ -243,7 +228,44 @@ selectors:
243
228
  - name: markdown_to_html
244
229
  ```
245
230
 
246
- Note the use of `|` for a multi-line String in YAML.
231
+ </details>
232
+
233
+ ### Post processor `gsub`
234
+
235
+ The post processor `gsub` makes use of Ruby's [`gsub`](https://apidock.com/ruby/String/gsub) method.
236
+
237
+ | key | type | required | note |
238
+ | ------------- | ------ | -------- | --------------------------- |
239
+ | `pattern` | String | yes | Can be Regexp or String. |
240
+ | `replacement` | String | yes | Can be a [backreference](). |
241
+
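For orientation, this is how Ruby's own `String#gsub` treats a String pattern and a Regexp pattern with backreferences. It is a plain-Ruby sketch with made-up sample text; how the post processor converts a configured String into a Regexp is not covered here.

```ruby
title = 'Release 0.11.0 of html2rss'

# String pattern: every literal occurrence is replaced.
plain = title.gsub('Release', 'Version')

# Regexp pattern with backreferences: \1 and \2 refer to the first and
# second capture groups (single quotes keep the backslashes intact).
shortened = title.gsub(/(\d+)\.(\d+)\.(\d+)/, 'v\1.\2')

puts plain
puts shortened
```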
242
+ <details><summary>See a Ruby example</summary>
243
+
244
+ ```ruby
245
+ Html2rss.feed(
246
+ channel: {},
247
+ selectors: {
248
+ title: { selector: 'a', post_process: [{ name: 'gsub', pattern: 'foo', replacement: 'bar' }] }
249
+ }
250
+ )
251
+ ```
252
+
253
+ </details>
254
+
255
+ <details><summary>See a YAML feed config example</summary>
256
+
257
+ ```yml
258
+ channel:
259
+ # ... omitted
260
+ selectors:
261
+ # ... omitted
262
+ title:
263
+ selector: "a"
264
+ post_process:
265
+ - name: "gsub"
266
+ pattern: "foo"
267
+ replacement: "bar"
268
+ ```
247
269
 
248
270
  </details>
249
271
 
@@ -290,65 +312,74 @@ selectors:
290
312
 
291
313
  </details>
292
314
 
293
- ## Adding an `<enclosure>` tag to an item
294
-
295
- An enclosure can be any file, e.g. a image, audio or video.
296
-
297
- The `enclosure` selector needs to return a URL of the content to enclose. If the extracted URL is relative, it will be converted to an absolute one using the channel's URL as base.
315
+ ## Custom item GUID
298
316
 
299
- Since `html2rss` does no further inspection of the enclosure, its support comes with trade-offs:
317
+ By default, html2rss generates a GUID from the `title` or `description`.
300
318
 
301
- 1. The content-type is guessed from the file extension of the URL.
302
- 2. If the content-type guessing fails, it will default to `application/octet-stream`.
303
- 3. The content-length will always be undetermined and thus stated as `0` bytes.
319
+ If this does not work well, you can choose other attributes from which the GUID is built.
320
+ The principle is the same as for the categories: pass an array of selector names.
304
321
 
305
- Read the [RSS 2.0 spec](http://www.rssboard.org/rss-profile#element-channel-item-enclosure) for further information on enclosing content.
322
+ In all cases, the GUID is a SHA1 hash.
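A minimal sketch of the idea, using Ruby's standard `Digest::SHA1`. The helper `guid_for` and the way the attribute values are joined are assumptions for illustration only; the gem's actual implementation may differ.

```ruby
require 'digest/sha1'

# Hypothetical helper: hashes the values of the chosen attributes.
# Joining the values without a separator is an assumption for this sketch.
def guid_for(item, attribute_names)
  values = attribute_names.map { |name| item.fetch(name).to_s }
  Digest::SHA1.hexdigest(values.join)
end

item = { title: 'Headline', link: 'https://example.com/article' }

guid = guid_for(item, %i[link])
puts guid
```

Picking a stable attribute such as `link` keeps the GUID unchanged when a site later edits the headline.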
306
323
 
307
- <details>
308
- <summary>See a Ruby example</summary>
324
+ <details><summary>See a Ruby example</summary>
309
325
 
310
326
  ```ruby
311
327
  Html2rss.feed(
312
328
  channel: {},
313
329
  selectors: {
314
- enclosure: { selector: 'img', extractor: 'attribute', attribute: 'src' }
330
+ title: {
331
+ # ... omitted
332
+ selector: 'h1'
333
+ },
334
+ link: { selector: 'a', extractor: 'href' },
335
+ guid: %i[link]
315
336
  }
316
337
  )
317
338
  ```
318
339
 
319
340
  </details>
320
341
 
321
- <details>
322
- <summary>See a YAML feed config example</summary>
342
+ <details><summary>See a YAML feed config example</summary>
323
343
 
324
344
  ```yml
325
345
  channel:
326
346
    # ... omitted
327
347
  selectors:
328
-   # ... omitted
329
- enclosure:
330
- selector: "img"
331
- extractor: "attribute"
332
- attribute: "src"
348
+ # ... omitted
349
+ title:
350
+ selector: "h1"
351
+ link:
352
+ selector: "a"
353
+ extractor: "href"
354
+ guid:
355
+ - link
333
356
  ```
334
357
 
335
358
  </details>
336
359
 
337
- ## Scraping and handling JSON responses
360
+ ## Adding an `<enclosure>` tag to an item
338
361
 
339
- Although this gem is called **html**​*2rss*, it's possible to scrape and process JSON.
362
+ An enclosure can be any file, e.g. an image, audio, or video (think podcasts).
340
363
 
341
- Adding `json: true` to the channel config will convert the JSON response to XML.
364
+ The `enclosure` selector needs to return a URL of the content to enclose. If the extracted URL is relative, it will be converted to an absolute one using the channel's URL as base.
365
+
366
+ Since `html2rss` does no further inspection of the enclosure, its support comes with trade-offs:
367
+
368
+ 1. The content-type is guessed from the file extension of the URL.
369
+ 2. If the content-type guessing fails, it will default to `application/octet-stream`.
370
+ 3. The content-length will always be undetermined and therefore stated as `0` bytes.
371
+
372
+ Read the [RSS 2.0 spec](http://www.rssboard.org/rss-profile#element-channel-item-enclosure) for further information on enclosing content.
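The trade-offs listed above can be sketched as follows. The lookup table and the helper names `guess_content_type` and `enclosure_attributes` are hypothetical; html2rss may rely on a fuller MIME type database.

```ruby
require 'uri'

# Small illustrative lookup table, not the gem's actual data.
CONTENT_TYPES = {
  '.mp3' => 'audio/mpeg',
  '.mp4' => 'video/mp4',
  '.jpg' => 'image/jpeg',
  '.png' => 'image/png'
}.freeze

# Hypothetical helper: guesses the content type from the URL's file
# extension and falls back to application/octet-stream.
def guess_content_type(url)
  extension = File.extname(URI(url).path).downcase
  CONTENT_TYPES.fetch(extension, 'application/octet-stream')
end

# Hypothetical helper: the length is always 0 because the file is never fetched.
def enclosure_attributes(url)
  { url: url, type: guess_content_type(url), length: 0 }
end

puts enclosure_attributes('https://example.com/episode-1.mp3').inspect
```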
342
373
 
343
374
  <details>
344
375
  <summary>See a Ruby example</summary>
345
376
 
346
377
  ```ruby
347
378
  Html2rss.feed(
348
- channel: {
349
- url: 'https://example.com', json: true
350
- },
351
- selectors: {} # ... omitted
379
+ channel: {},
380
+ selectors: {
381
+ enclosure: { selector: 'audio', extractor: 'attribute', attribute: 'src' }
382
+ }
352
383
  )
353
384
  ```
354
385
 
@@ -357,130 +388,88 @@ Html2rss.feed(
357
388
  <details>
358
389
  <summary>See a YAML feed config example</summary>
359
390
 
360
- ```yaml
391
+ ```yml
361
392
  channel:
362
- url: https://example.com
363
- json: true
393
+   # ... omitted
364
394
  selectors:
365
395
    # ... omitted
396
+ enclosure:
397
+ selector: "audio"
398
+ extractor: "attribute"
399
+ attribute: "src"
366
400
  ```
367
401
 
368
402
  </details>
403
+ ## Scraping and handling JSON responses
369
404
 
370
- <details>
371
- <summary>See example of a converted JSON object</summary>
405
+ By default, `html2rss` assumes the URL responds with HTML. However, it can also handle JSON responses. The top level of the JSON response must be an Array or Hash.
372
406
 
373
- This JSON object:
407
+ | key | required | default | note |
408
+ | ---------- | -------- | ------- | ---------------------------------------------------- |
409
+ | `json` | optional | false | If set to `true`, the response is parsed as JSON. |
410
+ | `jsonpath` | optional | $ | Use [JSONPath syntax]() to select nodes of interest. |
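The Array-or-Hash requirement can be illustrated with Ruby's standard `json` library. The helper `parse_json_response` is hypothetical and only demonstrates the check; it does not reflect how html2rss processes the parsed data afterwards.

```ruby
require 'json'

# Hypothetical helper: parses a response body and rejects top-level
# scalars such as a bare number or string.
def parse_json_response(body)
  parsed = JSON.parse(body)
  return parsed if parsed.is_a?(Array) || parsed.is_a?(Hash)

  raise ArgumentError, 'JSON response must be an Array or Hash'
end

data = parse_json_response('{"data": [{"title": "Headline", "url": "https://example.com"}]}')
puts data['data'].first['title']
```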
374
411
 
375
- ```json
376
- {
377
- "data": [{ "title": "Headline", "url": "https://example.com" }]
378
- }
379
- ```
412
+ <details><summary>See a Ruby example</summary>
380
413
 
381
- converts to:
382
-
383
- ```xml
384
- <hash>
385
- <data>
386
- <datum>
387
- <title>Headline</title>
388
- <url>https://example.com</url>
389
- </datum>
390
- </data>
391
- </hash>
414
+ ```ruby
415
+ Html2rss.feed(
416
+ channel: { url: 'http://domainname.tld/whatever.json', json: true },
417
+ selectors: { title: { selector: 'foo' } }
418
+ )
392
419
  ```
393
420
 
394
- Your items selector would be `data > datum`, the item's `link` selector would be `url`.
395
-
396
- Find further information in [ActiveSupport's `Hash.to_xml` documentation](https://apidock.com/rails/Hash/to_xml).
397
-
398
421
  </details>
399
422
 
400
- <details>
401
- <summary>See example of a converted JSON array</summary>
423
+ <details><summary>See a YAML feed config example</summary>
402
424
 
403
- This JSON array:
404
-
405
- ```json
406
- [{ "title": "Headline", "url": "https://example.com" }]
407
- ```
408
-
409
- converts to:
410
-
411
- ```xml
412
- <objects>
413
- <object>
414
- <title>Headline</title>
415
- <url>https://example.com</url>
416
- </object>
417
- </objects>
425
+ ```yml
426
+ channel:
427
+ url: "http://domainname.tld/whatever.json"
428
+ json: true
429
+ selectors:
430
+ title:
431
+ selector: "foo"
418
432
  ```
419
433
 
420
- Your items selector would be `objects > object`, the item's `link` selector would be `url`.
421
-
422
- Find further information in [ActiveSupport's `Array.to_xml` documentation](https://apidock.com/rails/Array/to_xml).
423
-
424
434
  </details>
425
435
 
426
436
  ## Set any HTTP header in the request
427
437
 
428
- You can add any HTTP headers to the request to the channel URL.
429
- Use this to e.g. have Cookie or Authorization information sent or to spoof the User-Agent.
438
+ To set HTTP request headers, you can add them to the channel's `headers` hash. This is useful for APIs that require an Authorization header.
430
439
 
431
- <details>
432
- <summary>See a Ruby example</summary>
433
-
434
- ```ruby
435
- Html2rss.feed(
436
- channel: {
437
- url: 'https://example.com',
438
- headers: {
439
- "User-Agent": "html2rss-request",
440
- "X-Something": "Foobar",
441
- "Authorization": "Token deadbea7",
442
- "Cookie": "monster=MeWantCookie"
443
- }
444
- },
445
- selectors: {}
446
- )
447
- ```
448
-
449
- </details>
450
-
451
- <details>
452
- <summary>See a YAML feed config example</summary>
453
-
454
- ```yaml
440
+ ```yml
455
441
  channel:
456
- url: https://example.com
442
+ url: "https://example.com/api/resource"
457
443
  headers:
458
- "User-Agent": "html2rss-request"
459
- "X-Something": "Foobar"
460
- "Authorization": "Token deadbea7"
461
- "Cookie": "monster=MeWantCookie"
444
+ Authorization: "Bearer YOUR_TOKEN"
462
445
  selectors:
463
-   # ...
446
+ # ... omitted
464
447
  ```
465
448
 
466
- </details>
449
+ Or for setting a User-Agent:
467
450
 
468
- The headers provided by the channel are merged into the global headers.
451
+ ```yml
452
+ channel:
453
+ url: "https://example.com"
454
+ headers:
455
+ User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
456
+ selectors:
457
+ # ... omitted
458
+ ```
469
459
 
470
460
  ## Usage with a YAML config file
471
461
 
472
462
  This step is not required to work with this gem. If you're using
473
- [`html2rss-web`](https://github.com/gildesmarais/html2rss-web)
463
+ [`html2rss-web`](https://github.com/html2rss/html2rss-web)
474
464
  and want to create your private feed configs, keep on reading!
475
465
 
476
- First, create your YAML file, e.g. called `feeds.yml`.
477
- This file will contain your global config and feed configs.
466
+ First, create a YAML file, e.g. `feeds.yml`. This file will contain your global config and multiple feed configs under the key `feeds`.
478
467
 
479
468
  Example:
480
469
 
481
470
  ```yml
482
471
  headers:
483
- 'User-Agent': "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"
472
+ "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"
484
473
  feeds:
485
474
  myfeed:
486
475
  channel:
@@ -492,7 +481,12 @@ feeds:
492
481
 
493
482
  Your feed configs go below `feeds`. Everything else is part of the global config.
494
483
 
495
- Build your feeds like this:
484
+ Find a full example of a `feeds.yml` at [`spec/feeds.test.yml`](https://github.com/html2rss/html2rss/blob/master/spec/feeds.test.yml).
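To illustrate the split between the global config and the feed configs, here is a sketch using Ruby's standard `yaml` library. The inline YAML and the User-Agent value are made up for the example, and this is not how `Html2rss.feed_from_yaml_config` is necessarily implemented.

```ruby
require 'yaml'

yaml = <<~YAML
  headers:
    "User-Agent": "html2rss-example"
  feeds:
    myfeed:
      channel:
        url: "https://example.com"
      selectors:
        items:
          selector: "ul > li"
        title:
          selector: "a"
YAML

config = YAML.safe_load(yaml)

# Feed configs live under `feeds`; every other top-level key is global.
feeds = config.fetch('feeds')
global_config = config.reject { |key, _value| key == 'feeds' }

puts feeds.keys.inspect
puts global_config.inspect
```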
485
+
486
+ Now you can build your feeds like this:
487
+
488
+ <details>
489
+ <summary>Build feeds in Ruby</summary>
496
490
 
497
491
  ```ruby
498
492
  require 'html2rss'
@@ -501,37 +495,96 @@ myfeed = Html2rss.feed_from_yaml_config('feeds.yml', 'myfeed')
501
495
  myotherfeed = Html2rss.feed_from_yaml_config('feeds.yml', 'myotherfeed')
502
496
  ```
503
497
 
504
- Find a full example of a `feeds.yml` at [`spec/config.test.yml`](https://github.com/gildesmarais/html2rss/blob/master/spec/config.test.yml).
498
+ </details>
505
499
 
506
- ## Gotchas and tips & tricks
500
+ <details>
501
+ <summary>Build feeds on the command line</summary>
507
502
 
508
- - Check that the channel URL does not redirect to a mobile page with a different markup structure.
509
- - Do not rely on your web browser's developer console. `html2rss` does not execute JavaScript.
510
- - Fiddling with [`curl`](https://github.com/curl/curl) and [`pup`](https://github.com/ericchiang/pup) to find the selectors seems efficient (`curl URL | pup`).
511
- - [CSS selectors are quite versatile, here's an overview.](https://www.w3.org/TR/selectors-4/#overview)
503
+ ```sh
504
+ html2rss feed feeds.yml myfeed
505
+ html2rss feed feeds.yml myotherfeed
506
+ ```
507
+
508
+ </details>
512
509
 
513
- ## Development
510
+ ## Display the RSS feed nicely in a web browser
514
511
 
515
- After checking out the repository, run `bin/setup` to install dependencies. Then, run `bundle exec rspec` to run the tests.
516
- You can also run `bin/console` for an interactive prompt that will allow you to experiment.
512
+ To display RSS feeds nicely in a web browser, you can:
513
+
514
+ - add a plain old CSS stylesheet, or
515
+ - use XSLT (e**X**tensible **S**tylesheet **L**anguage **T**ransformations).
516
+
517
+ A web browser will apply these stylesheets and show the contents as described.
518
+
519
+ In a CSS stylesheet, you'd use `element` selectors to apply styles.
520
+
521
+ If you want to do more, you need to create an XSLT stylesheet. XSLT allows you
522
+ to use an HTML template and to freely design how the RSS content is presented,
523
+ including the use of JavaScript and external resources.
524
+
525
+ You can add as many stylesheets and types as you like. Just add them to your global configuration.
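Browsers pick up such stylesheets through `xml-stylesheet` processing instructions at the top of the XML document. The sketch below renders config entries into that form; the helper `stylesheet_instruction` and the sample paths are hypothetical, and the gem's actual output may differ in detail.

```ruby
# Hypothetical helper: renders one stylesheet config entry as an
# xml-stylesheet processing instruction. Attribute values are not escaped
# here, so this is only suitable for trusted config values.
def stylesheet_instruction(href:, type:, media: 'all')
  %(<?xml-stylesheet href="#{href}" type="#{type}" media="#{media}"?>)
end

stylesheets = [
  { href: '/rss.xsl', type: 'text/xsl', media: :all },
  { href: 'http://example.com/rss.css', type: 'text/css', media: :all }
]

stylesheets.each { |stylesheet| puts stylesheet_instruction(**stylesheet) }
```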
517
526
 
518
527
  <details>
519
- <summary>Releasing a new version</summary>
520
-
521
- 1. `git pull`
522
- 2. increase version in `lib/html2rss/version.rb`
523
- 3. `bundle`
524
- 4. `git add Gemfile.lock lib/html2rss/version.rb`
525
- 5. `VERSION=$(ruby -e 'require "./lib/html2rss/version.rb"; puts Html2rss::VERSION')`
526
- 6. `git commit -m "chore: release $VERSION"`
527
- 7. `git tag v$VERSION`
528
- 8. [`standard-changelog -f`](https://github.com/conventional-changelog/conventional-changelog/tree/master/packages/standard-changelog)
529
- 9. `git add CHANGELOG.md && git commit --amend`
530
- 10. `git tag v$VERSION -f`
531
- 11. `git push && git push --tags`
528
+ <summary>Ruby: a stylesheet config example</summary>
529
+
530
+ ```ruby
531
+ config = Html2rss::Config.new(
532
+ { channel: {}, selectors: {} }, # omitted
533
+ {
534
+ stylesheets: [
535
+ {
536
+ href: '/relative/base/path/to/style.xsl',
537
+ media: :all,
538
+ type: 'text/xsl'
539
+ },
540
+ {
541
+ href: 'http://example.com/rss.css',
542
+ media: :all,
543
+ type: 'text/css'
544
+ }
545
+ ]
546
+ }
547
+ )
548
+
549
+ Html2rss.feed(config)
550
+ ```
532
551
 
533
552
  </details>
534
553
 
535
- ## Contributing
554
+ <details>
555
+ <summary>YAML: a stylesheet config example</summary>
556
+
557
+ ```yml
558
+ stylesheets:
559
+ - href: "/relative/base/path/to/style.xsl"
560
+ media: "all"
561
+ type: "text/xsl"
562
+ - href: "http://example.com/rss.css"
563
+ media: "all"
564
+ type: "text/css"
565
+ feeds:
566
+ # ... omitted
567
+ ```
568
+
569
+ </details>
570
+
571
+ Recommended further reading:
572
+
573
+ - [How to format RSS with CSS on lifewire.com](https://www.lifewire.com/how-to-format-rss-3469302)
574
+ - [XSLT: Extensible Stylesheet Language Transformations on MDN](https://developer.mozilla.org/en-US/docs/Web/XSLT)
575
+ - [The XSLT used by html2rss-web](https://github.com/html2rss/html2rss-web/blob/master/public/rss.xsl)
576
+
577
+ ## Gotchas and tips & tricks
578
+
579
+ - Check that the channel URL does not redirect to a mobile page with a different markup structure.
580
+ - Do not rely on your web browser's developer console. `html2rss` does not execute JavaScript.
581
+ - Fiddling with [`curl`](https://github.com/curl/curl) and [`pup`](https://github.com/ericchiang/pup) to find the selectors seems efficient (`curl URL | pup`).
582
+ - [CSS selectors are versatile. Here's an overview.](https://www.w3.org/TR/selectors-4/#overview)
583
+
584
+ ### Contributing
536
585
 
537
- Bug reports and pull requests are welcome on GitHub at https://github.com/gildesmarais/html2rss.
586
+ 1. Fork it ( <https://github.com/html2rss/html2rss/fork> )
587
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
588
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
589
+ 4. Push to the branch (`git push origin my-new-feature`)
590
+ 5. Create a new Pull Request