html2rss 0.9.0 → 0.10.0

Files changed (47)
  1. checksums.yaml +4 -4
  2. data/.gitignore +1 -1
  3. data/.mergify.yml +15 -0
  4. data/.rubocop.yml +11 -145
  5. data/Gemfile +19 -2
  6. data/Gemfile.lock +111 -97
  7. data/README.md +323 -270
  8. data/bin/console +1 -0
  9. data/exe/html2rss +6 -0
  10. data/html2rss.gemspec +15 -20
  11. data/lib/html2rss/attribute_post_processors/gsub.rb +30 -8
  12. data/lib/html2rss/attribute_post_processors/html_to_markdown.rb +7 -2
  13. data/lib/html2rss/attribute_post_processors/html_transformers/transform_urls_to_absolute_ones.rb +27 -0
  14. data/lib/html2rss/attribute_post_processors/html_transformers/wrap_img_in_a.rb +41 -0
  15. data/lib/html2rss/attribute_post_processors/markdown_to_html.rb +11 -2
  16. data/lib/html2rss/attribute_post_processors/parse_time.rb +11 -4
  17. data/lib/html2rss/attribute_post_processors/parse_uri.rb +12 -2
  18. data/lib/html2rss/attribute_post_processors/sanitize_html.rb +40 -44
  19. data/lib/html2rss/attribute_post_processors/substring.rb +14 -4
  20. data/lib/html2rss/attribute_post_processors/template.rb +36 -12
  21. data/lib/html2rss/attribute_post_processors.rb +28 -5
  22. data/lib/html2rss/cli.rb +29 -0
  23. data/lib/html2rss/config/channel.rb +117 -0
  24. data/lib/html2rss/config/selectors.rb +91 -0
  25. data/lib/html2rss/config.rb +71 -82
  26. data/lib/html2rss/item.rb +118 -42
  27. data/lib/html2rss/item_extractors/attribute.rb +20 -7
  28. data/lib/html2rss/item_extractors/href.rb +20 -4
  29. data/lib/html2rss/item_extractors/html.rb +18 -6
  30. data/lib/html2rss/item_extractors/static.rb +18 -7
  31. data/lib/html2rss/item_extractors/text.rb +17 -5
  32. data/lib/html2rss/item_extractors.rb +75 -10
  33. data/lib/html2rss/object_to_xml_converter.rb +56 -0
  34. data/lib/html2rss/rss_builder/channel.rb +21 -0
  35. data/lib/html2rss/rss_builder/item.rb +83 -0
  36. data/lib/html2rss/rss_builder/stylesheet.rb +37 -0
  37. data/lib/html2rss/rss_builder.rb +96 -0
  38. data/lib/html2rss/utils.rb +94 -19
  39. data/lib/html2rss/version.rb +5 -1
  40. data/lib/html2rss.rb +51 -20
  41. data/rakefile.rb +16 -0
  42. metadata +51 -154
  43. data/.travis.yml +0 -25
  44. data/CHANGELOG.md +0 -221
  45. data/lib/html2rss/feed_builder.rb +0 -81
  46. data/lib/html2rss/item_extractors/current_time.rb +0 -21
  47. data/support/logo.png +0 -0
data/README.md CHANGED
@@ -1,38 +1,51 @@
- ![html2rss logo](https://github.com/gildesmarais/html2rss/raw/master/support/logo.png)
+ ![html2rss logo](https://github.com/html2rss/html2rss/raw/master/support/logo.png)

- [![Build Status](https://travis-ci.org/gildesmarais/html2rss.svg?branch=master)](https://travis-ci.org/gildesmarais/html2rss)
- [![Gem Version](https://badge.fury.io/rb/html2rss.svg)](http://rubygems.org/gems/html2rss/)
- [![Coverage Status](https://coveralls.io/repos/github/gildesmarais/html2rss/badge.svg?branch=master)](https://coveralls.io/github/gildesmarais/html2rss?branch=master)
- [![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg)](https://www.rubydoc.info/gems/html2rss)
- ![Retro Badge: valid RSS](https://validator.w3.org/feed/images/valid-rss-rogers.png)
- [![](http://img.shields.io/liberapay/goal/gildesmarais.svg?logo=liberapa)](https://liberapay.com/gildesmarais/donate)
+ [![Gem Version](https://badge.fury.io/rb/html2rss.svg)](http://rubygems.org/gems/html2rss/) [![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg)](https://www.rubydoc.info/gems/html2rss) ![Retro Badge: valid RSS](https://validator.w3.org/feed/images/valid-rss-rogers.png) [![](http://img.shields.io/liberapay/goal/gildesmarais.svg?logo=liberapa)](https://liberapay.com/gildesmarais/donate)

- **Searching for a ready to use app which serves generated feeds via HTTP?**
- [Head over to `html2rss-web`!](https://github.com/gildesmarais/html2rss-web)
+ `html2rss` is a Ruby gem that generates RSS 2.0 feeds from a _feed config_.

- This Ruby gem builds RSS 2.0 feeds from a _feed config_.
+ With the _feed config_, you provide a URL to scrape and CSS selectors for extracting information (like title, URL, etc.). The gem builds the RSS feed accordingly. [Extractors](#using-extractors) and chainable [post processors](#using-post-processors) make information extraction, processing, and sanitizing a breeze. The gem also supports [scraping JSON](#scraping-and-handling-json-responses) responses and [setting HTTP request headers](#set-any-http-header-in-the-request).

- With the _feed config_ containing the URL to scrape and
- CSS selectors for information extraction (like title, URL, ...) your RSS builds.
- [Extractors](#using-extractors) and chain-able [post processors](#using-post-processors)
- make information extraction, processing and sanitizing a breeze.
- [Scraping JSON](#scraping-and-handling-json-responses) responses and
- [setting HTTP request headers](#set-any-http-header-in-the-request) is
- supported, too.
+ **Looking for a ready-to-use app to serve generated feeds via HTTP?** [Check out `html2rss-web`](https://github.com/html2rss/html2rss-web)!
+
+ Support the development by sponsoring this project on GitHub. Thank you! 💓

  ## Installation

- | 🤩 Like it? | Star it! ⭐️ |
- | ---------------------------------------------: | -------------------- |
- | Add this line to your application's `Gemfile`: | `gem 'html2rss'` |
- | Then execute: | `bundle` |
- | In your code: | `require 'html2rss'` |
+ | Install | `gem install html2rss` |
+ | ------- | ---------------------- |
+ | Usage | `html2rss help` |
+
+ You can also install it as a dependency in your Ruby project:

- 😍 Love it? Feel free [to donate](https://liberapay.com/gildesmarais/donate). Thank you! 💓
+ | 🤩 Like it? | Star it! ⭐️ |
+ | -------------------------------: | -------------------- |
+ | Add this line to your `Gemfile`: | `gem 'html2rss'` |
+ | Then execute: | `bundle` |
+ | In your code: | `require 'html2rss'` |

- ## Building a feed config
+ ## Generating a feed on the CLI

- Here's a minimal working example:
+ Create a file called `my_config_file.yml` with this example content:
+
+ ```yml
+ channel:
+   url: https://stackoverflow.com/questions
+ selectors:
+   items:
+     selector: "#hot-network-questions > ul > li"
+   title:
+     selector: a
+   link:
+     selector: a
+     extractor: href
+ ```
+
+ Build the RSS with: `html2rss feed ./my_config_file.yml`.
+
+ ## Generating a feed with Ruby
+
+ Here's a minimal working example in Ruby:

  ```ruby
  require 'html2rss'
@@ -50,54 +63,86 @@ rss =
  puts rss
  ```

- A _feed config_ consists of a `channel` and a `selectors` Hash.
- The contents of both hashes are explained below.
+ ## The _feed config_ and its options
+
+ A _feed config_ consists of a `channel` and a `selectors` hash. The contents of both hashes are explained below.
+
+ Good to know:
+
+ - You'll find extensive example feed configs at [`spec/*.test.yml`](https://github.com/html2rss/html2rss/tree/master/spec).
+ - See [`html2rss-configs`](https://github.com/html2rss/html2rss-configs) for ready-made feed configs!
+ - If you've created feed configs, you're invited to send a PR to [`html2rss-configs`](https://github.com/html2rss/html2rss-configs) to make your config available to the public.

- **Looks too complicated?** See [`html2rss-configs`](https://github.com/gildesmarais/html2rss-configs) for ready-made feed configs!
+ Alright, let's move on.

  ### The `channel`

- | attribute | | type | default | remark |
- | ------------- | -------- | ------- | -------------: | ------------------------------------------ |
- | `url` | required | String | | |
- | `title` | optional | String | auto-generated | |
- | `description` | optional | String | auto-generated | |
- | `ttl` | optional | Integer | `360` | TTL in _minutes_ |
- | `time_zone` | optional | String | `'UTC'` | TimeZone name |
- | `language` | optional | String | `'en'` | Language code |
- | `author` | optional | String | | Format: `email (Name)'` |
- | `headers` | optional | Hash | `{}` | Set HTTP request headers. See notes below. |
- | `json` | optional | Boolean | `false` | Handle JSON response. See notes below. |
+ | attribute | | type | default | remark |
+ | ------------- | ------------ | ------- | -------------- | ------------------------------------------ |
+ | `url` | **required** | String | | |
+ | `title` | optional | String | auto-generated | |
+ | `description` | optional | String | auto-generated | |
+ | `ttl` | optional | Integer | `360` | TTL in _minutes_ |
+ | `time_zone` | optional | String | `'UTC'` | TimeZone name |
+ | `language` | optional | String | `'en'` | Language code |
+ | `author` | optional | String | | Format: `email (Name)` |
+ | `headers` | optional | Hash | `{}` | Set HTTP request headers. See notes below. |
+ | `json` | optional | Boolean | `false` | Handle JSON response. See notes below. |
+
+ #### Dynamic parameters in `channel` attributes
+
+ Sometimes there are structurally similar pages with different URLs. In such cases, you can add _dynamic parameters_ to the channel's attributes.
+
+ Example of a dynamic `id` parameter in the channel URLs:
+
+ ```yml
+ channel:
+   url: "http://domainname.tld/whatever/%<id>s.html"
+ ```
+
+ Command line usage example:
+
+ ```sh
+ bundle exec html2rss feed the_feed_config.yml id=42
+ ```
+
+ <details><summary>See a Ruby example</summary>
+
+ ```ruby
+ config = Html2rss::Config.new({ channel: { url: 'http://domainname.tld/whatever/%<id>s.html' } }, {}, { id: 42 })
+ Html2rss.feed(config)
+ ```
+
+ </details>
+
+ See the more complex formatting options of the [`sprintf` method](https://ruby-doc.org/core/Kernel.html#method-i-sprintf).

  ### The `selectors`

- You must provide an `items` selector hash which contains the CSS selector.
- `items` needs to return a collection of HTML tags.
- The other selectors are scoped to the tags of the items' collection.
-
- To build a
- [valid RSS 2.0 item](http://www.rssboard.org/rss-profile#element-channel-item)
- each item has to have at least a `title` or a `description`.
-
- Your `selectors` can contain arbitrary selector names, but only these
- will make it into the RSS feed:
-
- | RSS 2.0 tag | name in `html2rss` | remark |
- | ------------- | ------------------ | --------------------------- |
- | `title` | `title` | |
- | `description` | `description` | Supports HTML. |
- | `link` | `link` | A URL. |
- | `author` | `author` | |
- | `category` | `categories` | See notes below. |
- | `enclosure` | `enclosure` | See notes below. |
- | `pubDate` | `update` | An instance of `Time`. |
- | `guid` | `guid` | Generated from the `title`. |
- | `comments` | `comments` | A URL. |
- | `source` | ~~source~~ | Not yet supported. |
+ First, you must give an **`items`** selector hash, which contains a CSS selector. The selector selects a collection of HTML tags from which the RSS feed items are built. Except for the `items` selector, all other keys are scoped to each item of the collection.
+
+ To build a [valid RSS 2.0 item](http://www.rssboard.org/rss-profile#element-channel-item), you need at least a `title` **or** a `description`. You can have both.
+
+ Having an `items` and a `title` selector is enough to build a simple feed.
+
+ Your `selectors` hash can contain arbitrary named selectors, but only a few will make it into the RSS feed (due to the RSS 2.0 specification):
+
+ | RSS 2.0 tag | name in `html2rss` | remark |
+ | ------------- | ------------------ | ------------------------------------------- |
+ | `title` | `title` | |
+ | `description` | `description` | Supports HTML. |
+ | `link` | `link` | A URL. |
+ | `author` | `author` | |
+ | `category` | `categories` | See notes below. |
+ | `guid` | `guid` | Default title/description. See notes below. |
+ | `enclosure` | `enclosure` | See notes below. |
+ | `pubDate` | `updated` | An instance of `Time`. |
+ | `comments` | `comments` | A URL. |
+ | `source` | ~~source~~ | Not yet supported. |

  ### The `selector` hash

- Your selector hash can have these attributes:
+ Every named selector in your `selectors` hash can have these attributes:

  | name | value |
  | -------------- | -------------------------------------------------------- |
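The dynamic parameters added in this hunk point to the named references of Ruby's `sprintf` (`%<id>s`). Below is a stdlib-only sketch of that formatting step. The `expand_channel_url` helper is hypothetical and not part of html2rss. It only assumes the parameters arrive as a hash, like `{ id: 42 }` in the Ruby example or `id=42` on the command line.

```ruby
# Hypothetical helper (not html2rss API): expands a channel URL template
# that uses sprintf-style named references such as %<id>s.
def expand_channel_url(template, params)
  # Kernel#format resolves %<name>s against a hash with symbol keys,
  # so string keys (e.g. parsed from "id=42") are symbolized first.
  format(template, params.transform_keys(&:to_sym))
end

url = expand_channel_url('http://domainname.tld/whatever/%<id>s.html', 'id' => 42)
puts url # => http://domainname.tld/whatever/42.html

# A missing parameter raises KeyError instead of silently building a wrong URL.
begin
  expand_channel_url('http://domainname.tld/whatever/%<id>s.html', {})
rescue KeyError => e
  puts "missing parameter: #{e.message}"
end
```

Raising `KeyError` for a missing key is `Kernel#format`'s own behaviour. Whether html2rss passes that error through unchanged is not visible in this diff.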
@@ -105,26 +150,6 @@ Your selector hash can have these attributes:
  | `extractor` | Name of the extractor. See notes below. |
  | `post_process` | A hash or array of hashes. See notes below. |

- #### Reverse ordering of items
-
- The `items` selector hash can have an `order` attribute.
- If the value is `reverse` the order of items in the RSS will be reversed.
-
- <details>
- <summary>See a YAML feed config example</summary>
-
- ```yml
- channel:
-   # ... omitted
- selectors:
-   items:
-     selector: 'ul > li'
-     order: 'reverse'
-   # ... omitted
- ```
-
- </details>
-
  ## Using extractors

  Extractors help with extracting the information from the selected HTML tag.
@@ -134,13 +159,11 @@ Extractors help with extracting the information from the selected HTML tag.
  - The `href` extractor returns a URL from the tag's `href` attribute and corrects relative ones to absolute ones.
  - The `attribute` extractor returns the value of that tag's attribute.
  - The `static` extractor returns the configured static value (it doesn't extract anything).
- - [See file list of extractors](https://github.com/gildesmarais/html2rss/tree/master/lib/html2rss/item_extractors).
+ - [See file list of extractors](https://github.com/html2rss/html2rss/tree/master/lib/html2rss/item_extractors).

- Extractors can require additional attributes on the selector hash.
- 👉 [Read their docs for usage examples](https://www.rubydoc.info/gems/html2rss/Html2rss/ItemExtractors).
+ Extractors might need extra attributes on the selector hash. 👉 [Read their docs for usage examples](https://www.rubydoc.info/gems/html2rss/Html2rss/ItemExtractors).

- <details>
- <summary>See a Ruby example</summary>
+ <details><summary>See a Ruby example</summary>

  ```ruby
  Html2rss.feed(
@@ -150,17 +173,16 @@ Html2rss.feed(

  </details>

- <details>
- <summary>See a YAML feed config example</summary>
+ <details><summary>See a YAML feed config example</summary>

  ```yml
  channel:
-   # ... omitted
+   # ... omitted
  selectors:
-   # ... omitted
+   # ... omitted
    link:
-     selector: 'a'
-     extractor: 'href'
+     selector: "a"
+     extractor: "href"
  ```

  </details>
@@ -182,48 +204,11 @@ Extracted information can be further manipulated with post processors.

  ⚠️ Always make use of the `sanitize_html` post processor for HTML content. _Never trust the internet!_ ⚠️

- - [See file list of post processors](https://github.com/gildesmarais/html2rss/tree/master/lib/html2rss/attribute_post_processors).
-
- 👉 [Read their docs for usage examples.](https://www.rubydoc.info/gems/html2rss/Html2rss/AttributePostProcessors)
-
- <details>
- <summary>See a Ruby example</summary>
-
- ```ruby
- Html2rss.feed(
-   channel: {},
-   selectors: {
-     description: {
-       selector: '.content', post_process: { name: 'sanitize_html' }
-     }
-   }
- )
- ```
-
- </details>
-
- <details>
- <summary>See a YAML feed config example</summary>
-
- ```yml
- channel:
-   # ... omitted
- selectors:
-   # ... omitted
-   description:
-     selector: '.content'
-     post_process:
-       - name: sanitize_html
- ```
-
- </details>
-
  ### Chaining post processors

  Pass an array to `post_process` to chain the post processors.

- <details>
- <summary>YAML example: build the description from a template String (in Markdown) and convert that Markdown to HTML</summary>
+ <details><summary>YAML example: build the description from a template String (in Markdown) and convert that Markdown to HTML</summary>

  ```yml
  channel:
@@ -243,7 +228,44 @@ selectors:
  - name: markdown_to_html
  ```

- Note the use of `|` for a multi-line String in YAML.
+ </details>
+
+ ### Post processor `gsub`
+
+ The post processor `gsub` makes use of Ruby's [`gsub`](https://apidock.com/ruby/String/gsub) method.
+
+ | key | type | required | note |
+ | ------------- | ------ | -------- | --------------------------- |
+ | `pattern` | String | yes | Can be Regexp or String. |
+ | `replacement` | String | yes | Can be a [backreference](). |
+
+ <details><summary>See a Ruby example</summary>
+
+ ```ruby
+ Html2rss.feed(
+   channel: {},
+   selectors: {
+     title: { selector: 'a', post_process: [{ name: 'gsub', pattern: 'foo', replacement: 'bar' }] }
+   }
+ )
+ ```
+
+ </details>
+
+ <details><summary>See a YAML feed config example</summary>
+
+ ```yml
+ channel:
+   # ... omitted
+ selectors:
+   # ... omitted
+   title:
+     selector: "a"
+     post_process:
+       - name: "gsub"
+         pattern: "foo"
+         replacement: "bar"
+ ```

  </details>

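The new `gsub` post processor is documented as using Ruby's `String#gsub`. Its `pattern` may be a Regexp or a String, and its `replacement` may contain backreferences. The snippet below only exercises plain `String#gsub` to show those cases. How html2rss turns a `pattern` given as a YAML string into a Regexp is not visible in this diff and is not assumed here.

```ruby
title = 'foo: Release notes for foo'

# String pattern: every literal occurrence is replaced.
puts title.gsub('foo', 'bar')
# => bar: Release notes for bar

# Regexp pattern with numbered backreferences (\1, \2) in the replacement.
# Single quotes keep the backslashes intact for gsub to interpret.
puts title.gsub(/\A(\w+): (.+)\z/, '\2 [\1]')
# => Release notes for foo [foo]

# Named capture groups are referenced with \k<name>.
puts 'v0.10.0'.gsub(/v(?<version>[\d.]+)/, 'version \k<version>')
# => version 0.10.0
```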
@@ -290,65 +312,74 @@ selectors:

  </details>

- ## Adding an `<enclosure>` tag to an item
-
- An enclosure can be any file, e.g. a image, audio or video.
-
- The `enclosure` selector needs to return a URL of the content to enclose. If the extracted URL is relative, it will be converted to an absolute one using the channel's URL as base.
+ ## Custom item GUID

- Since `html2rss` does no further inspection of the enclosure, its support comes with trade-offs:
+ By default, html2rss generates a GUID from the `title` or `description`.

- 1. The content-type is guessed from the file extension of the URL.
- 2. If the content-type guessing fails, it will default to `application/octet-stream`.
- 3. The content-length will always be undetermined and thus stated as `0` bytes.
+ If this does not work well, you can choose other attributes from which the GUID is build.
+ The principle is the same as for the categories: pass an array of selectors names.

- Read the [RSS 2.0 spec](http://www.rssboard.org/rss-profile#element-channel-item-enclosure) for further information on enclosing content.
+ In all cases, the GUID is a SHA1-encoded string.

- <details>
- <summary>See a Ruby example</summary>
+ <details><summary>See a Ruby example</summary>

  ```ruby
  Html2rss.feed(
    channel: {},
    selectors: {
-     enclosure: { selector: 'img', extractor: 'attribute', attribute: 'src' }
+     title: {
+       # ... omitted
+       selector: 'h1'
+     },
+     link: { selector: 'a', extractor: 'href' },
+     guid: %i[link]
    }
  )
  ```

  </details>

- <details>
- <summary>See a YAML feed config example</summary>
+ <details><summary>See a YAML feed config example</summary>

  ```yml
  channel:
    # ... omitted
  selectors:
-   # ... omitted
-   enclosure:
-     selector: "img"
-     extractor: "attribute"
-     attribute: "src"
+   # ... omitted
+   title:
+     selector: "h1"
+   link:
+     selector: "a"
+     extractor: "href"
+   guid:
+     - link
  ```

  </details>

- ## Scraping and handling JSON responses
+ ## Adding an `<enclosure>` tag to an item

- Although this gem is called **html**​*2rss*, it's possible to scrape and process JSON.
+ An enclosure can be any file, e.g. a image, audio or video - think Podcast.

- Adding `json: true` to the channel config will convert the JSON response to XML.
+ The `enclosure` selector needs to return a URL of the content to enclose. If the extracted URL is relative, it will be converted to an absolute one using the channel's URL as base.
+
+ Since `html2rss` does no further inspection of the enclosure, its support comes with trade-offs:
+
+ 1. The content-type is guessed from the file extension of the URL.
+ 2. If the content-type guessing fails, it will default to `application/octet-stream`.
+ 3. The content-length will always be undetermined and therefore stated as `0` bytes.
+
+ Read the [RSS 2.0 spec](http://www.rssboard.org/rss-profile#element-channel-item-enclosure) for further information on enclosing content.

  <details>
  <summary>See a Ruby example</summary>

  ```ruby
  Html2rss.feed(
-   channel: {
-     url: 'https://example.com', json: true
-   },
-   selectors: {} # ... omitted
+   channel: {},
+   selectors: {
+     enclosure: { selector: 'audio', extractor: 'attribute', attribute: 'src' }
+   }
  )
  ```

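The new "Custom item GUID" section says the GUID is a SHA1-encoded string. By default it is built from `title`/`description`; otherwise it comes from the selectors listed under `guid`. This diff does not show how html2rss combines several values before hashing, so the sketch assumes a comma join. `item_guid` and the sample item are hypothetical. The sketch illustrates why a GUID based on `link` stays stable when a headline is edited later.

```ruby
require 'digest/sha1'

# Hypothetical helper (not html2rss API): SHA1-hashes the values of the
# given attributes. Joining with ',' is an assumption for illustration.
def item_guid(item, attributes = %i[title description])
  values = attributes.map { |name| item[name] }.compact # skip missing attributes
  Digest::SHA1.hexdigest(values.join(','))
end

item = { title: 'Headline', link: 'https://example.com/articles/1' }

guid_by_title = item_guid(item)           # default: title/description
guid_by_link  = item_guid(item, %i[link]) # like `guid: %i[link]` in the config

puts guid_by_title.length # => 40
# Editing the title does not change a link-based GUID, so readers won't see a "new" item.
puts guid_by_link == item_guid(item.merge(title: 'Edited headline'), %i[link]) # => true
```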
@@ -357,130 +388,88 @@ Html2rss.feed(
  <details>
  <summary>See a YAML feed config example</summary>

- ```yaml
+ ```yml
  channel:
-   url: https://example.com
-   json: true
+   # ... omitted
  selectors:
    # ... omitted
+   enclosure:
+     selector: "audio"
+     extractor: "attribute"
+     attribute: "src"
  ```

  </details>
+ ## Scraping and handling JSON responses

- <details>
- <summary>See example of a converted JSON object</summary>
+ By default, `html2rss` assumes the URL responds with HTML. However, it can also handle JSON responses. The JSON must return an Array or Hash.

- This JSON object:
+ | key | required | default | note |
+ | ---------- | -------- | ------- | ---------------------------------------------------- |
+ | `json` | optional | false | If set to `true`, the response is parsed as JSON. |
+ | `jsonpath` | optional | $ | Use [JSONPath syntax]() to select nodes of interest. |

- ```json
- {
-   "data": [{ "title": "Headline", "url": "https://example.com" }]
- }
- ```
+ <details><summary>See a Ruby example</summary>

- converts to:
-
- ```xml
- <hash>
-   <data>
-     <datum>
-       <title>Headline</title>
-       <url>https://example.com</url>
-     </datum>
-   </data>
- </hash>
+ ```ruby
+ Html2rss.feed(
+   channel: { url: 'http://domainname.tld/whatever.json', json: true },
+   selectors: { title: { selector: 'foo' } }
+ )
  ```

- Your items selector would be `data > datum`, the item's `link` selector would be `url`.
-
- Find further information in [ActiveSupport's `Hash.to_xml` documentation](https://apidock.com/rails/Hash/to_xml).
-
  </details>

- <details>
- <summary>See example of a converted JSON array</summary>
+ <details><summary>See a YAML feed config example</summary>

- This JSON array:
-
- ```json
- [{ "title": "Headline", "url": "https://example.com" }]
- ```
-
- converts to:
-
- ```xml
- <objects>
-   <object>
-     <title>Headline</title>
-     <url>https://example.com</url>
-   </object>
- </objects>
+ ```yml
+ channel:
+   url: "http://domainname.tld/whatever.json"
+   json: true
+ selectors:
+   title:
+     selector: "foo"
  ```

- Your items selector would be `objects > object`, the item's `link` selector would be `url`.
-
- Find further information in [ActiveSupport's `Array.to_xml` documentation](https://apidock.com/rails/Array/to_xml).
-
  </details>

  ## Set any HTTP header in the request

- You can add any HTTP headers to the request to the channel URL.
- Use this to e.g. have Cookie or Authorization information sent or to spoof the User-Agent.
+ To set HTTP request headers, you can add them to the channel's `headers` hash. This is useful for APIs that require an Authorization header.

- <details>
- <summary>See a Ruby example</summary>
-
- ```ruby
- Html2rss.feed(
-   channel: {
-     url: 'https://example.com',
-     headers: {
-       "User-Agent": "html2rss-request",
-       "X-Something": "Foobar",
-       "Authorization": "Token deadbea7",
-       "Cookie": "monster=MeWantCookie"
-     }
-   },
-   selectors: {}
- )
- ```
-
- </details>
-
- <details>
- <summary>See a YAML feed config example</summary>
-
- ```yaml
+ ```yml
  channel:
-   url: https://example.com
+   url: "https://example.com/api/resource"
    headers:
-     "User-Agent": "html2rss-request"
-     "X-Something": "Foobar"
-     "Authorization": "Token deadbea7"
-     "Cookie": "monster=MeWantCookie"
+     Authorization: "Bearer YOUR_TOKEN"
  selectors:
-   # ...
+   # ... omitted
  ```

- </details>
+ Or for setting a User-Agent:

- The headers provided by the channel are merged into the global headers.
+ ```yml
+ channel:
+   url: "https://example.com"
+   headers:
+     User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
+ selectors:
+   # ... omitted
+ ```

  ## Usage with a YAML config file

  This step is not required to work with this gem. If you're using
- [`html2rss-web`](https://github.com/gildesmarais/html2rss-web)
+ [`html2rss-web`](https://github.com/html2rss/html2rss-web)
  and want to create your private feed configs, keep on reading!

- First, create your YAML file, e.g. called `feeds.yml`.
- This file will contain your global config and feed configs.
+ First, create a YAML file, e.g. `feeds.yml`. This file will contain your global config and multiple feed configs under the key `feeds`.

  Example:

  ```yml
  headers:
-   'User-Agent': "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"
+   "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"
  feeds:
    myfeed:
      channel:
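The rewritten enclosure section in the two hunks above describes two steps. First, a relative URL is resolved against the channel URL. Second, the content type is guessed from the file extension, with `application/octet-stream` as fallback and a stated length of `0`. Below is a stdlib-only sketch of those steps. The extension table is a small hypothetical subset, and html2rss may rely on a fuller MIME type database.

```ruby
require 'uri'

# Hypothetical subset of extension => MIME type mappings.
MIME_TYPES = {
  '.mp3' => 'audio/mpeg',
  '.ogg' => 'audio/ogg',
  '.mp4' => 'video/mp4',
  '.jpg' => 'image/jpeg',
  '.png' => 'image/png'
}.freeze

# Resolves a possibly relative enclosure URL against the channel URL.
def absolute_url(channel_url, src)
  URI.join(channel_url, src).to_s
end

# Guesses the content type from the extension of the URL's path
# (query strings are ignored) and falls back to application/octet-stream.
def guess_content_type(url)
  extension = File.extname(URI(url).path).downcase
  MIME_TYPES.fetch(extension, 'application/octet-stream')
end

url = absolute_url('https://example.com/podcast/', 'episodes/42.mp3')
enclosure = { url: url, type: guess_content_type(url), length: 0 } # length is never inspected

puts url             # => https://example.com/podcast/episodes/42.mp3
puts enclosure[:type] # => audio/mpeg
puts guess_content_type('https://example.com/download?id=42') # => application/octet-stream
```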
@@ -492,7 +481,12 @@ feeds:

  Your feed configs go below `feeds`. Everything else is part of the global config.

- Build your feeds like this:
+ Find a full example of a `feeds.yml` at [`spec/feeds.test.yml`](https://github.com/html2rss/html2rss/blob/master/spec/feeds.test.yml).
+
+ Now you can build your feeds like this:
+
+ <details>
+ <summary>Build feeds in Ruby</summary>

  ```ruby
  require 'html2rss'
@@ -501,37 +495,96 @@ myfeed = Html2rss.feed_from_yaml_config('feeds.yml', 'myfeed')
  myotherfeed = Html2rss.feed_from_yaml_config('feeds.yml', 'myotherfeed')
  ```

- Find a full example of a `feeds.yml` at [`spec/config.test.yml`](https://github.com/gildesmarais/html2rss/blob/master/spec/config.test.yml).
+ </details>

- ## Gotchas and tips & tricks
+ <details>
+ <summary>Build feeds on the command line</summary>

- - Check that the channel URL does not redirect to a mobile page with a different markup structure.
- - Do not rely on your web browser's developer console. `html2rss` does not execute JavaScript.
- - Fiddling with [`curl`](https://github.com/curl/curl) and [`pup`](https://github.com/ericchiang/pup) to find the selectors seems efficient (`curl URL | pup`).
- - [CSS selectors are quite versatile, here's an overview.](https://www.w3.org/TR/selectors-4/#overview)
+ ```sh
+ html2rss feed feeds.yml myfeed
+ html2rss feed feeds.yml myotherfeed
+ ```
+
+ </details>

- ## Development
+ ## Display the RSS feed nicely in a web browser

- After checking out the repository, run `bin/setup` to install dependencies. Then, run `bundle exec rspec` to run the tests.
- You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+ To display RSS feeds nicely in a web browser, you can:
+
+ - add a plain old CSS stylesheet, or
+ - use XSLT (e**X**tensible **S**tylesheet **L**anguage **T**ransformations).
+
+ A web browser will apply these stylesheets and show the contents as described.
+
+ In a CSS stylesheet, you'd use `element` selectors to apply styles.
+
+ If you want to do more, then you need to create a XSLT. XSLT allows you
+ to use a HTML template and to freely design the information of the RSS,
+ including using JavaScript and external resources.
+
+ You can add as many stylesheets and types as you like. Just add them to your global configuration.

  <details>
- <summary>Releasing a new version</summary>
-
- 1. `git pull`
- 2. increase version in `lib/html2rss/version.rb`
- 3. `bundle`
- 4. `git add Gemfile.lock lib/html2rss/version.rb`
- 5. `VERSION=$(ruby -e 'require "./lib/html2rss/version.rb"; puts Html2rss::VERSION')`
- 6. `git commit -m "chore: release $VERSION"`
- 7. `git tag v$VERSION`
- 8. [`standard-changelog -f`](https://github.com/conventional-changelog/conventional-changelog/tree/master/packages/standard-changelog)
- 9. `git add CHANGELOG.md && git commit --amend`
- 10. `git tag v$VERSION -f`
- 11. `git push && git push --tags`
+ <summary>Ruby: a stylesheet config example</summary>
+
+ ```ruby
+ config = Html2rss::Config.new(
+   { channel: {}, selectors: {} }, # omitted
+   {
+     stylesheets: [
+       {
+         href: '/relative/base/path/to/style.xls',
+         media: :all,
+         type: 'text/xsl'
+       },
+       {
+         href: 'http://example.com/rss.css',
+         media: :all,
+         type: 'text/css'
+       }
+     ]
+   }
+ )
+
+ Html2rss.feed(config)
+ ```

  </details>

- ## Contributing
+ <details>
+ <summary>YAML: a stylesheet config example</summary>
+
+ ```yml
+ stylesheets:
+   - href: "/relative/base/path/to/style.xls"
+     media: "all"
+     type: "text/xsl"
+   - href: "http://example.com/rss.css"
+     media: "all"
+     type: "text/css"
+ feeds:
+   # ... omitted
+ ```
+
+ </details>
+
+ Recommended further readings:
+
+ - [How to format RSS with CSS on lifewire.com](https://www.lifewire.com/how-to-format-rss-3469302)
+ - [XSLT: Extensible Stylesheet Language Transformations on MDN](https://developer.mozilla.org/en-US/docs/Web/XSLT)
+ - [The XSLT used by html2rss-web](https://github.com/html2rss/html2rss-web/blob/master/public/rss.xsl)
+
+ ## Gotchas and tips & tricks
+
+ - Check that the channel URL does not redirect to a mobile page with a different markup structure.
+ - Do not rely on your web browser's developer console. `html2rss` does not execute JavaScript.
+ - Fiddling with [`curl`](https://github.com/curl/curl) and [`pup`](https://github.com/ericchiang/pup) to find the selectors seems efficient (`curl URL | pup`).
+ - [CSS selectors are versatile. Here's an overview.](https://www.w3.org/TR/selectors-4/#overview)
+
+ ### Contributing

- Bug reports and pull requests are welcome on GitHub at https://github.com/gildesmarais/html2rss.
+ 1. Fork it ( <https://github.com/html2rss/html2rss/fork> )
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
+ 4. Push to the branch (`git push origin my-new-feature`)
+ 5. Create a new Pull Request
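Browsers discover the stylesheets configured under `stylesheets` through `<?xml-stylesheet ?>` processing instructions at the top of the XML document. The sketch below renders such instructions from hashes shaped like the README's stylesheet entries (`href`, `media`, `type`). `stylesheet_pi` is hypothetical and is not the code in `rss_builder/stylesheet.rb`, whose exact output this diff does not show.

```ruby
# Hypothetical helper (not html2rss API): renders one stylesheet config
# hash as an xml-stylesheet processing instruction.
def stylesheet_pi(href:, type:, media: 'all')
  attributes = { href: href, media: media, type: type }
  # String#encode(xml: :attr) escapes &, <, >, " and wraps the value in double quotes.
  pairs = attributes.map { |name, value| "#{name}=#{value.to_s.encode(xml: :attr)}" }
  "<?xml-stylesheet #{pairs.join(' ')}?>"
end

stylesheets = [
  { href: '/rss.xsl', media: :all, type: 'text/xsl' },
  { href: 'http://example.com/rss.css', media: :all, type: 'text/css' }
]

stylesheets.each { |config| puts stylesheet_pi(**config) }
# <?xml-stylesheet href="/rss.xsl" media="all" type="text/xsl"?>
# <?xml-stylesheet href="http://example.com/rss.css" media="all" type="text/css"?>
```

Escaping the attribute values keeps stylesheet URLs that contain query strings (`&`) valid inside the instruction.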