html2rss 0.8.2 → 0.10.0

Files changed (47)
  1. checksums.yaml +4 -4
  2. data/.gitignore +1 -1
  3. data/.mergify.yml +15 -0
  4. data/.rubocop.yml +13 -42
  5. data/Gemfile +19 -2
  6. data/Gemfile.lock +116 -94
  7. data/README.md +326 -253
  8. data/bin/console +1 -0
  9. data/exe/html2rss +6 -0
  10. data/html2rss.gemspec +16 -21
  11. data/lib/html2rss/attribute_post_processors/gsub.rb +30 -8
  12. data/lib/html2rss/attribute_post_processors/html_to_markdown.rb +7 -2
  13. data/lib/html2rss/attribute_post_processors/html_transformers/transform_urls_to_absolute_ones.rb +27 -0
  14. data/lib/html2rss/attribute_post_processors/html_transformers/wrap_img_in_a.rb +41 -0
  15. data/lib/html2rss/attribute_post_processors/markdown_to_html.rb +11 -2
  16. data/lib/html2rss/attribute_post_processors/parse_time.rb +11 -4
  17. data/lib/html2rss/attribute_post_processors/parse_uri.rb +12 -2
  18. data/lib/html2rss/attribute_post_processors/sanitize_html.rb +46 -51
  19. data/lib/html2rss/attribute_post_processors/substring.rb +14 -4
  20. data/lib/html2rss/attribute_post_processors/template.rb +36 -12
  21. data/lib/html2rss/attribute_post_processors.rb +28 -5
  22. data/lib/html2rss/cli.rb +29 -0
  23. data/lib/html2rss/config/channel.rb +117 -0
  24. data/lib/html2rss/config/selectors.rb +91 -0
  25. data/lib/html2rss/config.rb +71 -78
  26. data/lib/html2rss/item.rb +118 -40
  27. data/lib/html2rss/item_extractors/attribute.rb +20 -7
  28. data/lib/html2rss/item_extractors/href.rb +20 -4
  29. data/lib/html2rss/item_extractors/html.rb +18 -6
  30. data/lib/html2rss/item_extractors/static.rb +18 -7
  31. data/lib/html2rss/item_extractors/text.rb +17 -5
  32. data/lib/html2rss/item_extractors.rb +75 -9
  33. data/lib/html2rss/object_to_xml_converter.rb +56 -0
  34. data/lib/html2rss/rss_builder/channel.rb +21 -0
  35. data/lib/html2rss/rss_builder/item.rb +83 -0
  36. data/lib/html2rss/rss_builder/stylesheet.rb +37 -0
  37. data/lib/html2rss/rss_builder.rb +96 -0
  38. data/lib/html2rss/utils.rb +94 -19
  39. data/lib/html2rss/version.rb +6 -1
  40. data/lib/html2rss.rb +51 -20
  41. data/rakefile.rb +16 -0
  42. metadata +54 -150
  43. data/.travis.yml +0 -25
  44. data/CHANGELOG.md +0 -210
  45. data/lib/html2rss/feed_builder.rb +0 -75
  46. data/lib/html2rss/item_extractors/current_time.rb +0 -21
  47. data/support/logo.png +0 -0
data/README.md CHANGED
@@ -1,35 +1,51 @@
1
- ![html2rss logo](https://github.com/gildesmarais/html2rss/raw/master/support/logo.png)
1
+ ![html2rss logo](https://github.com/html2rss/html2rss/raw/master/support/logo.png)
2
2
 
3
- [![Build Status](https://travis-ci.org/gildesmarais/html2rss.svg?branch=master)](https://travis-ci.org/gildesmarais/html2rss)
4
- [![Gem Version](https://badge.fury.io/rb/html2rss.svg)](http://rubygems.org/gems/html2rss/)
5
- [![Coverage Status](https://coveralls.io/repos/github/gildesmarais/html2rss/badge.svg?branch=master)](https://coveralls.io/github/gildesmarais/html2rss?branch=master)
6
- [![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg)](https://www.rubydoc.info/gems/html2rss)
7
- ![Retro Badge: valid RSS](https://validator.w3.org/feed/images/valid-rss-rogers.png)
3
+ [![Gem Version](https://badge.fury.io/rb/html2rss.svg)](http://rubygems.org/gems/html2rss/) [![Yard Docs](http://img.shields.io/badge/yard-docs-blue.svg)](https://www.rubydoc.info/gems/html2rss) ![Retro Badge: valid RSS](https://validator.w3.org/feed/images/valid-rss-rogers.png) [![](http://img.shields.io/liberapay/goal/gildesmarais.svg?logo=liberapa)](https://liberapay.com/gildesmarais/donate)
8
4
 
9
- **Searching for a ready to use app which serves generated feeds via HTTP?**
10
- [Head over to `html2rss-web`!](https://github.com/gildesmarais/html2rss-web)
5
+ `html2rss` is a Ruby gem that generates RSS 2.0 feeds from a _feed config_.
11
6
 
12
- This Ruby gem builds RSS 2.0 feeds from a _feed config_.
7
+ With the _feed config_, you provide a URL to scrape and CSS selectors for extracting information (like title, URL, etc.). The gem builds the RSS feed accordingly. [Extractors](#using-extractors) and chainable [post processors](#using-post-processors) make information extraction, processing, and sanitizing a breeze. The gem also supports [scraping JSON](#scraping-and-handling-json-responses) responses and [setting HTTP request headers](#set-any-http-header-in-the-request).
13
8
 
14
- With the _feed config_ containing the URL to scrape and
15
- CSS selectors for information extraction (like title, URL, ...) your RSS builds.
16
- [Extractors](#using-extractors) and chain-able [post processors](#using-post-processors)
17
- make information extraction, processing and sanitizing a breeze.
18
- [Scraping JSON](#scraping-and-handling-json-responses) responses and
19
- [setting HTTP request headers](#set-any-http-header-in-the-request) is
20
- supported, too.
9
+ **Looking for a ready-to-use app to serve generated feeds via HTTP?** [Check out `html2rss-web`](https://github.com/html2rss/html2rss-web)!
10
+
11
+ Support the development by sponsoring this project on GitHub. Thank you! 💓
21
12
 
22
13
  ## Installation
23
14
 
24
- | 🤩 Like it? | Star it! ⭐️ |
25
- | ---------------------------------------------: | -------------------- |
26
- | Add this line to your application's `Gemfile`: | `gem 'html2rss'` |
27
- | Then execute: | `bundle` |
28
- | In your code: | `require 'html2rss'` |
15
+ | Install | `gem install html2rss` |
16
+ | ------- | ---------------------- |
17
+ | Usage | `html2rss help` |
18
+
19
+ You can also install it as a dependency in your Ruby project:
20
+
21
+ | 🤩 Like it? | Star it! ⭐️ |
22
+ | -------------------------------: | -------------------- |
23
+ | Add this line to your `Gemfile`: | `gem 'html2rss'` |
24
+ | Then execute: | `bundle` |
25
+ | In your code: | `require 'html2rss'` |
26
+
27
+ ## Generating a feed on the CLI
28
+
29
+ Create a file called `my_config_file.yml` with this example content:
30
+
31
+ ```yml
32
+ channel:
33
+ url: https://stackoverflow.com/questions
34
+ selectors:
35
+ items:
36
+ selector: "#hot-network-questions > ul > li"
37
+ title:
38
+ selector: a
39
+ link:
40
+ selector: a
41
+ extractor: href
42
+ ```
43
+
44
+ Build the RSS with: `html2rss feed ./my_config_file.yml`.
29
45
 
30
- ## Building a feed config
46
+ ## Generating a feed with Ruby
31
47
 
32
- Here's a minimal working example:
48
+ Here's a minimal working example in Ruby:
33
49
 
34
50
  ```ruby
35
51
  require 'html2rss'
@@ -47,54 +63,86 @@ rss =
47
63
  puts rss
48
64
  ```
49
65
 
50
- A _feed config_ consists of a `channel` and a `selectors` Hash.
51
- The contents of both hashes are explained below.
66
+ ## The _feed config_ and its options
67
+
68
+ A _feed config_ consists of a `channel` and a `selectors` hash. The contents of both hashes are explained below.
69
+
70
+ Good to know:
52
71
 
53
- **Looks too complicated?** See [`html2rss-configs`](https://github.com/gildesmarais/html2rss-configs) for ready-made feed configs!
72
+ - You'll find extensive example feed configs at [`spec/*.test.yml`](https://github.com/html2rss/html2rss/tree/master/spec).
73
+ - See [`html2rss-configs`](https://github.com/html2rss/html2rss-configs) for ready-made feed configs!
74
+ - If you've created feed configs, you're invited to send a PR to [`html2rss-configs`](https://github.com/html2rss/html2rss-configs) to make your config available to the public.
75
+
76
+ Alright, let's move on.
54
77
 
55
78
  ### The `channel`
56
79
 
57
- | attribute | | type | default | remark |
58
- | ------------- | -------- | ------- | -------------: | ------------------------------------------ |
59
- | `url` | required | String | | |
60
- | `title` | optional | String | auto-generated | |
61
- | `description` | optional | String | auto-generated | |
62
- | `ttl` | optional | Integer | `360` | TTL in _minutes_ |
63
- | `time_zone` | optional | String | `'UTC'` | TimeZone name |
64
- | `language` | optional | String | `'en'` | Language code |
65
- | `author` | optional | String | | Format: `email (Name)'` |
66
- | `headers` | optional | Hash | `{}` | Set HTTP request headers. See notes below. |
67
- | `json` | optional | Boolean | `false` | Handle JSON response. See notes below. |
80
+ | attribute | | type | default | remark |
81
+ | ------------- | ------------ | ------- | -------------- | ------------------------------------------ |
82
+ | `url` | **required** | String | | |
83
+ | `title` | optional | String | auto-generated | |
84
+ | `description` | optional | String | auto-generated | |
85
+ | `ttl` | optional | Integer | `360` | TTL in _minutes_ |
86
+ | `time_zone` | optional | String | `'UTC'` | TimeZone name |
87
+ | `language` | optional | String | `'en'` | Language code |
88
+ | `author` | optional | String | | Format: `email (Name)` |
89
+ | `headers` | optional | Hash | `{}` | Set HTTP request headers. See notes below. |
90
+ | `json` | optional | Boolean | `false` | Handle JSON response. See notes below. |
91
+
92
+ #### Dynamic parameters in `channel` attributes
93
+
94
+ Sometimes there are structurally similar pages with different URLs. In such cases, you can add _dynamic parameters_ to the channel's attributes.
95
+
96
+ Example of a dynamic `id` parameter in the channel URLs:
97
+
98
+ ```yml
99
+ channel:
100
+ url: "http://domainname.tld/whatever/%<id>s.html"
101
+ ```
102
+
103
+ Command line usage example:
104
+
105
+ ```sh
106
+ bundle exec html2rss feed the_feed_config.yml id=42
107
+ ```
108
+
109
+ <details><summary>See a Ruby example</summary>
110
+
111
+ ```ruby
112
+ config = Html2rss::Config.new({ channel: { url: 'http://domainname.tld/whatever/%<id>s.html' } }, {}, { id: 42 })
113
+ Html2rss.feed(config)
114
+ ```
115
+
116
+ </details>
117
+
118
+ See the more complex formatting options of the [`sprintf` method](https://ruby-doc.org/core/Kernel.html#method-i-sprintf).
68
119
 
69
120
  ### The `selectors`
70
121
 
71
- You must provide an `items` selector hash which contains the CSS selector.
72
- `items` needs to return a collection of HTML tags.
73
- The other selectors are scoped to the tags of the items' collection.
74
-
75
- To build a
76
- [valid RSS 2.0 item](http://www.rssboard.org/rss-profile#element-channel-item)
77
- each item has to have at least a `title` or a `description`.
78
-
79
- Your `selectors` can contain arbitrary selector names, but only these
80
- will make it into the RSS feed:
81
-
82
- | RSS 2.0 tag | name in `html2rss` | remark |
83
- | ------------- | ------------------ | --------------------------- |
84
- | `title` | `title` | |
85
- | `description` | `description` | Supports HTML. |
86
- | `link` | `link` | A URL. |
87
- | `author` | `author` | |
88
- | `category` | `categories` | See notes below. |
89
- | `enclosure` | `enclosure` | See notes below. |
90
- | `pubDate` | `update` | An instance of `Time`. |
91
- | `guid` | `guid` | Generated from the `title`. |
92
- | `comments` | `comments` | A URL. |
93
- | `source` | ~~source~~ | Not yet supported. |
122
+ First, you must give an **`items`** selector hash, which contains a CSS selector. The selector selects a collection of HTML tags from which the RSS feed items are built. Except for the `items` selector, all other keys are scoped to each item of the collection.
123
+
124
+ To build a [valid RSS 2.0 item](http://www.rssboard.org/rss-profile#element-channel-item), you need at least a `title` **or** a `description`. You can have both.
125
+
126
+ Having an `items` and a `title` selector is enough to build a simple feed.
127
+
128
+ Your `selectors` hash can contain arbitrary named selectors, but only a few will make it into the RSS feed (due to the RSS 2.0 specification):
129
+
130
+ | RSS 2.0 tag | name in `html2rss` | remark |
131
+ | ------------- | ------------------ | ------------------------------------------- |
132
+ | `title` | `title` | |
133
+ | `description` | `description` | Supports HTML. |
134
+ | `link` | `link` | A URL. |
135
+ | `author` | `author` | |
136
+ | `category` | `categories` | See notes below. |
137
+ | `guid` | `guid` | Default title/description. See notes below. |
138
+ | `enclosure` | `enclosure` | See notes below. |
139
+ | `pubDate` | `updated` | An instance of `Time`. |
140
+ | `comments` | `comments` | A URL. |
141
+ | `source` | ~~source~~ | Not yet supported. |
94
142
 
95
143
  ### The `selector` hash
96
144
 
97
- Your selector hash can have these attributes:
145
+ Every named selector in your `selectors` hash can have these attributes:
98
146
 
99
147
  | name | value |
100
148
  | -------------- | -------------------------------------------------------- |
@@ -111,13 +159,11 @@ Extractors help with extracting the information from the selected HTML tag.
111
159
  - The `href` extractor returns a URL from the tag's `href` attribute and corrects relative ones to absolute ones.
112
160
  - The `attribute` extractor returns the value of that tag's attribute.
113
161
  - The `static` extractor returns the configured static value (it doesn't extract anything).
114
- - [See file list of extractors](https://github.com/gildesmarais/html2rss/tree/master/lib/html2rss/item_extractors).
162
+ - [See file list of extractors](https://github.com/html2rss/html2rss/tree/master/lib/html2rss/item_extractors).
115
163
 
116
- Extractors can require additional attributes on the selector hash.
117
- 👉 [Read their docs for usage examples](https://www.rubydoc.info/gems/html2rss/Html2rss/ItemExtractors).
164
+ Extractors might need extra attributes on the selector hash. 👉 [Read their docs for usage examples](https://www.rubydoc.info/gems/html2rss/Html2rss/ItemExtractors).
118
165
 
119
- <details>
120
- <summary>See a Ruby example</summary>
166
+ <details><summary>See a Ruby example</summary>
121
167
 
122
168
  ```ruby
123
169
  Html2rss.feed(
@@ -127,17 +173,16 @@ Html2rss.feed(
127
173
 
128
174
  </details>
129
175
 
130
- <details>
131
- <summary>See a YAML feed config example</summary>
176
+ <details><summary>See a YAML feed config example</summary>
132
177
 
133
178
  ```yml
134
179
  channel:
135
-   # ... omitted
180
+ # ... omitted
136
181
  selectors:
137
-   # ... omitted
182
+ # ... omitted
138
183
  link:
139
- selector: 'a'
140
- extractor: 'href'
184
+ selector: "a"
185
+ extractor: "href"
141
186
  ```
142
187
 
143
188
  </details>
@@ -159,48 +204,11 @@ Extracted information can be further manipulated with post processors.
159
204
 
160
205
  ⚠️ Always make use of the `sanitize_html` post processor for HTML content. _Never trust the internet!_ ⚠️
161
206
 
162
- - [See file list of post processors](https://github.com/gildesmarais/html2rss/tree/master/lib/html2rss/attribute_post_processors).
163
-
164
- 👉 [Read their docs for usage examples.](https://www.rubydoc.info/gems/html2rss/Html2rss/AttributePostProcessors)
165
-
166
- <details>
167
- <summary>See a Ruby example</summary>
168
-
169
- ```ruby
170
- Html2rss.feed(
171
- channel: {},
172
- selectors: {
173
- description: {
174
- selector: '.content', post_process: { name: 'sanitize_html' }
175
- }
176
- }
177
- )
178
- ```
179
-
180
- </details>
181
-
182
- <details>
183
- <summary>See a YAML feed config example</summary>
184
-
185
- ```yml
186
- channel:
187
-   # ... omitted
188
- selectors:
189
-   # ... omitted
190
- description:
191
- selector: '.content'
192
- post_process:
193
- - name: sanitize_html
194
- ```
195
-
196
- </details>
197
-
198
207
  ### Chaining post processors
199
208
 
200
209
  Pass an array to `post_process` to chain the post processors.
201
210
 
202
- <details>
203
- <summary>YAML example: build the description from a template String (in Markdown) and convert that Markdown to HTML</summary>
211
+ <details><summary>YAML example: build the description from a template String (in Markdown) and convert that Markdown to HTML</summary>
204
212
 
205
213
  ```yml
206
214
  channel:
@@ -220,7 +228,44 @@ selectors:
220
228
  - name: markdown_to_html
221
229
  ```
222
230
 
223
- Note the use of `|` for a multi-line String in YAML.
231
+ </details>
232
+
233
+ ### Post processor `gsub`
234
+
235
+ The post processor `gsub` makes use of Ruby's [`gsub`](https://apidock.com/ruby/String/gsub) method.
236
+
237
+ | key | type | required | note |
238
+ | ------------- | ------ | -------- | --------------------------- |
239
+ | `pattern` | String | yes | Can be Regexp or String. |
240
+ | `replacement` | String | yes | Can be a [backreference](). |
241
+
242
+ <details><summary>See a Ruby example</summary>
243
+
244
+ ```ruby
245
+ Html2rss.feed(
246
+ channel: {},
247
+ selectors: {
248
+ title: { selector: 'a', post_process: [{ name: 'gsub', pattern: 'foo', replacement: 'bar' }] }
249
+ }
250
+ )
251
+ ```
252
+
253
+ </details>
254
+
255
+ <details><summary>See a YAML feed config example</summary>
256
+
257
+ ```yml
258
+ channel:
259
+ # ... omitted
260
+ selectors:
261
+ # ... omitted
262
+ title:
263
+ selector: "a"
264
+ post_process:
265
+ - name: "gsub"
266
+ pattern: "foo"
267
+ replacement: "bar"
268
+ ```
224
269
 
225
270
  </details>
226
271
 
@@ -267,65 +312,74 @@ selectors:
267
312
 
268
313
  </details>
269
314
 
270
- ## Adding an `<enclosure>` tag to an item
315
+ ## Custom item GUID
271
316
 
272
- An enclosure can be any file, e.g. a image, audio or video.
317
+ By default, html2rss generates a GUID from the `title` or `description`.
273
318
 
274
- The `enclosure` selector needs to return a URL of the content to enclose. If the extracted URL is relative, it will be converted to an absolute one using the channel's URL as base.
319
+ If this does not work well, you can choose other attributes from which the GUID is build.
320
+ The principle is the same as for the categories: pass an array of selectors names.
275
321
 
276
- Since `html2rss` does no further inspection of the enclosure, its support comes with trade-offs:
322
+ In all cases, the GUID is a SHA1-encoded string.
277
323
 
278
- 1. The content-type is guessed from the file extension of the URL.
279
- 2. If the content-type guessing fails, it will default to `application/octet-stream`.
280
- 3. The content-length will always be undetermined and thus stated as `0` bytes.
281
-
282
- Read the [RSS 2.0 spec](http://www.rssboard.org/rss-profile#element-channel-item-enclosure) for further information on enclosing content.
283
-
284
- <details>
285
- <summary>See a Ruby example</summary>
324
+ <details><summary>See a Ruby example</summary>
286
325
 
287
326
  ```ruby
288
327
  Html2rss.feed(
289
328
  channel: {},
290
329
  selectors: {
291
- enclosure: { selector: 'img', extractor: 'attribute', attribute: 'src' }
330
+ title: {
331
+ # ... omitted
332
+ selector: 'h1'
333
+ },
334
+ link: { selector: 'a', extractor: 'href' },
335
+ guid: %i[link]
292
336
  }
293
337
  )
294
338
  ```
295
339
 
296
340
  </details>
297
341
 
298
- <details>
299
- <summary>See a YAML feed config example</summary>
342
+ <details><summary>See a YAML feed config example</summary>
300
343
 
301
344
  ```yml
302
345
  channel:
303
346
    # ... omitted
304
347
  selectors:
305
-   # ... omitted
306
- enclosure:
307
- selector: "img"
308
- extractor: "attribute"
309
- attribute: "src"
348
+ # ... omitted
349
+ title:
350
+ selector: "h1"
351
+ link:
352
+ selector: "a"
353
+ extractor: "href"
354
+ guid:
355
+ - link
310
356
  ```
311
357
 
312
358
  </details>
313
359
 
314
- ## Scraping and handling JSON responses
360
+ ## Adding an `<enclosure>` tag to an item
361
+
362
+ An enclosure can be any file, e.g. a image, audio or video - think Podcast.
363
+
364
+ The `enclosure` selector needs to return a URL of the content to enclose. If the extracted URL is relative, it will be converted to an absolute one using the channel's URL as base.
315
365
 
316
- Although this gem is called **html**​*2rss*, it's possible to scrape and process JSON.
366
+ Since `html2rss` does no further inspection of the enclosure, its support comes with trade-offs:
317
367
 
318
- Adding `json: true` to the channel config will convert the JSON response to XML.
368
+ 1. The content-type is guessed from the file extension of the URL.
369
+ 2. If the content-type guessing fails, it will default to `application/octet-stream`.
370
+ 3. The content-length will always be undetermined and therefore stated as `0` bytes.
371
+
372
+ Read the [RSS 2.0 spec](http://www.rssboard.org/rss-profile#element-channel-item-enclosure) for further information on enclosing content.
319
373
 
320
374
  <details>
321
375
  <summary>See a Ruby example</summary>
322
376
 
323
377
  ```ruby
324
378
  Html2rss.feed(
325
- channel: {
326
- url: 'https://example.com', title: 'Example with JSON', json: true
327
- },
328
- selectors: {} # ... omitted
379
+ channel: {},
380
+ selectors: {
381
+ enclosure: { selector: 'audio', extractor: 'attribute', attribute: 'src' }
382
+ }
329
383
  )
330
384
  ```
331
385
 
@@ -334,133 +388,88 @@ Html2rss.feed(
334
388
  <details>
335
389
  <summary>See a YAML feed config example</summary>
336
390
 
337
- ```yaml
391
+ ```yml
338
392
  channel:
339
- url: https://example.com
340
- title: "Example with JSON"
341
- json: true
393
+   # ... omitted
342
394
  selectors:
343
395
    # ... omitted
396
+ enclosure:
397
+ selector: "audio"
398
+ extractor: "attribute"
399
+ attribute: "src"
344
400
  ```
345
401
 
346
402
  </details>
403
+ ## Scraping and handling JSON responses
347
404
 
348
- <details>
349
- <summary>See example of a converted JSON object</summary>
405
+ By default, `html2rss` assumes the URL responds with HTML. However, it can also handle JSON responses. The JSON must return an Array or Hash.
350
406
 
351
- This JSON object:
407
+ | key | required | default | note |
408
+ | ---------- | -------- | ------- | ---------------------------------------------------- |
409
+ | `json` | optional | false | If set to `true`, the response is parsed as JSON. |
410
+ | `jsonpath` | optional | $ | Use [JSONPath syntax]() to select nodes of interest. |
352
411
 
353
- ```json
354
- {
355
- "data": [{ "title": "Headline", "url": "https://example.com" }]
356
- }
357
- ```
412
+ <details><summary>See a Ruby example</summary>
358
413
 
359
- converts to:
360
-
361
- ```xml
362
- <hash>
363
- <data>
364
- <datum>
365
- <title>Headline</title>
366
- <url>https://example.com</url>
367
- </datum>
368
- </data>
369
- </hash>
414
+ ```ruby
415
+ Html2rss.feed(
416
+ channel: { url: 'http://domainname.tld/whatever.json', json: true },
417
+ selectors: { title: { selector: 'foo' } }
418
+ )
370
419
  ```
371
420
 
372
- Your items selector would be `data > datum`, the item's `link` selector would be `url`.
373
-
374
- Find further information in [ActiveSupport's `Hash.to_xml` documentation](https://apidock.com/rails/Hash/to_xml).
375
-
376
421
  </details>
377
422
 
378
- <details>
379
- <summary>See example of a converted JSON array</summary>
380
-
381
- This JSON array:
423
+ <details><summary>See a YAML feed config example</summary>
382
424
 
383
- ```json
384
- [{ "title": "Headline", "url": "https://example.com" }]
385
- ```
386
-
387
- converts to:
388
-
389
- ```xml
390
- <objects>
391
- <object>
392
- <title>Headline</title>
393
- <url>https://example.com</url>
394
- </object>
395
- </objects>
425
+ ```yml
426
+ channel:
427
+ url: "http://domainname.tld/whatever.json"
428
+ json: true
429
+ selectors:
430
+ title:
431
+ selector: "foo"
396
432
  ```
397
433
 
398
- Your items selector would be `objects > object`, the item's `link` selector would be `url`.
399
-
400
- Find further information in [ActiveSupport's `Array.to_xml` documentation](https://apidock.com/rails/Array/to_xml).
401
-
402
434
  </details>
403
435
 
404
436
  ## Set any HTTP header in the request
405
437
 
406
- You can add any HTTP headers to the request to the channel URL.
407
- Use this to e.g. have Cookie or Authorization information sent or to spoof the User-Agent.
438
+ To set HTTP request headers, you can add them to the channel's `headers` hash. This is useful for APIs that require an Authorization header.
408
439
 
409
- <details>
410
- <summary>See a Ruby example</summary>
411
-
412
- ```ruby
413
- Html2rss.feed(
414
- channel: {
415
- url: 'https://example.com',
416
- title: "Example with http headers",
417
- headers: {
418
- "User-Agent": "html2rss-request",
419
- "X-Something": "Foobar",
420
- "Authorization": "Token deadbea7",
421
- "Cookie": "monster=MeWantCookie"
422
- }
423
- },
424
- selectors: {}
425
- )
426
- ```
427
-
428
- </details>
429
-
430
- <details>
431
- <summary>See a YAML feed config example</summary>
432
-
433
- ```yaml
440
+ ```yml
434
441
  channel:
435
- url: https://example.com
436
- title: "Example with http headers"
442
+ url: "https://example.com/api/resource"
437
443
  headers:
438
- "User-Agent": "html2rss-request"
439
- "X-Something": "Foobar"
440
- "Authorization": "Token deadbea7"
441
- "Cookie": "monster=MeWantCookie"
444
+ Authorization: "Bearer YOUR_TOKEN"
442
445
  selectors:
443
-   # ...
446
+ # ... omitted
444
447
  ```
445
448
 
446
- </details>
449
+ Or for setting a User-Agent:
447
450
 
448
- The headers provided by the channel are merged into the global headers.
451
+ ```yml
452
+ channel:
453
+ url: "https://example.com"
454
+ headers:
455
+ User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
456
+ selectors:
457
+ # ... omitted
458
+ ```
449
459
 
450
460
  ## Usage with a YAML config file
451
461
 
452
462
  This step is not required to work with this gem. If you're using
453
- [`html2rss-web`](https://github.com/gildesmarais/html2rss-web)
463
+ [`html2rss-web`](https://github.com/html2rss/html2rss-web)
454
464
  and want to create your private feed configs, keep on reading!
455
465
 
456
- First, create your YAML file, e.g. called `config.yml`.
457
- This file will contain your global config and feed configs.
466
+ First, create a YAML file, e.g. `feeds.yml`. This file will contain your global config and multiple feed configs under the key `feeds`.
458
467
 
459
468
  Example:
460
469
 
461
470
  ```yml
462
471
  headers:
463
- 'User-Agent': "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"
472
+ "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"
464
473
  feeds:
465
474
  myfeed:
466
475
  channel:
@@ -472,46 +481,110 @@ feeds:
472
481
 
473
482
  Your feed configs go below `feeds`. Everything else is part of the global config.
474
483
 
475
- Build your feeds like this:
484
+ Find a full example of a `feeds.yml` at [`spec/feeds.test.yml`](https://github.com/html2rss/html2rss/blob/master/spec/feeds.test.yml).
485
+
486
+ Now you can build your feeds like this:
487
+
488
+ <details>
489
+ <summary>Build feeds in Ruby</summary>
476
490
 
477
491
  ```ruby
478
492
  require 'html2rss'
479
493
 
480
- myfeed = Html2rss.feed_from_yaml_config('config.yml', 'myfeed')
481
- myotherfeed = Html2rss.feed_from_yaml_config('config.yml', 'myotherfeed')
494
+ myfeed = Html2rss.feed_from_yaml_config('feeds.yml', 'myfeed')
495
+ myotherfeed = Html2rss.feed_from_yaml_config('feeds.yml', 'myotherfeed')
482
496
  ```
483
497
 
484
- Find a full example of a `config.yml` at [`spec/config.test.yml`](https://github.com/gildesmarais/html2rss/blob/master/spec/config.test.yml).
498
+ </details>
485
499
 
486
- ## Gotchas and tips & tricks
500
+ <details>
501
+ <summary>Build feeds on the command line</summary>
487
502
 
488
- - Check that the channel URL does not redirect to a mobile page with a different markup structure.
489
- - Do not rely on your web browser's developer console. `html2rss` does not execute JavaScript.
490
- - Fiddling with [`curl`](https://github.com/curl/curl) and [`pup`](https://github.com/ericchiang/pup) to find the selectors seems efficient (`curl URL | pup`).
491
- - [CSS selectors are quite versatile, here's an overview.](https://www.w3.org/TR/selectors-4/#overview)
503
+ ```sh
504
+ html2rss feed feeds.yml myfeed
505
+ html2rss feed feeds.yml myotherfeed
506
+ ```
507
+
508
+ </details>
509
+
510
+ ## Display the RSS feed nicely in a web browser
511
+
512
+ To display RSS feeds nicely in a web browser, you can:
492
513
 
493
- ## Development
514
+ - add a plain old CSS stylesheet, or
515
+ - use XSLT (e**X**tensible **S**tylesheet **L**anguage **T**ransformations).
494
516
 
495
- After checking out the repository, run `bin/setup` to install dependencies. Then, run `bundle exec rspec` to run the tests.
496
- You can also run `bin/console` for an interactive prompt that will allow you to experiment.
517
+ A web browser will apply these stylesheets and show the contents as described.
518
+
519
+ In a CSS stylesheet, you'd use `element` selectors to apply styles.
520
+
521
+ If you want to do more, then you need to create a XSLT. XSLT allows you
522
+ to use a HTML template and to freely design the information of the RSS,
523
+ including using JavaScript and external resources.
524
+
525
+ You can add as many stylesheets and types as you like. Just add them to your global configuration.
497
526
 
498
527
  <details>
499
- <summary>Releasing a new version</summary>
500
-
501
- 1. `git pull`
502
- 2. increase version in `lib/html2rss/version.rb`
503
- 3. `bundle`
504
- 4. `git add Gemfile.lock lib/html2rss/version.rb`
505
- 5. `VERSION=$(ruby -e 'require "./lib/html2rss/version.rb"; puts Html2rss::VERSION')`
506
- 6. `git commit -m "chore: release $VERSION"`
507
- 7. `git tag v$VERSION`
508
- 8. [`standard-changelog -f`](https://github.com/conventional-changelog/conventional-changelog/tree/master/packages/standard-changelog)
509
- 9. `git add CHANGELOG.md && git commit --amend`
510
- 10. `git tag v$VERSION -f`
511
- 11. `git push && git push --tags`
528
+ <summary>Ruby: a stylesheet config example</summary>
529
+
530
+ ```ruby
531
+ config = Html2rss::Config.new(
532
+ { channel: {}, selectors: {} }, # omitted
533
+ {
534
+ stylesheets: [
535
+ {
536
+ href: '/relative/base/path/to/style.xls',
537
+ media: :all,
538
+ type: 'text/xsl'
539
+ },
540
+ {
541
+ href: 'http://example.com/rss.css',
542
+ media: :all,
543
+ type: 'text/css'
544
+ }
545
+ ]
546
+ }
547
+ )
548
+
549
+ Html2rss.feed(config)
550
+ ```
512
551
 
513
552
  </details>
514
553
 
515
- ## Contributing
554
+ <details>
555
+ <summary>YAML: a stylesheet config example</summary>
556
+
557
+ ```yml
558
+ stylesheets:
559
+ - href: "/relative/base/path/to/style.xls"
560
+ media: "all"
561
+ type: "text/xsl"
562
+ - href: "http://example.com/rss.css"
563
+ media: "all"
564
+ type: "text/css"
565
+ feeds:
566
+ # ... omitted
567
+ ```
568
+
569
+ </details>
570
+
571
+ Recommended further readings:
572
+
573
+ - [How to format RSS with CSS on lifewire.com](https://www.lifewire.com/how-to-format-rss-3469302)
574
+ - [XSLT: Extensible Stylesheet Language Transformations on MDN](https://developer.mozilla.org/en-US/docs/Web/XSLT)
575
+ - [The XSLT used by html2rss-web](https://github.com/html2rss/html2rss-web/blob/master/public/rss.xsl)
576
+
577
+ ## Gotchas and tips & tricks
578
+
579
+ - Check that the channel URL does not redirect to a mobile page with a different markup structure.
580
+ - Do not rely on your web browser's developer console. `html2rss` does not execute JavaScript.
581
+ - Fiddling with [`curl`](https://github.com/curl/curl) and [`pup`](https://github.com/ericchiang/pup) to find the selectors seems efficient (`curl URL | pup`).
582
+ - [CSS selectors are versatile. Here's an overview.](https://www.w3.org/TR/selectors-4/#overview)
583
+
584
+ ### Contributing
516
585
 
517
- Bug reports and pull requests are welcome on GitHub at https://github.com/gildesmarais/html2rss.
586
+ 1. Fork it ( <https://github.com/html2rss/html2rss/fork> )
587
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
588
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
589
+ 4. Push to the branch (`git push origin my-new-feature`)
590
+ 5. Create a new Pull Request
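
Note on the dynamic parameters added in this README diff: the new section links to Ruby's `sprintf` documentation for the `%<id>s` placeholder syntax. The sketch below shows only how such a named reference expands with Ruby's built-in `format`. The helper name `expand_channel_url` is hypothetical and is not part of html2rss; how the gem applies the parameters internally is not shown in this diff.

```ruby
# Hypothetical helper (not an html2rss API): expands named references such as
# %<id>s in a template string, using Ruby's Kernel#format.
def expand_channel_url(template, params)
  # Kernel#format accepts a Hash with Symbol keys for %<name>s references.
  format(template, params)
end

url = expand_channel_url('http://domainname.tld/whatever/%<id>s.html', id: 42)
puts url
```

If a referenced key is missing from the params hash, `format` raises a `KeyError`, so an incomplete parameter list fails loudly rather than producing a malformed URL.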