selma 0.2.2-x86_64-darwin → 0.4.0-x86_64-darwin

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: da44321197848529fd6fb90ec2b6265f17441d380efa47fd5836492d9ef616fb
4
- data.tar.gz: 17e40218b475fda8e86dc160b04b5259ce2978c92ec4677cbf8b0e4db0f82b36
3
+ metadata.gz: 0e44202bd34d77e68a8c2c1af2ef9a1650c7b126a0a0617f76b60fb19f437365
4
+ data.tar.gz: 2b77c9bfaacc1ae7f3af965f187c71ac1661b22205dcddeb6cc27f1a3536db2e
5
5
  SHA512:
6
- metadata.gz: 6f0174d4954d0e4f22fe0e89239bee643ea24f6c9c05652b2ce79935bf30529b74fcb66ba96eb6b4f4fbc56c1f3a44557d77033ee7fc47952deda885430bc6e3
7
- data.tar.gz: eb7c917db6fc9085db53879fac0c5b1cc4843ca37c5931199e1fcf3f8e12da64959cf30fbf276f0bac8cc1da8f665e22c87621a12b15c5c9160f6d5d95867ec7
6
+ metadata.gz: f6de84a3a8c74ccd7d7bd387a11730ce03abdcfde05676d08b66b03903d9a66008aa4c92d73a90e77dcfc215dc2ab2e6921697cdcb15fa630098f44fbb4305d2
7
+ data.tar.gz: a0a3e20a2022c2b9dbaae65003ecbf1800d01b7b4569d8a23691a07f065d102570f12b5bb84dd7738cc8a1038b98fce2cf49e1a9f116304ffe98a93f1d5a42e0
data/README.md CHANGED
@@ -76,7 +76,7 @@ attributes: {
76
76
 
77
77
  # URL handling protocols to allow in specific attributes. By default, no
78
78
  # protocols are allowed. Use :relative in place of a protocol if you want
79
- # to allow relative URLs sans protocol.
79
+ # to allow relative URLs sans protocol. Set to `:all` to allow any protocol.
80
80
  protocols: {
81
81
  "a" => { "href" => ["http", "https", "mailto", :relative] },
82
82
  "img" => { "href" => ["http", "https"] },
@@ -103,7 +103,11 @@ Here's an example which rewrites the `href` attribute on `a` and the `src` attri
103
103
 
104
104
  ```ruby
105
105
  class MatchAttribute
106
- SELECTOR = Selma::Selector(match_element: %(a[href^="http:"], img[src^="http:"]"))
106
+ SELECTOR = Selma::Selector.new(match_element: %(a[href^="http:"], img[src^="http:"]"))
107
+
108
+ def selector
109
+ SELECTOR
110
+ end
107
111
 
108
112
  def handle_element(element)
109
113
  if element.tag_name == "a"
@@ -176,40 +180,134 @@ The `element` argument in `handle_element` has the following methods:
176
180
  - `after(content, as: content_type)`: Inserts `content` after the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
177
181
  - `replace(content, as: content_type)`: Replaces the text node with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
178
182
 
183
+ ## Security
184
+
185
+ Theoretically, a malicious user can provide a very large document for processing, which can exhaust the memory of the host machine. To set a limit on how much string content is processed at once, you can provide two options into the `memory` namespace:
186
+
187
+ ```ruby
188
+ memory: {
189
+ max_allowed_memory_usage: 1000,
190
+ preallocated_parsing_buffer_size: 100,
191
+ },
192
+ ```
193
+
194
+ Note that `preallocated_parsing_buffer_size` must always be less than `max_allowed_memory_usage`. See [the `lol_html` project documentation](https://docs.rs/lol_html/1.2.1/lol_html/struct.MemorySettings.html) to learn more about the default values.
195
+
179
196
  ## Benchmarks
180
197
 
198
+ When you run `bundle exec rake benchmark`, two different benchmarks are calculated. Here are those results on my machine.
199
+
200
+ ### Benchmarks for just the sanitization process
201
+
202
+ Comparing Selma against popular Ruby sanitization gems:
203
+
204
+ <!-- prettier-ignore-start -->
181
205
  <details>
182
206
  <pre>
183
- ruby test/benchmark.rb
184
- ruby test/benchmark.rb
207
+ input size = 25309 bytes, 0.03 MB
208
+
209
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
185
210
  Warming up --------------------------------------
186
- sanitize-document-huge
187
- 1.000 i/100ms
188
- selma-document-huge 1.000 i/100ms
211
+ sanitize-sm 16.000 i/100ms
212
+ selma-sm 214.000 i/100ms
189
213
  Calculating -------------------------------------
190
- sanitize-document-huge
191
- 0.257 (± 0.0%) i/s - 2.000 in 7.783398s
192
- selma-document-huge 4.602 (± 0.0%) i/s - 23.000 in 5.002870s
214
+ sanitize-sm 171.670 (± 1.2%) i/s - 5.152k in 30.017081s
215
+ selma-sm 2.146k (± 3.0%) i/s - 64.414k in 30.058470s
216
+
217
+ Comparison:
218
+ selma-sm: 2145.8 i/s
219
+ sanitize-sm: 171.7 i/s - 12.50x slower
220
+
221
+ input size = 86686 bytes, 0.09 MB
222
+
223
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
193
224
  Warming up --------------------------------------
194
- sanitize-document-medium
195
- 2.000 i/100ms
196
- selma-document-medium
197
- 22.000 i/100ms
225
+ sanitize-md 4.000 i/100ms
226
+ selma-md 56.000 i/100ms
198
227
  Calculating -------------------------------------
199
- sanitize-document-medium
200
- 28.676 (± 3.5%) i/s - 144.000 in 5.024669s
201
- selma-document-medium
202
- 121.500 (±22.2%) i/s - 594.000 in 5.135410s
228
+ sanitize-md 44.397 (± 2.3%) i/s - 1.332k in 30.022430s
229
+ selma-md 558.448 (± 1.4%) i/s - 16.800k in 30.089196s
230
+
231
+ Comparison:
232
+ selma-md: 558.4 i/s
233
+ sanitize-md: 44.4 i/s - 12.58x slower
234
+
235
+ input size = 7172510 bytes, 7.17 MB
236
+
237
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
238
+ Warming up --------------------------------------
239
+ sanitize-lg 1.000 i/100ms
240
+ selma-lg 1.000 i/100ms
241
+ Calculating -------------------------------------
242
+ sanitize-lg 0.163 (± 0.0%) i/s - 6.000 in 37.375628s
243
+ selma-lg 6.750 (± 0.0%) i/s - 203.000 in 30.080976s
244
+
245
+ Comparison:
246
+ selma-lg: 6.7 i/s
247
+ sanitize-lg: 0.2 i/s - 41.32x slower
248
+ </pre>
249
+ </details>
250
+ <!-- prettier-ignore-end -->
251
+
252
+ ### Benchmarks for just the rewriting process
253
+
254
+ Comparing Selma against popular Ruby HTML parsing gems:
255
+
256
+ <!-- prettier-ignore-start -->
257
+ <details>
258
+ <pre>input size = 25309 bytes, 0.03 MB
259
+
260
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
261
+ Warming up --------------------------------------
262
+ nokogiri-sm 107.000 i/100ms
263
+ nokolexbor-sm 340.000 i/100ms
264
+ selma-sm 380.000 i/100ms
265
+ Calculating -------------------------------------
266
+ nokogiri-sm 1.073k (± 2.1%) i/s - 32.207k in 30.025474s
267
+ nokolexbor-sm 3.300k (±13.2%) i/s - 27.540k in 36.788212s
268
+ selma-sm 3.779k (± 3.4%) i/s - 113.240k in 30.013908s
269
+
270
+ Comparison:
271
+ selma-sm: 3779.4 i/s
272
+ nokolexbor-sm: 3300.1 i/s - same-ish: difference falls within error
273
+ nokogiri-sm: 1073.1 i/s - 3.52x slower
274
+
275
+ input size = 86686 bytes, 0.09 MB
276
+
277
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
278
+ Warming up --------------------------------------
279
+ nokogiri-md 11.000 i/100ms
280
+ nokolexbor-md 48.000 i/100ms
281
+ selma-md 53.000 i/100ms
282
+ Calculating -------------------------------------
283
+ nokogiri-md 103.998 (± 5.8%) i/s - 3.113k in 30.029932s
284
+ nokolexbor-md 428.928 (± 7.9%) i/s - 12.816k in 30.066662s
285
+ selma-md 492.190 (± 6.9%) i/s - 14.734k in 30.082943s
286
+
287
+ Comparison:
288
+ selma-md: 492.2 i/s
289
+ nokolexbor-md: 428.9 i/s - same-ish: difference falls within error
290
+ nokogiri-md: 104.0 i/s - 4.73x slower
291
+
292
+ input size = 7172510 bytes, 7.17 MB
293
+
294
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
203
295
  Warming up --------------------------------------
204
- sanitize-document-small
205
- 10.000 i/100ms
206
- selma-document-small 20.000 i/100ms
296
+ nokogiri-lg 1.000 i/100ms
297
+ nokolexbor-lg 1.000 i/100ms
298
+ selma-lg 1.000 i/100ms
207
299
  Calculating -------------------------------------
208
- sanitize-document-small
209
- 107.280 (± 0.9%) i/s - 540.000 in 5.033850s
210
- selma-document-small 118.867 (±31.1%) i/s - 540.000 in 5.080726s
300
+ nokogiri-lg 0.874 (± 0.0%) i/s - 27.000 in 30.921090s
301
+ nokolexbor-lg 2.227 (± 0.0%) i/s - 67.000 in 30.137903s
302
+ selma-lg 8.354 (± 0.0%) i/s - 251.000 in 30.075227s
303
+
304
+ Comparison:
305
+ selma-lg: 8.4 i/s
306
+ nokolexbor-lg: 2.2 i/s - 3.75x slower
307
+ nokogiri-lg: 0.9 i/s - 9.56x slower
211
308
  </pre>
212
309
  </details>
310
+ <!-- prettier-ignore-end -->
213
311
 
214
312
  ## Contributing
215
313
 
Binary file
Binary file
Binary file
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Selma
4
+ module Config
5
+ OPTIONS = {
6
+ memory: {
7
+ max_allowed_memory_usage: nil,
8
+ preallocated_parsing_buffer_size: nil,
9
+ },
10
+ }
11
+ end
12
+ end
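
Note on the `memory` options: the README states that `preallocated_parsing_buffer_size` must be less than `max_allowed_memory_usage`, and the new `OPTIONS` hash defaults both to `nil`. The sketch below shows one way a caller might check that constraint before building a configuration. `validate_memory_options` is a hypothetical helper, not part of Selma, and treating `nil` as "leave the default alone" is an assumption drawn from the defaults above.

```ruby
# Hypothetical helper, not part of Selma: checks the constraint the README
# states for the two `memory` options.
def validate_memory_options(memory)
  max = memory[:max_allowed_memory_usage]
  buffer = memory[:preallocated_parsing_buffer_size]

  # Assumption: nil means "not set", so there is nothing to compare.
  return memory if max.nil? || buffer.nil?

  unless buffer < max
    raise ArgumentError,
      "preallocated_parsing_buffer_size (#{buffer}) must be less than " \
      "max_allowed_memory_usage (#{max})"
  end

  memory
end

# The values from the README example satisfy the constraint.
validate_memory_options({ max_allowed_memory_usage: 1000, preallocated_parsing_buffer_size: 100 })
```

Swapping the two values, or setting them equal, raises `ArgumentError` in this sketch. How Selma itself reports an invalid combination is not shown in this diff.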
@@ -28,7 +28,7 @@ module Selma
28
28
 
29
29
  # URL handling protocols to allow in specific attributes. By default, no
30
30
  # protocols are allowed. Use :relative in place of a protocol if you want
31
- # to allow relative URLs sans protocol.
31
+ # to allow relative URLs sans protocol. Set to `:all` to allow any protocol.
32
32
  protocols: {},
33
33
 
34
34
  # An Array of element names whose contents will be removed. The contents
@@ -16,6 +16,7 @@ module Selma
16
16
  "colgroup",
17
17
  "data",
18
18
  "del",
19
+ "details",
19
20
  "div",
20
21
  "figcaption",
21
22
  "figure",
@@ -66,7 +66,12 @@ module Selma
66
66
  end
67
67
 
68
68
  def allow_protocol(element, attr, protos)
69
- protos = [protos] unless protos.is_a?(Array)
69
+ if protos.is_a?(Array)
70
+ raise ArgumentError, "`:all` must be passed outside of an array" if protos.include?(:all)
71
+ else
72
+ protos = [protos]
73
+ end
74
+
70
75
  set_allowed_protocols(element, attr, protos)
71
76
  end
72
77
 
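
Note on the `allow_protocol` change: the new branch rejects `:all` when it appears inside an array and wraps any non-array value in one. The standalone sketch below mirrors only that normalization step so the behavior can be tried without the native extension. `normalize_protocols` is a hypothetical name, and the call to `set_allowed_protocols` from the real method is left out.

```ruby
# Standalone mirror of the new branch in allow_protocol. The real method goes
# on to call set_allowed_protocols, which is omitted here.
def normalize_protocols(protos)
  if protos.is_a?(Array)
    # :all is only accepted as a bare value, never as an array member.
    raise ArgumentError, "`:all` must be passed outside of an array" if protos.include?(:all)

    protos
  else
    [protos]
  end
end

normalize_protocols(["http", "https", :relative]) # => ["http", "https", :relative]
normalize_protocols(:all)                         # => [:all]
# normalize_protocols(["http", :all])             # raises ArgumentError
```

Under this reading, a sanitizer config that allows any protocol would pass the bare symbol, for example `"a" => { "href" => :all }`, rather than `["http", :all]`.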
data/lib/selma/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Selma
4
- VERSION = "0.2.2"
4
+ VERSION = "0.4.0"
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: selma
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.2
4
+ version: 0.4.0
5
5
  platform: x86_64-darwin
6
6
  authors:
7
7
  - Garen J. Torikian
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2024-01-03 00:00:00.000000000 Z
11
+ date: 2024-07-15 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake
@@ -51,6 +51,7 @@ files:
51
51
  - lib/selma/3.1/selma.bundle
52
52
  - lib/selma/3.2/selma.bundle
53
53
  - lib/selma/3.3/selma.bundle
54
+ - lib/selma/config.rb
54
55
  - lib/selma/extension.rb
55
56
  - lib/selma/html.rb
56
57
  - lib/selma/rewriter.rb