selma 0.2.2-x86_64-linux → 0.4.0-x86_64-linux

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: e089640332c2febdf6f1b8ccdd16a8d4677cc4c555ebfbb88e8bd52edd7996d1
4
- data.tar.gz: d0e5a3ad8e39fe5dffe74d265874daa23dd65690201bab76c4771942c3cb1f8e
3
+ metadata.gz: b17c0861a7718a35d41df0fc24fe4d3ae8ecee55247f7f781256b69e4766144b
4
+ data.tar.gz: 72bd55489d4762758e7746e01459f1d37619118942c1e356a0ec95fc729139a9
5
5
  SHA512:
6
- metadata.gz: ee69d1c271e30b9c50f4d091a5fa8e0d864fe1aaf1c3565d90f6da2b0a4405370a453c7a3e6c2bc89b48fbc56974b9963ebff839f2577c42ff9ff814cb476170
7
- data.tar.gz: 3ed6ba8d9472215abed8a8ca840c0f6cd0888be05757a99760b745164bda840d66930f517c29a3a52cba19be376c9c0e34413a8353ca1f4231f0e77ae750894c
6
+ metadata.gz: '07528f3ec41b8e88208d9b5505ba6265402a1116f56c1c2808f7dcbb737660206536a0d70f5b1f3c3b69d149dc1094cca5b098a1007be651cbb9c1bd87749bef'
7
+ data.tar.gz: 264c79f70a165fc594215d8b6e6173ad723af5f0477add404490da4407f7cc2151f8b5faa1d16e4a2cef2d33a0f83fb86821b034596458efb8b277dd3abe7ef2
data/README.md CHANGED
@@ -76,7 +76,7 @@ attributes: {
76
76
 
77
77
  # URL handling protocols to allow in specific attributes. By default, no
78
78
  # protocols are allowed. Use :relative in place of a protocol if you want
79
- # to allow relative URLs sans protocol.
79
+ # to allow relative URLs sans protocol. Set to `:all` to allow any protocol.
80
80
  protocols: {
81
81
  "a" => { "href" => ["http", "https", "mailto", :relative] },
82
82
  "img" => { "href" => ["http", "https"] },
@@ -103,7 +103,11 @@ Here's an example which rewrites the `href` attribute on `a` and the `src` attri
103
103
 
104
104
  ```ruby
105
105
  class MatchAttribute
106
- SELECTOR = Selma::Selector(match_element: %(a[href^="http:"], img[src^="http:"]))
106
+ SELECTOR = Selma::Selector.new(match_element: %(a[href^="http:"], img[src^="http:"]))
107
+
108
+ def selector
109
+ SELECTOR
110
+ end
107
111
 
108
112
  def handle_element(element)
109
113
  if element.tag_name == "a"
@@ -176,40 +180,134 @@ The `element` argument in `handle_element` has the following methods:
176
180
  - `after(content, as: content_type)`: Inserts `content` after the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
177
181
  - `replace(content, as: content_type)`: Replaces the text node with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
178
182
 
183
+ ## Security
184
+
185
+ Theoretically, a malicious user can provide a very large document for processing, which can exhaust the memory of the host machine. To limit how much string content is processed at once, you can set two options under the `memory` key:
186
+
187
+ ```ruby
188
+ memory: {
189
+ max_allowed_memory_usage: 1000,
190
+ preallocated_parsing_buffer_size: 100,
191
+ },
192
+ ```
193
+
194
+ Note that `preallocated_parsing_buffer_size` must always be less than `max_allowed_memory_usage`. See [the `lol_html` project documentation](https://docs.rs/lol_html/1.2.1/lol_html/struct.MemorySettings.html) to learn more about the default values.
195
+
179
196
  ## Benchmarks
180
197
 
198
+ When you run `bundle exec rake benchmark`, two different benchmarks are calculated. Here are those results on my machine.
199
+
200
+ ### Benchmarks for just the sanitization process
201
+
202
+ Comparing Selma against popular Ruby sanitization gems:
203
+
204
+ <!-- prettier-ignore-start -->
181
205
  <details>
182
206
  <pre>
183
- ruby test/benchmark.rb
184
- ruby test/benchmark.rb
207
+ input size = 25309 bytes, 0.03 MB
208
+
209
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
185
210
  Warming up --------------------------------------
186
- sanitize-document-huge
187
- 1.000 i/100ms
188
- selma-document-huge 1.000 i/100ms
211
+ sanitize-sm 16.000 i/100ms
212
+ selma-sm 214.000 i/100ms
189
213
  Calculating -------------------------------------
190
- sanitize-document-huge
191
- 0.257 (± 0.0%) i/s - 2.000 in 7.783398s
192
- selma-document-huge 4.602 (± 0.0%) i/s - 23.000 in 5.002870s
214
+ sanitize-sm 171.670 (± 1.2%) i/s - 5.152k in 30.017081s
215
+ selma-sm 2.146k (± 3.0%) i/s - 64.414k in 30.058470s
216
+
217
+ Comparison:
218
+ selma-sm: 2145.8 i/s
219
+ sanitize-sm: 171.7 i/s - 12.50x slower
220
+
221
+ input size = 86686 bytes, 0.09 MB
222
+
223
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
193
224
  Warming up --------------------------------------
194
- sanitize-document-medium
195
- 2.000 i/100ms
196
- selma-document-medium
197
- 22.000 i/100ms
225
+ sanitize-md 4.000 i/100ms
226
+ selma-md 56.000 i/100ms
198
227
  Calculating -------------------------------------
199
- sanitize-document-medium
200
- 28.676 (± 3.5%) i/s - 144.000 in 5.024669s
201
- selma-document-medium
202
- 121.500 (±22.2%) i/s - 594.000 in 5.135410s
228
+ sanitize-md 44.397 (± 2.3%) i/s - 1.332k in 30.022430s
229
+ selma-md 558.448 (± 1.4%) i/s - 16.800k in 30.089196s
230
+
231
+ Comparison:
232
+ selma-md: 558.4 i/s
233
+ sanitize-md: 44.4 i/s - 12.58x slower
234
+
235
+ input size = 7172510 bytes, 7.17 MB
236
+
237
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
238
+ Warming up --------------------------------------
239
+ sanitize-lg 1.000 i/100ms
240
+ selma-lg 1.000 i/100ms
241
+ Calculating -------------------------------------
242
+ sanitize-lg 0.163 (± 0.0%) i/s - 6.000 in 37.375628s
243
+ selma-lg 6.750 (± 0.0%) i/s - 203.000 in 30.080976s
244
+
245
+ Comparison:
246
+ selma-lg: 6.7 i/s
247
+ sanitize-lg: 0.2 i/s - 41.32x slower
248
+ </pre>
249
+ </details>
250
+ <!-- prettier-ignore-end -->
251
+
252
+ ### Benchmarks for just the rewriting process
253
+
254
+ Comparing Selma against popular Ruby HTML parsing gems:
255
+
256
+ <!-- prettier-ignore-start -->
257
+ <details>
258
+ <pre>input size = 25309 bytes, 0.03 MB
259
+
260
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
261
+ Warming up --------------------------------------
262
+ nokogiri-sm 107.000 i/100ms
263
+ nokolexbor-sm 340.000 i/100ms
264
+ selma-sm 380.000 i/100ms
265
+ Calculating -------------------------------------
266
+ nokogiri-sm 1.073k (± 2.1%) i/s - 32.207k in 30.025474s
267
+ nokolexbor-sm 3.300k (±13.2%) i/s - 27.540k in 36.788212s
268
+ selma-sm 3.779k (± 3.4%) i/s - 113.240k in 30.013908s
269
+
270
+ Comparison:
271
+ selma-sm: 3779.4 i/s
272
+ nokolexbor-sm: 3300.1 i/s - same-ish: difference falls within error
273
+ nokogiri-sm: 1073.1 i/s - 3.52x slower
274
+
275
+ input size = 86686 bytes, 0.09 MB
276
+
277
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
278
+ Warming up --------------------------------------
279
+ nokogiri-md 11.000 i/100ms
280
+ nokolexbor-md 48.000 i/100ms
281
+ selma-md 53.000 i/100ms
282
+ Calculating -------------------------------------
283
+ nokogiri-md 103.998 (± 5.8%) i/s - 3.113k in 30.029932s
284
+ nokolexbor-md 428.928 (± 7.9%) i/s - 12.816k in 30.066662s
285
+ selma-md 492.190 (± 6.9%) i/s - 14.734k in 30.082943s
286
+
287
+ Comparison:
288
+ selma-md: 492.2 i/s
289
+ nokolexbor-md: 428.9 i/s - same-ish: difference falls within error
290
+ nokogiri-md: 104.0 i/s - 4.73x slower
291
+
292
+ input size = 7172510 bytes, 7.17 MB
293
+
294
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
203
295
  Warming up --------------------------------------
204
- sanitize-document-small
205
- 10.000 i/100ms
206
- selma-document-small 20.000 i/100ms
296
+ nokogiri-lg 1.000 i/100ms
297
+ nokolexbor-lg 1.000 i/100ms
298
+ selma-lg 1.000 i/100ms
207
299
  Calculating -------------------------------------
208
- sanitize-document-small
209
- 107.280 (± 0.9%) i/s - 540.000 in 5.033850s
210
- selma-document-small 118.867 (±31.1%) i/s - 540.000 in 5.080726s
300
+ nokogiri-lg 0.874 (± 0.0%) i/s - 27.000 in 30.921090s
301
+ nokolexbor-lg 2.227 (± 0.0%) i/s - 67.000 in 30.137903s
302
+ selma-lg 8.354 (± 0.0%) i/s - 251.000 in 30.075227s
303
+
304
+ Comparison:
305
+ selma-lg: 8.4 i/s
306
+ nokolexbor-lg: 2.2 i/s - 3.75x slower
307
+ nokogiri-lg: 0.9 i/s - 9.56x slower
211
308
  </pre>
212
309
  </details>
310
+ <!-- prettier-ignore-end -->
213
311
 
214
312
  ## Contributing
215
313
 
Binary file
Binary file
Binary file
@@ -0,0 +1,12 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Selma
4
+ module Config
5
+ OPTIONS = {
6
+ memory: {
7
+ max_allowed_memory_usage: nil,
8
+ preallocated_parsing_buffer_size: nil,
9
+ },
10
+ }
11
+ end
12
+ end
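
The new `OPTIONS` hash leaves both memory settings at `nil`, and the README states that `preallocated_parsing_buffer_size` must be less than `max_allowed_memory_usage`. The following is a minimal sketch of how caller-supplied values could be merged over those defaults and checked against that rule. `DEFAULT_OPTIONS` and `merge_memory_options` are hypothetical names used for illustration only; they are not part of Selma, and treating `nil` as "use the `lol_html` default" is an assumption.

```ruby
# Defaults copied from the new config file above.
# Assumption: nil means "fall back to the lol_html default".
DEFAULT_OPTIONS = {
  memory: {
    max_allowed_memory_usage: nil,
    preallocated_parsing_buffer_size: nil,
  },
}.freeze

# Hypothetical helper, not part of Selma: merges user-supplied memory
# settings over the defaults and enforces the rule stated in the README.
def merge_memory_options(user_memory)
  memory = DEFAULT_OPTIONS[:memory].merge(user_memory)
  max = memory[:max_allowed_memory_usage]
  buffer = memory[:preallocated_parsing_buffer_size]

  # Only compare when both values were actually provided.
  if max && buffer && buffer >= max
    raise ArgumentError,
      "preallocated_parsing_buffer_size must be less than max_allowed_memory_usage"
  end

  memory
end

# Same values as the README example.
merged = merge_memory_options(
  max_allowed_memory_usage: 1000,
  preallocated_parsing_buffer_size: 100,
)
```

Checking the constraint in Ruby before handing the values to the native extension is only one possible design; the extension may well perform its own validation.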
@@ -28,7 +28,7 @@ module Selma
28
28
 
29
29
  # URL handling protocols to allow in specific attributes. By default, no
30
30
  # protocols are allowed. Use :relative in place of a protocol if you want
31
- # to allow relative URLs sans protocol.
31
+ # to allow relative URLs sans protocol. Set to `:all` to allow any protocol.
32
32
  protocols: {},
33
33
 
34
34
  # An Array of element names whose contents will be removed. The contents
@@ -16,6 +16,7 @@ module Selma
16
16
  "colgroup",
17
17
  "data",
18
18
  "del",
19
+ "details",
19
20
  "div",
20
21
  "figcaption",
21
22
  "figure",
@@ -66,7 +66,12 @@ module Selma
66
66
  end
67
67
 
68
68
  def allow_protocol(element, attr, protos)
69
- protos = [protos] unless protos.is_a?(Array)
69
+ if protos.is_a?(Array)
70
+ raise ArgumentError, "`:all` must be passed outside of an array" if protos.include?(:all)
71
+ else
72
+ protos = [protos]
73
+ end
74
+
70
75
  set_allowed_protocols(element, attr, protos)
71
76
  end
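
The argument handling in the hunk above can be exercised in isolation. This is a minimal sketch, not Selma's actual class: `ProtocolConfigSketch` and its `allowed` hash are hypothetical stand-ins, and `set_allowed_protocols` here only records its arguments instead of calling into the native extension. Only the body of `allow_protocol` is taken from the diff.

```ruby
# Hypothetical stand-in for the sanitizer; only allow_protocol mirrors the diff.
class ProtocolConfigSketch
  attr_reader :allowed

  def initialize
    @allowed = {}
  end

  def allow_protocol(element, attr, protos)
    if protos.is_a?(Array)
      # :all is only valid as a bare value, never mixed into a list.
      raise ArgumentError, "`:all` must be passed outside of an array" if protos.include?(:all)
    else
      # A single value (including :all) is wrapped into a one-element array.
      protos = [protos]
    end

    set_allowed_protocols(element, attr, protos)
  end

  private

  # Assumption: the real method hands the list to the native extension.
  # This version just stores it so the result can be inspected.
  def set_allowed_protocols(element, attr, protos)
    (@allowed[element] ||= {})[attr] = protos
  end
end

sketch = ProtocolConfigSketch.new
sketch.allow_protocol("a", "href", ["http", "https", :relative])
sketch.allow_protocol("img", "src", :all)

# Mixing :all into an array is rejected before anything is stored.
begin
  sketch.allow_protocol("a", "href", ["http", :all])
rescue ArgumentError => e
  rejected_message = e.message
end
```

In configuration terms, this suggests writing `"img" => { "src" => :all }` rather than `"img" => { "src" => [:all] }`.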
72
77
 
data/lib/selma/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Selma
4
- VERSION = "0.2.2"
4
+ VERSION = "0.4.0"
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: selma
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.2
4
+ version: 0.4.0
5
5
  platform: x86_64-linux
6
6
  authors:
7
7
  - Garen J. Torikian
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2024-01-03 00:00:00.000000000 Z
11
+ date: 2024-07-15 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake
@@ -51,6 +51,7 @@ files:
51
51
  - lib/selma/3.1/selma.so
52
52
  - lib/selma/3.2/selma.so
53
53
  - lib/selma/3.3/selma.so
54
+ - lib/selma/config.rb
54
55
  - lib/selma/extension.rb
55
56
  - lib/selma/html.rb
56
57
  - lib/selma/rewriter.rb