selma 0.3.0-x64-mingw-ucrt → 0.4.0-x64-mingw-ucrt

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: a0ddf7d69640ad327cf8e8efd8e35e0441f6d4910e98d98629f96254d22268f8
-   data.tar.gz: 0c1ff8f5237f35d2353b6d78f79a2316ae052631094920cb17ff8dda5062ff56
+   metadata.gz: '09223eb3558240eceac68e85d2b1176d5200bfe0e5615f2fac578ce0aa009442'
+   data.tar.gz: 2055d3638f4f85f1b8b7257e820c441894fb5ee78185f9caa43334c6465b3406
  SHA512:
-   metadata.gz: 94ecfa9b7f1ef2f36a9397880ed901900fc128996d3c79b4ef337cbd7f3e0fbeebcd66dea3e509b29cffb2a21a951f49263648c8485a4954d7ccd5987f532e52
-   data.tar.gz: 806f0ed6ff134d58e7aa5a63a455ee5e0364a29d77b2169893d1712c9268b43efd5f1f1c1dcb8c98b2092efdd0133846043868680d69b30a55847051ec9b3365
+   metadata.gz: 870af5762919e3f45bf4c9386811910802fbc9c2f8dec2447ae84335694978044b117aa00803201a3bfbe680038be99e2a4ee5cd489eb689071a01cec58519e5
+   data.tar.gz: dc883f86c62281e50abfb6dccd77ac79dbea6068f2f683b2a37a67ab01d2332e968eaa7abfaa2ce9e7846f7eb6ee98d1215d3e3071997744377010802ecb128e
data/README.md CHANGED
@@ -180,6 +180,19 @@ The `element` argument in `handle_element` has the following methods:
  - `after(content, as: content_type)`: Inserts `content` after the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
  - `replace(content, as: content_type)`: Replaces the text node with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
 
+ ## Security
+
+ Theoretically, a malicious user can provide a very large document for processing, which can exhaust the memory of the host machine. To set a limit on how much string content is processed at once, you can provide two options into the `memory` namespace:
+
+ ```ruby
+ memory: {
+   max_allowed_memory_usage: 1000,
+   preallocated_parsing_buffer_size: 100,
+ },
+ ```
+
+ Note that `preallocated_parsing_buffer_size` must always be less than `max_allowed_memory_usage`. See [the`lol_html` project documentation](https://docs.rs/lol_html/1.2.1/lol_html/struct.MemorySettings.html) to learn more about the default values.
+
  ## Benchmarks
 
  When `bundle exec rake benchmark`, two different benchmarks are calculated. Here are those results on my machine.
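Reviewer's note on the Security section added in the hunk above: the README states that `preallocated_parsing_buffer_size` must always be less than `max_allowed_memory_usage`. A minimal sketch of checking that constraint on the caller's side, before the options ever reach the library, might look like the following. `validate_memory_options!` is a hypothetical helper and not part of Selma; treating `nil` as "fall back to the library default" is an assumption based on the `nil` values in the new `config.rb`, and this diff does not show what Selma itself does when the constraint is violated.

```ruby
# Hypothetical helper (not part of Selma): enforces the constraint the
# README documents for the `memory` options.
def validate_memory_options!(memory)
  max = memory[:max_allowed_memory_usage]
  prealloc = memory[:preallocated_parsing_buffer_size]

  # Assumption: nil means "use the default", so there is nothing to compare.
  return memory if max.nil? || prealloc.nil?

  unless prealloc < max
    raise ArgumentError,
      "preallocated_parsing_buffer_size (#{prealloc}) must be less than " \
      "max_allowed_memory_usage (#{max})"
  end

  memory
end

# The values from the README example pass the check unchanged.
options = {
  memory: validate_memory_options!({
    max_allowed_memory_usage: 1000,
    preallocated_parsing_buffer_size: 100,
  }),
}
```

Failing early in application code keeps the error message in terms of the option names the README uses, instead of whatever the native extension reports.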
@@ -191,30 +204,33 @@ Comparing Selma against popular Ruby sanitization gems:
  <!-- prettier-ignore-start -->
  <details>
  <pre>
+ input size = 25309 bytes, 0.03 MB
+
+ ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
  Warming up --------------------------------------
- sanitize-sm 15.000 i/100ms
- selma-sm 126.000 i/100ms
+ sanitize-sm 16.000 i/100ms
+ selma-sm 214.000 i/100ms
  Calculating -------------------------------------
- sanitize-sm 155.074 (± 1.9%) i/s - 4.665k in 30.092214s
- selma-sm 1.290k (± 1.3%) i/s - 38.808k in 30.085333s
+ sanitize-sm 171.670 (± 1.2%) i/s - 5.152k in 30.017081s
+ selma-sm 2.146k (± 3.0%) i/s - 64.414k in 30.058470s
 
  Comparison:
- selma-sm: 1290.1 i/s
- sanitize-sm: 155.1 i/s - 8.32x slower
+ selma-sm: 2145.8 i/s
+ sanitize-sm: 171.7 i/s - 12.50x slower
 
  input size = 86686 bytes, 0.09 MB
 
  ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
  Warming up --------------------------------------
- sanitize-md 3.000 i/100ms
- selma-md 33.000 i/100ms
+ sanitize-md 4.000 i/100ms
+ selma-md 56.000 i/100ms
  Calculating -------------------------------------
- sanitize-md 40.321 (± 5.0%) i/s - 1.206k in 30.004711s
- selma-md 337.417 (± 1.5%) i/s - 10.131k in 30.032772s
+ sanitize-md 44.397 (± 2.3%) i/s - 1.332k in 30.022430s
+ selma-md 558.448 (± 1.4%) i/s - 16.800k in 30.089196s
 
  Comparison:
- selma-md: 337.4 i/s
- sanitize-md: 40.3 i/s - 8.37x slower
+ selma-md: 558.4 i/s
+ sanitize-md: 44.4 i/s - 12.58x slower
 
  input size = 7172510 bytes, 7.17 MB
 
@@ -223,12 +239,12 @@ Warming up --------------------------------------
  sanitize-lg 1.000 i/100ms
  selma-lg 1.000 i/100ms
  Calculating -------------------------------------
- sanitize-lg 0.144 (± 0.0%) i/s - 5.000 in 34.772526s
- selma-lg 4.026 (± 0.0%) i/s - 121.000 in 30.067415s
+ sanitize-lg 0.163 (± 0.0%) i/s - 6.000 in 37.375628s
+ selma-lg 6.750 (± 0.0%) i/s - 203.000 in 30.080976s
 
  Comparison:
- selma-lg: 4.0 i/s
- sanitize-lg: 0.1 i/s - 27.99x slower
+ selma-lg: 6.7 i/s
+ sanitize-lg: 0.2 i/s - 41.32x slower
  </pre>
  </details>
  <!-- prettier-ignore-end -->
@@ -239,41 +255,39 @@ Comparing Selma against popular Ruby HTML parsing gems:
 
  <!-- prettier-ignore-start -->
  <details>
- <pre>
-
- input size = 25309 bytes, 0.03 MB
+ <pre>input size = 25309 bytes, 0.03 MB
 
  ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
  Warming up --------------------------------------
- nokogiri-sm 79.000 i/100ms
- nokolexbor-sm 285.000 i/100ms
- selma-sm 244.000 i/100ms
+ nokogiri-sm 107.000 i/100ms
+ nokolexbor-sm 340.000 i/100ms
+ selma-sm 380.000 i/100ms
  Calculating -------------------------------------
- nokogiri-sm 807.790 (± 3.1%) i/s - 24.253k in 30.056301s
- nokolexbor-sm 2.880k (± 6.4%) i/s - 86.070k in 30.044766s
- selma-sm 2.508k (± 1.2%) i/s - 75.396k in 30.068792s
+ nokogiri-sm 1.073k (± 2.1%) i/s - 32.207k in 30.025474s
+ nokolexbor-sm 3.300k (±13.2%) i/s - 27.540k in 36.788212s
+ selma-sm 3.779k (± 3.4%) i/s - 113.240k in 30.013908s
 
  Comparison:
- nokolexbor-sm: 2880.3 i/s
- selma-sm: 2507.8 i/s - 1.15x slower
- nokogiri-sm: 807.8 i/s - 3.57x slower
+ selma-sm: 3779.4 i/s
+ nokolexbor-sm: 3300.1 i/s - same-ish: difference falls within error
+ nokogiri-sm: 1073.1 i/s - 3.52x slower
 
  input size = 86686 bytes, 0.09 MB
 
  ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
  Warming up --------------------------------------
- nokogiri-md 8.000 i/100ms
- nokolexbor-md 43.000 i/100ms
- selma-md 39.000 i/100ms
+ nokogiri-md 11.000 i/100ms
+ nokolexbor-md 48.000 i/100ms
+ selma-md 53.000 i/100ms
  Calculating -------------------------------------
- nokogiri-md 87.367 (± 3.4%) i/s - 2.624k in 30.061642s
- nokolexbor-md 438.782 (± 3.9%) i/s - 13.158k in 30.031163s
- selma-md 392.591 (± 3.1%) i/s - 11.778k in 30.031391s
+ nokogiri-md 103.998 (± 5.8%) i/s - 3.113k in 30.029932s
+ nokolexbor-md 428.928 (± 7.9%) i/s - 12.816k in 30.066662s
+ selma-md 492.190 (± 6.9%) i/s - 14.734k in 30.082943s
 
  Comparison:
- nokolexbor-md: 438.8 i/s
- selma-md: 392.6 i/s - 1.12x slower
- nokogiri-md: 87.4 i/s - 5.02x slower
+ selma-md: 492.2 i/s
+ nokolexbor-md: 428.9 i/s - same-ish: difference falls within error
+ nokogiri-md: 104.0 i/s - 4.73x slower
 
  input size = 7172510 bytes, 7.17 MB
 
@@ -283,14 +297,14 @@ Warming up --------------------------------------
  nokolexbor-lg 1.000 i/100ms
  selma-lg 1.000 i/100ms
  Calculating -------------------------------------
- nokogiri-lg 0.895 (± 0.0%) i/s - 27.000 in 30.300832s
- nokolexbor-lg 2.163 (± 0.0%) i/s - 65.000 in 30.085656s
- selma-lg 5.867 (± 0.0%) i/s - 176.000 in 30.006240s
+ nokogiri-lg 0.874 (± 0.0%) i/s - 27.000 in 30.921090s
+ nokolexbor-lg 2.227 (± 0.0%) i/s - 67.000 in 30.137903s
+ selma-lg 8.354 (± 0.0%) i/s - 251.000 in 30.075227s
 
  Comparison:
- selma-lg: 5.9 i/s
- nokolexbor-lg: 2.2 i/s - 2.71x slower
- nokogiri-lg: 0.9 i/s - 6.55x slower
+ selma-lg: 8.4 i/s
+ nokolexbor-lg: 2.2 i/s - 3.75x slower
+ nokogiri-lg: 0.9 i/s - 9.56x slower
  </pre>
  </details>
  <!-- prettier-ignore-end -->
Binary file
Binary file
Binary file
data/lib/selma/config.rb ADDED
@@ -0,0 +1,12 @@
+ # frozen_string_literal: true
+
+ module Selma
+   module Config
+     OPTIONS = {
+       memory: {
+         max_allowed_memory_usage: nil,
+         preallocated_parsing_buffer_size: nil,
+       },
+     }
+   end
+ end
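Reviewer's note: the new file only declares a nested hash of defaults, with both `memory` values set to `nil`. The diff does not show how Selma combines these defaults with caller-supplied options, so the following is only a sketch of one common approach, a recursive merge. `DEFAULT_OPTIONS` is a stand-in copy of the hash above and `merge_options` is a hypothetical helper; neither name comes from Selma.

```ruby
# Stand-in copy of the defaults declared in the new config file.
DEFAULT_OPTIONS = {
  memory: {
    max_allowed_memory_usage: nil,
    preallocated_parsing_buffer_size: nil,
  },
}.freeze

# Hypothetical helper: recursively merges overrides into defaults without
# mutating either hash. Nested hashes are merged key by key; any other
# value from overrides replaces the default.
def merge_options(defaults, overrides)
  defaults.merge(overrides) do |_key, default_value, override_value|
    if default_value.is_a?(Hash) && override_value.is_a?(Hash)
      merge_options(default_value, override_value)
    else
      override_value
    end
  end
end

# Only one of the two memory settings is overridden; the other stays nil.
merged = merge_options(DEFAULT_OPTIONS, { memory: { max_allowed_memory_usage: 1000 } })
```

A plain `Hash#merge` without the block would replace the whole `memory` hash and silently drop `preallocated_parsing_buffer_size`, which is why the sketch recurses into nested hashes.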
data/lib/selma/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module Selma
-   VERSION = "0.3.0"
+   VERSION = "0.4.0"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: selma
  version: !ruby/object:Gem::Version
-   version: 0.3.0
+   version: 0.4.0
  platform: x64-mingw-ucrt
  authors:
  - Garen J. Torikian
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2024-06-07 00:00:00.000000000 Z
+ date: 2024-07-15 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rake
@@ -51,6 +51,7 @@ files:
  - lib/selma/3.1/selma.so
  - lib/selma/3.2/selma.so
  - lib/selma/3.3/selma.so
+ - lib/selma/config.rb
  - lib/selma/extension.rb
  - lib/selma/html.rb
  - lib/selma/rewriter.rb