selma 0.0.5-arm64-darwin → 0.0.6-arm64-darwin

Files changed (4)
  1. checksums.yaml +4 -4
  2. data/README.md +69 -22
  3. data/lib/selma/version.rb +1 -1
  4. metadata +2 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 6a8ae1537672de0e7cca2dcecf7c19d470e9f481130fa9c3dad37eef2bc91507
4
- data.tar.gz: 4d897c25589773b5a5bcf987d7b420aa82dd69bcb4d832686f87ea14e40610a7
3
+ metadata.gz: 9efde5c7c8e6fcb8474c6240bb5ff279168346d102a99cff0d46b40a3c5b2581
4
+ data.tar.gz: 175ba5ec1b7364be0021830e26b3463d94dcedad94fc42711384c33535ba4a58
5
5
  SHA512:
6
- metadata.gz: 183446b3cf5e97ef5f61e96da4db1198a8468f7971d6ae61dd9f0130cd019bde29b7975c1a46ff822b055543ed3657597bd02dc0c0a9234e24cefaa93e3e39fb
7
- data.tar.gz: d105ad92ef51135b6d8fe675febe539eb41368bde7f08e10aba9f118d6c8950328d766c874202f759becb1ae75236ebf4e91c756ea1ba12c688941742801401d
6
+ metadata.gz: ba9fc3c87b940dd08a432b3a415e6f0c764575b626f308c52b7827de61fab067c814ca737b4ca7d42a395da4a5fa27539e2c10921bbd75334b5d43d6dda3ba8e
7
+ data.tar.gz: f49a373bb5d0f0758a4473f6a39fcd6b4b1a8e569378cbb119f8d0bfc12bb036e7d96b0c9089e22732c84f1a2d7ee903ed0322105db7668342b84632e5b6034c
data/README.md CHANGED
@@ -24,23 +24,27 @@ Or install it yourself as:
24
24
 
25
25
  ## Usage
26
26
 
27
- Selma can perform two different actions:
27
+ Selma can perform two different actions, either independently or together:
28
28
 
29
29
  - Sanitize HTML, through a [Sanitize](https://github.com/rgrove/sanitize)-like allowlist syntax; and
30
- - Select HTML using CSS rules, and manipulate elements and text
30
+ - Select HTML using CSS rules, and manipulate elements and text nodes along the way.
31
31
 
32
- The basic API for Selma looks like this:
32
+ It does this through two kwargs: `sanitizer` and `handlers`. The basic API for Selma looks like this:
33
33
 
34
34
  ```ruby
35
- rewriter = Selma::Rewriter.new(sanitizer: sanitizer_config, handlers: [MatchAttribute.new, TextRewrite.new])
35
+ sanitizer_config = {
36
+ elements: ["b", "em", "i", "strong", "u"],
37
+ }
38
+ sanitizer = Selma::Sanitizer.new(sanitizer_config)
39
+ rewriter = Selma::Rewriter.new(sanitizer: sanitizer, handlers: [MatchElementRewrite.new, MatchTextRewrite.new])
36
40
  rewriter.rewrite(html)
37
41
  ```
38
42
 
39
- Let's take a look at each part individually.
43
+ Here's a look at each individual part.
40
44
 
41
45
  ### Sanitization config
42
46
 
43
- Selma sanitizes by default. That is, even if the `sanitizer` kwarg is not passed in, sanitization occurs. If you want to disable HTML sanitization (for some reason), pass `nil`:
47
+ Selma sanitizes by default. That is, even if the `sanitizer` kwarg is not passed in, sanitization occurs. If you truly want to disable HTML sanitization (for some reason), pass `nil`:
44
48
 
45
49
  ```ruby
46
50
  Selma::Rewriter.new(sanitizer: nil) # dangerous and ill-advised
@@ -87,22 +91,22 @@ whitespace_elements: ["blockquote", "h1", "h2", "h3", "h4", "h5", "h6", ]
87
91
 
88
92
  ### Defining handlers
89
93
 
90
- The real power in Selma comes in its use of handlers. A handler is simply an object with various methods:
94
+ The real power in Selma comes in its use of handlers. A handler is simply an object with various methods defined:
91
95
 
92
96
  - `selector`, a method which MUST return an instance of `Selma::Selector`, which defines the CSS rules to match
93
97
  - `handle_element`, a method that's called on each matched element
94
- - `handle_text_chunk`, a method that's called on each matched text node; this MUST return a string
98
+ - `handle_text_chunk`, a method that's called on each matched text node
95
99
 
96
100
  Here's an example which rewrites the `href` attribute on `a` and the `src` attribute on `img` to be `https` rather than `http`.
97
101
 
98
102
  ```ruby
99
103
  class MatchAttribute
100
- SELECTOR = Selma::Selector(match_element: "a, img")
104
+ SELECTOR = Selma::Selector.new(match_element: %(a[href^="http:"], img[src^="http:"]))
101
105
 
102
106
  def handle_element(element)
103
- if element.tag_name == "a" && element["href"] =~ /^http:/
107
+ if element.tag_name == "a"
104
108
  element["href"] = rename_http(element["href"])
105
- elsif element.tag_name == "img" && element["src"] =~ /^http:/
109
+ elsif element.tag_name == "img"
106
110
  element["src"] = rename_http(element["src"])
107
111
  end
108
112
  end
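Review note: with the selector narrowed to `a[href^="http:"]` and `img[src^="http:"]`, the handler no longer needs its own regex checks, since only matching elements should reach `handle_element`. The hunk does not show `rename_http`, so the version below is an assumed implementation. `FakeElement` and `MatchAttributeSketch` are hypothetical names that stand in for Selma's element object and the README's handler so the sketch runs without the gem.

```ruby
# Hypothetical stand-in for the element Selma passes to `handle_element`;
# it models only `tag_name`, `[]`, and `[]=`.
class FakeElement
  attr_reader :tag_name

  def initialize(tag_name, attributes)
    @tag_name = tag_name
    @attributes = attributes
  end

  def [](name)
    @attributes[name]
  end

  def []=(name, value)
    @attributes[name] = value
  end
end

class MatchAttributeSketch
  # Attribute to upgrade for each tag the selector is expected to match.
  URL_ATTRIBUTE = { "a" => "href", "img" => "src" }.freeze

  def handle_element(element)
    attribute = URL_ATTRIBUTE[element.tag_name]
    return if attribute.nil? # ignore tags the selector should never send

    element[attribute] = rename_http(element[attribute])
  end

  private

  # Assumed helper: swaps a leading "http:" for "https:".
  def rename_http(link)
    link.sub(/\Ahttp:/, "https:")
  end
end

handler = MatchAttributeSketch.new
link = FakeElement.new("a", { "href" => "http://example.com/page" })
image = FakeElement.new("img", { "src" => "http://example.com/logo.png" })
handler.handle_element(link)
handler.handle_element(image)
puts link["href"]
puts image["src"]
```

A lookup table replaces the `if`/`elsif` chain here only to keep the sketch short; the behaviour is meant to match the two-branch version in the hunk.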
@@ -118,10 +122,10 @@ rewriter = Selma::Rewriter.new(handlers: [MatchAttribute.new])
118
122
  The `Selma::Selector` object has three possible kwargs:
119
123
 
120
124
  - `match_element`: any element which matches this CSS rule will be passed on to `handle_element`
121
- - `match_text_within`: any element which matches this CSS rule will be passed on to `handle_text_chunk`
125
+ - `match_text_within`: the text inside any element which matches this CSS rule will be passed on to `handle_text_chunk`
122
126
  - `ignore_text_within`: this is an array of element names whose text contents will be ignored
123
127
 
124
- You've seen an example of `match_element`; here's one for `match_text` which changes strings in various elements which are _not_ `pre` or `code`:
128
+ Here's an example for `handle_text_chunk` which changes strings in various elements which are _not_ `pre` or `code`:
125
129
 
126
130
  ```ruby
127
131
 
@@ -144,20 +148,63 @@ rewriter = Selma::Rewriter.new(handlers: [MatchText.new])
144
148
 
145
149
  The `element` argument in `handle_element` has the following methods:
146
150
 
147
- - `tag_name`: The element's name
148
- - `[]`: get an attribute
149
- - `[]=`: set an attribute
150
- - `remove_attribute`: remove an attribute
151
- - `attributes`: list all the attributes
152
- - `ancestors`: list all the ancestors
153
- - `append(content, as: content_type)`: appends `content` to the element's inner content, i.e. inserts content right before the element's end tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
151
+ - `tag_name`: Gets the element's name
152
+ - `tag_name=`: Sets the element's name
153
+ - `self_closing?`: A bool which identifies whether or not the element is self-closing
154
+ - `[]`: Get an attribute
155
+ - `[]=`: Set an attribute
156
+ - `remove_attribute`: Remove an attribute
157
+ - `has_attribute?`: A bool which identifies whether or not the element has an attribute
158
+ - `attributes`: List all the attributes
159
+ - `ancestors`: List all of an element's ancestors as an array of strings
154
160
  - `before(content, as: content_type)`: Inserts `content` before the element. `content_type` is either `:text` or `:html` and determines how the content will be applied.
155
161
  - `after(content, as: content_type)`: Inserts `content` after the element. `content_type` is either `:text` or `:html` and determines how the content will be applied.
156
- - `set_inner_content`: replaces inner content of the element with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
162
+ - `prepend(content, as: content_type)`: Prepends `content` to the element's inner content, i.e. inserts content right after the element's start tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
163
+ - `append(content, as: content_type)`: Appends `content` to the element's inner content, i.e. inserts content right before the element's end tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
164
+ - `set_inner_content(content, as: content_type)`: Replaces the inner content of the element with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
165
+
166
+ #### `text_chunk` methods
167
+
168
+ - `to_s` / `content`: Gets the text node's content
169
+ - `text_type`: Identifies the type of text in the text node
170
+ - `before(content, as: content_type)`: Inserts `content` before the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
171
+ - `after(content, as: content_type)`: Inserts `content` after the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
172
+ - `replace(content, as: content_type)`: Replaces the text node with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
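Review note: this hunk drops the "MUST return a string" requirement from `handle_text_chunk` and adds `replace`, which suggests text edits now go through the text chunk object instead of the return value. The sketch below illustrates that reading under stated assumptions. `FakeTextChunk` and `BrandNameSketch` are hypothetical names, and the fake models only `to_s`, `content`, and `replace` from the list above, so nothing here depends on the gem.

```ruby
# Hypothetical stand-in for the text chunk passed to `handle_text_chunk`.
class FakeTextChunk
  attr_reader :content, :content_type

  def initialize(content)
    @content = content
    @content_type = :text
  end

  def to_s
    @content
  end

  # Mirrors the documented `replace(content, as: content_type)` shape.
  def replace(new_content, as:)
    @content = new_content
    @content_type = as
  end
end

class BrandNameSketch
  # Rewrites the chunk in place; the return value is not relied upon.
  def handle_text_chunk(text_chunk)
    updated = text_chunk.to_s.gsub("selma", "Selma")
    text_chunk.replace(updated, as: :text)
  end
end

chunk = FakeTextChunk.new("hello from selma")
BrandNameSketch.new.handle_text_chunk(chunk)
puts chunk.to_s
```

Passing `as: :text` keeps the replacement as plain text; `as: :html` would be the documented alternative when the replacement contains markup.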
157
173
 
158
174
  ## Benchmarks
159
175
 
160
- TBD
176
+ <details>
177
+ <pre>
178
+ ruby test/benchmark.rb
180
+ Warming up --------------------------------------
181
+ sanitize-document-huge
182
+ 1.000 i/100ms
183
+ selma-document-huge 1.000 i/100ms
184
+ Calculating -------------------------------------
185
+ sanitize-document-huge
186
+ 0.257 (± 0.0%) i/s - 2.000 in 7.783398s
187
+ selma-document-huge 4.602 (± 0.0%) i/s - 23.000 in 5.002870s
188
+ Warming up --------------------------------------
189
+ sanitize-document-medium
190
+ 2.000 i/100ms
191
+ selma-document-medium
192
+ 22.000 i/100ms
193
+ Calculating -------------------------------------
194
+ sanitize-document-medium
195
+ 28.676 (± 3.5%) i/s - 144.000 in 5.024669s
196
+ selma-document-medium
197
+ 121.500 (±22.2%) i/s - 594.000 in 5.135410s
198
+ Warming up --------------------------------------
199
+ sanitize-document-small
200
+ 10.000 i/100ms
201
+ selma-document-small 20.000 i/100ms
202
+ Calculating -------------------------------------
203
+ sanitize-document-small
204
+ 107.280 (± 0.9%) i/s - 540.000 in 5.033850s
205
+ selma-document-small 118.867 (±31.1%) i/s - 540.000 in 5.080726s
206
+ </pre>
207
+ </details>
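Review note: the raw benchmark output is easier to compare as ratios. The short script below copies the iterations-per-second figures from the "Calculating" rows and divides Selma's rate by Sanitize's for each document size. The constant and variable names are illustrative only. The medium and small Selma runs report wide error margins (±22.2% and ±31.1%), so those two ratios should be treated as rough.

```ruby
# Iterations per second, copied from the "Calculating" rows above.
RESULTS = {
  "huge" => { sanitize: 0.257, selma: 4.602 },
  "medium" => { sanitize: 28.676, selma: 121.500 },
  "small" => { sanitize: 107.280, selma: 118.867 },
}.freeze

# Ratio of Selma's throughput to Sanitize's, rounded to one decimal place.
speedups = RESULTS.transform_values do |rates|
  (rates[:selma] / rates[:sanitize]).round(1)
end

speedups.each { |size, ratio| puts "document-#{size}: #{ratio}x" }
```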
161
208
 
162
209
  ## Contributing
163
210
 
data/lib/selma/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Selma
4
- VERSION = "0.0.5"
4
+ VERSION = "0.0.6"
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: selma
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.5
4
+ version: 0.0.6
5
5
  platform: arm64-darwin
6
6
  authors:
7
7
  - Garen J. Torikian
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2022-12-27 00:00:00.000000000 Z
11
+ date: 2022-12-28 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rb_sys