selma 0.0.5-x86_64-linux → 0.0.6-x86_64-linux
- checksums.yaml +4 -4
- data/README.md +69 -22
- data/lib/selma/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: be9face4692cbc6653b2085a056679aab5eca490517f3fa17e4f53ac5c9b4028
+  data.tar.gz: 47fa7091498f304b8aba324637218b7ca0af09e8370084337f1e588be677ac3e
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9e57f2b3f8a82aa92cbc68c17a465226631af6bc1c89150a65e58a0b87d37698418799f812083de22590fdf11a18e1991d847f6273c7d9d03c66fd4139c10cc7
+  data.tar.gz: 66f079b74f387266446c5293b17dd1a534de422c1b7a7149ff562a4a9a05fefd7857216d2e6a782d92eecb7595ec5e1ba7b2e94c2a56350caef63c2d768a3901
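The digests in `checksums.yaml` cover the two archives packed inside a `.gem` file. Here is a minimal sketch of checking one of them by hand, assuming the archive bytes and the text of `checksums.yaml` have already been read into memory. `checksum_matches?` is a hypothetical helper written for illustration; it is not part of RubyGems or Selma.

```ruby
require "digest/sha2"
require "yaml"

# Hypothetical helper: returns true when the SHA256 digest of `bytes`
# equals the value recorded for `entry` in the checksums.yaml text.
def checksum_matches?(checksums_yaml, entry, bytes)
  recorded = (YAML.safe_load(checksums_yaml) || {}).dig("SHA256", entry)
  return false if recorded.nil?

  recorded == Digest::SHA256.hexdigest(bytes)
end
```

The same comparison would apply to the `SHA512` section with `Digest::SHA512`.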
data/README.md
CHANGED
@@ -24,23 +24,27 @@ Or install it yourself as:
 
 ## Usage
 
-Selma can perform two different actions:
+Selma can perform two different actions, either independently or together:
 
 - Sanitize HTML, through a [Sanitize](https://github.com/rgrove/sanitize)-like allowlist syntax; and
-- Select HTML using CSS rules, and manipulate elements and text
+- Select HTML using CSS rules, and manipulate elements and text nodes along the way.
 
-The basic API for Selma looks like this:
+It does this through two kwargs: `sanitizer` and `handlers`. The basic API for Selma looks like this:
 
 ```ruby
-
+sanitizer_config = {
+  elements: ["b", "em", "i", "strong", "u"],
+}
+sanitizer = Selma::Sanitizer.new(sanitizer_config)
+rewriter = Selma::Rewriter.new(sanitizer: sanitizer, handlers: [MatchElementRewrite.new, MatchTextRewrite.new])
 rewriter.rewrite(html)
 ```
 
-
+Here's a look at each individual part.
 
 ### Sanitization config
 
-Selma sanitizes by default. That is, even if the `sanitizer` kwarg is not passed in, sanitization occurs. If you want to disable HTML sanitization (for some reason), pass `nil`:
+Selma sanitizes by default. That is, even if the `sanitizer` kwarg is not passed in, sanitization occurs. If you truly want to disable HTML sanitization (for some reason), pass `nil`:
 
 ```ruby
 Selma::Rewriter.new(sanitizer: nil) # dangerous and ill-advised
@@ -87,22 +91,22 @@ whitespace_elements: ["blockquote", "h1", "h2", "h3", "h4", "h5", "h6", ]
 
 ### Defining handlers
 
-The real power in Selma comes in its use of handlers. A handler is simply an object with various methods:
+The real power in Selma comes in its use of handlers. A handler is simply an object with various methods defined:
 
 - `selector`, a method which MUST return an instance of `Selma::Selector` which defines the CSS rules to match
 - `handle_element`, a method that's called on each matched element
-- `handle_text_chunk`, a method that's called on each matched text node
+- `handle_text_chunk`, a method that's called on each matched text node
 
 Here's an example which rewrites the `href` attribute on `a` and the `src` attribute on `img` to be `https` rather than `http`.
 
 ```ruby
 class MatchAttribute
-  SELECTOR = Selma::Selector(match_element: "
+  SELECTOR = Selma::Selector.new(match_element: %(a[href^="http:"], img[src^="http:"]))
 
   def handle_element(element)
-    if element.tag_name == "a"
+    if element.tag_name == "a"
       element["href"] = rename_http(element["href"])
-    elsif element.tag_name == "img"
+    elsif element.tag_name == "img"
       element["src"] = rename_http(element["src"])
     end
   end
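The hunk above ends before `rename_http` is defined, so the handler cannot be read in full here. What follows is a minimal sketch of the same idea in plain Ruby, under stated assumptions. `FakeElement` is a hypothetical stand-in that models only `tag_name`, `[]`, and `[]=`. `HttpsUpgrade` is an illustrative name, and the body of `rename_http` is a guess at what the README's helper does. No Selma classes are used, so the `selector` method is left out.

```ruby
# Hypothetical stand-in for the element passed to handle_element.
# It models only the three methods this sketch needs.
class FakeElement
  attr_reader :tag_name

  def initialize(tag_name, attributes)
    @tag_name = tag_name
    @attributes = attributes
  end

  def [](name)
    @attributes[name]
  end

  def []=(name, value)
    @attributes[name] = value
  end
end

# Illustrative handler: upgrades http: URLs on <a href> and <img src>.
class HttpsUpgrade
  ATTRIBUTE_FOR_TAG = { "a" => "href", "img" => "src" }.freeze

  def handle_element(element)
    attribute = ATTRIBUTE_FOR_TAG[element.tag_name]
    return unless attribute && element[attribute]

    element[attribute] = rename_http(element[attribute])
  end

  private

  # Assumed behavior: swap a leading "http:" scheme for "https:".
  def rename_http(url)
    url.sub(/\Ahttp:/, "https:")
  end
end
```

Looking the attribute up in a table keeps the `a` and `img` branches from drifting apart if more tags are added later.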
@@ -118,10 +122,10 @@ rewriter = Selma::Rewriter.new(handlers: [MatchAttribute.new])
 The `Selma::Selector` object has three possible kwargs:
 
 - `match_element`: any element which matches this CSS rule will be passed on to `handle_element`
-- `match_text_within`: any
+- `match_text_within`: any text_chunk which matches this CSS rule will be passed on to `handle_text_chunk`
 - `ignore_text_within`: this is an array of element names whose text contents will be ignored
 
-
+Here's an example for `handle_text_chunk` which changes strings in various elements which are _not_ `pre` or `code`:
 
 ```ruby
 
@@ -144,20 +148,63 @@ rewriter = Selma::Rewriter.new(handlers: [MatchText.new])
 
 The `element` argument in `handle_element` has the following methods:
 
-- `tag_name`:
-- `
-- `
-- `
-- `
-- `
-- `
+- `tag_name`: Gets the element's name
+- `tag_name=`: Sets the element's name
+- `self_closing?`: A bool which identifies whether or not the element is self-closing
+- `[]`: Get an attribute
+- `[]=`: Set an attribute
+- `remove_attribute`: Remove an attribute
+- `has_attribute?`: A bool which identifies whether or not the element has an attribute
+- `attributes`: List all the attributes
+- `ancestors`: List all of an element's ancestors as an array of strings
 - `before(content, as: content_type)`: Inserts `content` before the element. `content_type` is either `:text` or `:html` and determines how the content will be applied.
 - `after(content, as: content_type)`: Inserts `content` after the element. `content_type` is either `:text` or `:html` and determines how the content will be applied.
-- `
+- `prepend(content, as: content_type)`: Prepends `content` to the element's inner content, i.e. inserts content right after the element's start tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+- `append(content, as: content_type)`: Appends `content` to the element's inner content, i.e. inserts content right before the element's end tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+- `set_inner_content(content, as: content_type)`: Replaces inner content of the element with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+
+#### `text_chunk` methods
+
+- `to_s` / `.content`: Gets the text node's content
+- `text_type`: Identifies the type of text in the text node
+- `before(content, as: content_type)`: Inserts `content` before the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+- `after(content, as: content_type)`: Inserts `content` after the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+- `replace(content, as: content_type)`: Replaces the text node with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
 
 ## Benchmarks
 
-
+<details>
+<pre>
+ruby test/benchmark.rb
+ruby test/benchmark.rb
+Warming up --------------------------------------
+sanitize-document-huge
+                         1.000  i/100ms
+ selma-document-huge     1.000  i/100ms
+Calculating -------------------------------------
+sanitize-document-huge
+                          0.257  (± 0.0%) i/s -      2.000  in   7.783398s
+ selma-document-huge      4.602  (± 0.0%) i/s -     23.000  in   5.002870s
+Warming up --------------------------------------
+sanitize-document-medium
+                         2.000  i/100ms
+selma-document-medium
+                        22.000  i/100ms
+Calculating -------------------------------------
+sanitize-document-medium
+                         28.676  (± 3.5%) i/s -    144.000  in   5.024669s
+selma-document-medium
+                        121.500  (±22.2%) i/s -    594.000  in   5.135410s
+Warming up --------------------------------------
+sanitize-document-small
+                        10.000  i/100ms
+selma-document-small    20.000  i/100ms
+Calculating -------------------------------------
+sanitize-document-small
+                        107.280  (± 0.9%) i/s -    540.000  in   5.033850s
+selma-document-small    118.867  (±31.1%) i/s -    540.000  in   5.080726s
+</pre>
+</details>
 
 ## Contributing
 
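The benchmark output added to the README reports iterations per second for Sanitize and Selma at three document sizes, but it does not state the ratios. A small sketch that derives them from the figures copied out of that output; the constant and method names are illustrative only.

```ruby
# Iterations per second copied from the README benchmark output:
# [sanitize, selma] for each document size.
RESULTS = {
  "huge" => [0.257, 4.602],
  "medium" => [28.676, 121.500],
  "small" => [107.280, 118.867],
}.freeze

# Ratio of Selma's throughput to Sanitize's for each document size.
def speedups(results)
  results.transform_values { |sanitize_ips, selma_ips| selma_ips / sanitize_ips }
end

speedups(RESULTS).each do |size, ratio|
  puts format("%-6s %.1fx", size, ratio)
end
```

On these figures Selma comes out at roughly 17.9x on the huge document, 4.2x on the medium one, and 1.1x on the small one. The Selma medium and small runs carry reported deviations of ±22.2% and ±31.1%, so the small-document gap in particular may be within noise.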
data/lib/selma/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: selma
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.6
 platform: x86_64-linux
 authors:
 - Garen J. Torikian
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2022-12-
+date: 2022-12-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rb_sys