selma 0.0.5-x86_64-linux → 0.0.6-x86_64-linux
- checksums.yaml +4 -4
- data/README.md +69 -22
- data/lib/selma/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: be9face4692cbc6653b2085a056679aab5eca490517f3fa17e4f53ac5c9b4028
|
4
|
+
data.tar.gz: 47fa7091498f304b8aba324637218b7ca0af09e8370084337f1e588be677ac3e
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 9e57f2b3f8a82aa92cbc68c17a465226631af6bc1c89150a65e58a0b87d37698418799f812083de22590fdf11a18e1991d847f6273c7d9d03c66fd4139c10cc7
|
7
|
+
data.tar.gz: 66f079b74f387266446c5293b17dd1a534de422c1b7a7149ff562a4a9a05fefd7857216d2e6a782d92eecb7595ec5e1ba7b2e94c2a56350caef63c2d768a3901
|
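The SHA256 entries above can be compared against a locally downloaded artifact. Below is a minimal sketch using only Ruby's standard `digest` library. The helper name `checksum_matches?` and the local file path are assumptions for illustration and are not part of the gem.

```ruby
require "digest"

# Hypothetical helper: returns true when the SHA256 digest of the file at
# `path` equals `expected_hex` (a value listed under SHA256 in checksums.yaml).
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Example call, assuming data.tar.gz was unpacked from the .gem archive
# into the current directory:
# checksum_matches?(
#   "data.tar.gz",
#   "47fa7091498f304b8aba324637218b7ca0af09e8370084337f1e588be677ac3e"
# )
```

The same pattern should work for the SHA512 entries by swapping in `Digest::SHA512`.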
data/README.md
CHANGED
@@ -24,23 +24,27 @@ Or install it yourself as:
|
|
24
24
|
|
25
25
|
## Usage
|
26
26
|
|
27
|
-
Selma can perform two different actions:
|
27
|
+
Selma can perform two different actions, either independently or together:
|
28
28
|
|
29
29
|
- Sanitize HTML, through a [Sanitize](https://github.com/rgrove/sanitize)-like allowlist syntax; and
|
30
|
-
- Select HTML using CSS rules, and manipulate elements and text
|
30
|
+
- Select HTML using CSS rules, and manipulate elements and text nodes along the way.
|
31
31
|
|
32
|
-
The basic API for Selma looks like this:
|
32
|
+
It does this through two kwargs: `sanitizer` and `handlers`. The basic API for Selma looks like this:
|
33
33
|
|
34
34
|
```ruby
|
35
|
-
|
35
|
+
sanitizer_config = {
|
36
|
+
elements: ["b", "em", "i", "strong", "u"],
|
37
|
+
}
|
38
|
+
sanitizer = Selma::Sanitizer.new(sanitizer_config)
|
39
|
+
rewriter = Selma::Rewriter.new(sanitizer: sanitizer, handlers: [MatchElementRewrite.new, MatchTextRewrite.new])
|
36
40
|
rewriter.rewrite(html)
|
37
41
|
```
|
38
42
|
|
39
|
-
|
43
|
+
Here's a look at each individual part.
|
40
44
|
|
41
45
|
### Sanitization config
|
42
46
|
|
43
|
-
Selma sanitizes by default. That is, even if the `sanitizer` kwarg is not passed in, sanitization occurs. If you want to disable HTML sanitization (for some reason), pass `nil`:
|
47
|
+
Selma sanitizes by default. That is, even if the `sanitizer` kwarg is not passed in, sanitization occurs. If you truly want to disable HTML sanitization (for some reason), pass `nil`:
|
44
48
|
|
45
49
|
```ruby
|
46
50
|
Selma::Rewriter.new(sanitizer: nil) # dangerous and ill-advised
|
@@ -87,22 +91,22 @@ whitespace_elements: ["blockquote", "h1", "h2", "h3", "h4", "h5", "h6", ]
|
|
87
91
|
|
88
92
|
### Defining handlers
|
89
93
|
|
90
|
-
The real power in Selma comes in its use of handlers. A handler is simply an object with various methods:
|
94
|
+
The real power in Selma comes in its use of handlers. A handler is simply an object with various methods defined:
|
91
95
|
|
92
96
|
- `selector`, a method which MUST return an instance of `Selma::Selector`, which defines the CSS rules to match
|
93
97
|
- `handle_element`, a method that's called on each matched element
|
94
|
-
- `handle_text_chunk`, a method that's called on each matched text node
|
98
|
+
- `handle_text_chunk`, a method that's called on each matched text node
|
95
99
|
|
96
100
|
Here's an example which rewrites the `href` attribute on `a` and the `src` attribute on `img` to be `https` rather than `http`.
|
97
101
|
|
98
102
|
```ruby
|
99
103
|
class MatchAttribute
|
100
|
-
SELECTOR = Selma::Selector(match_element: "
|
104
|
+
SELECTOR = Selma::Selector.new(match_element: %(a[href^="http:"], img[src^="http:"]))
|
101
105
|
|
102
106
|
def handle_element(element)
|
103
|
-
if element.tag_name == "a"
|
107
|
+
if element.tag_name == "a"
|
104
108
|
element["href"] = rename_http(element["href"])
|
105
|
-
elsif element.tag_name == "img"
|
109
|
+
elsif element.tag_name == "img"
|
106
110
|
element["src"] = rename_http(element["src"])
|
107
111
|
end
|
108
112
|
end
|
@@ -118,10 +122,10 @@ rewriter = Selma::Rewriter.new(handlers: [MatchAttribute.new])
|
|
118
122
|
The `Selma::Selector` object has three possible kwargs:
|
119
123
|
|
120
124
|
- `match_element`: any element which matches this CSS rule will be passed on to `handle_element`
|
121
|
-
- `match_text_within`: any
|
125
|
+
- `match_text_within`: any text_chunk which matches this CSS rule will be passed on to `handle_text_chunk`
|
122
126
|
- `ignore_text_within`: this is an array of element names whose text contents will be ignored
|
123
127
|
|
124
|
-
|
128
|
+
Here's an example for `handle_text_chunk` which changes strings in various elements which are _not_ `pre` or `code`:
|
125
129
|
|
126
130
|
```ruby
|
127
131
|
|
@@ -144,20 +148,63 @@ rewriter = Selma::Rewriter.new(handlers: [MatchText.new])
|
|
144
148
|
|
145
149
|
The `element` argument in `handle_element` has the following methods:
|
146
150
|
|
147
|
-
- `tag_name`:
|
148
|
-
- `
|
149
|
-
- `
|
150
|
-
- `
|
151
|
-
- `
|
152
|
-
- `
|
153
|
-
- `
|
151
|
+
- `tag_name`: Gets the element's name
|
152
|
+
- `tag_name=`: Sets the element's name
|
153
|
+
- `self_closing?`: A bool which identifies whether or not the element is self-closing
|
154
|
+
- `[]`: Get an attribute
|
155
|
+
- `[]=`: Set an attribute
|
156
|
+
- `remove_attribute`: Remove an attribute
|
157
|
+
- `has_attribute?`: A bool which identifies whether or not the element has an attribute
|
158
|
+
- `attributes`: List all the attributes
|
159
|
+
- `ancestors`: List all of an element's ancestors as an array of strings
|
154
160
|
- `before(content, as: content_type)`: Inserts `content` before the element. `content_type` is either `:text` or `:html` and determines how the content will be applied.
|
155
161
|
- `after(content, as: content_type)`: Inserts `content` after the element. `content_type` is either `:text` or `:html` and determines how the content will be applied.
|
156
|
-
- `
|
162
|
+
- `prepend(content, as: content_type)`: prepends `content` to the element's inner content, i.e. inserts content right after the element's start tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
|
163
|
+
- `append(content, as: content_type)`: appends `content` to the element's inner content, i.e. inserts content right before the element's end tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
|
164
|
+
- `set_inner_content(content, as: content_type)`: Replaces the inner content of the element with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
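
To illustrate how a handler might combine several of the element methods listed above, here is a minimal sketch. So that it runs without the gem installed, it uses a stand-in class, `FakeElement`, that mimics only `tag_name`, `[]`, `[]=`, `has_attribute?`, and `remove_attribute`. Both `FakeElement` and the `UpgradeLinks` handler are hypothetical names, not part of Selma.

```ruby
# Stand-in for the element object passed to `handle_element`.
# This is NOT Selma's class; it only mimics a few of the methods listed above.
class FakeElement
  attr_accessor :tag_name

  def initialize(tag_name, attributes = {})
    @tag_name = tag_name
    @attributes = attributes
  end

  def [](name)
    @attributes[name]
  end

  def []=(name, value)
    @attributes[name] = value
  end

  def has_attribute?(name)
    @attributes.key?(name)
  end

  def remove_attribute(name)
    @attributes.delete(name)
  end
end

# Hypothetical handler: upgrades http links and drops inline click handlers.
class UpgradeLinks
  def handle_element(element)
    if element.tag_name == "a" && element.has_attribute?("href")
      element["href"] = element["href"].sub(/\Ahttp:/, "https:")
    end
    element.remove_attribute("onclick") if element.has_attribute?("onclick")
  end
end

link = FakeElement.new("a", { "href" => "http://example.com", "onclick" => "x()" })
UpgradeLinks.new.handle_element(link)
puts link["href"] # prints "https://example.com"
```

With the real gem, a handler like this would also need a `selector` method returning a `Selma::Selector`, as described under "Defining handlers".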
|
165
|
+
|
166
|
+
#### `text_chunk` methods
|
167
|
+
|
168
|
+
- `to_s` / `content`: Gets the text node's content
|
169
|
+
- `text_type`: identifies the type of text in the text node
|
170
|
+
- `before(content, as: content_type)`: Inserts `content` before the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
|
171
|
+
- `after(content, as: content_type)`: Inserts `content` after the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
|
172
|
+
- `replace(content, as: content_type)`: Replaces the text node with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
|
157
173
|
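As a rough illustration of the `text_chunk` methods above, the sketch below reads a chunk with `to_s` and swaps it out with `replace(content, as: :html)`. `FakeTextChunk` is a hypothetical stand-in so the example runs without the gem, and `EmphasizeName` is likewise an assumed name. Its `content_type` reader exists only so the result can be inspected and is not one of the documented methods.

```ruby
# Stand-in for the text chunk passed to `handle_text_chunk`.
# This is NOT Selma's class; it only mimics `to_s`, `content`, and `replace`.
class FakeTextChunk
  attr_reader :content, :content_type

  def initialize(content)
    @content = content
    @content_type = :text
  end

  def to_s
    @content
  end

  # Mimics `replace(content, as: content_type)` from the list above.
  def replace(content, as:)
    @content = content
    @content_type = as
  end
end

# Hypothetical handler: wraps every occurrence of "Selma" in <strong> tags.
class EmphasizeName
  def handle_text_chunk(text)
    updated = text.to_s.gsub("Selma", "<strong>Selma</strong>")
    # Only touch the chunk when something actually changed.
    text.replace(updated, as: :html) unless updated == text.to_s
  end
end

chunk = FakeTextChunk.new("Selma rewrites HTML")
EmphasizeName.new.handle_text_chunk(chunk)
puts chunk.to_s # prints "<strong>Selma</strong> rewrites HTML"
```

Passing `as: :html` matters here because the replacement contains markup. With `as: :text`, the description above suggests the content would be applied as plain text instead.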
|
158
174
|
## Benchmarks
|
159
175
|
|
160
|
-
|
176
|
+
<details>
|
177
|
+
<pre>
|
178
|
+
ruby test/benchmark.rb
|
179
|
+
ruby test/benchmark.rb
|
180
|
+
Warming up --------------------------------------
|
181
|
+
sanitize-document-huge
|
182
|
+
1.000 i/100ms
|
183
|
+
selma-document-huge 1.000 i/100ms
|
184
|
+
Calculating -------------------------------------
|
185
|
+
sanitize-document-huge
|
186
|
+
0.257 (± 0.0%) i/s - 2.000 in 7.783398s
|
187
|
+
selma-document-huge 4.602 (± 0.0%) i/s - 23.000 in 5.002870s
|
188
|
+
Warming up --------------------------------------
|
189
|
+
sanitize-document-medium
|
190
|
+
2.000 i/100ms
|
191
|
+
selma-document-medium
|
192
|
+
22.000 i/100ms
|
193
|
+
Calculating -------------------------------------
|
194
|
+
sanitize-document-medium
|
195
|
+
28.676 (± 3.5%) i/s - 144.000 in 5.024669s
|
196
|
+
selma-document-medium
|
197
|
+
121.500 (±22.2%) i/s - 594.000 in 5.135410s
|
198
|
+
Warming up --------------------------------------
|
199
|
+
sanitize-document-small
|
200
|
+
10.000 i/100ms
|
201
|
+
selma-document-small 20.000 i/100ms
|
202
|
+
Calculating -------------------------------------
|
203
|
+
sanitize-document-small
|
204
|
+
107.280 (± 0.9%) i/s - 540.000 in 5.033850s
|
205
|
+
selma-document-small 118.867 (±31.1%) i/s - 540.000 in 5.080726s
|
206
|
+
</pre>
|
207
|
+
</details>
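
To make the benchmark output easier to compare, the sketch below computes the throughput ratio between the two libraries for each document size. The iterations-per-second figures are copied from the output above. The constant `RESULTS` and the helper `speedup` are names assumed for illustration.

```ruby
# Iterations per second, copied from the benchmark output above.
RESULTS = {
  "huge"   => { sanitize: 0.257,   selma: 4.602 },
  "medium" => { sanitize: 28.676,  selma: 121.500 },
  "small"  => { sanitize: 107.280, selma: 118.867 },
}.freeze

# Ratio of Selma's throughput to Sanitize's, rounded to one decimal place.
def speedup(result)
  (result[:selma] / result[:sanitize]).round(1)
end

RESULTS.each do |size, result|
  puts "#{size}: #{speedup(result)}x"
end
# prints:
# huge: 17.9x
# medium: 4.2x
# small: 1.1x
```

The medium and small Selma runs report wide error margins (±22.2% and ±31.1%), so those two ratios should be read as rough estimates.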
|
161
208
|
|
162
209
|
## Contributing
|
163
210
|
|
data/lib/selma/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: selma
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.6
|
5
5
|
platform: x86_64-linux
|
6
6
|
authors:
|
7
7
|
- Garen J. Torikian
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2022-12-
|
11
|
+
date: 2022-12-28 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: rb_sys
|