selma 0.0.5-x86_64-linux → 0.0.6-x86_64-linux

Files changed (4)
  1. checksums.yaml +4 -4
  2. data/README.md +69 -22
  3. data/lib/selma/version.rb +1 -1
  4. metadata +2 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c44d404c6cd666d24abe3779484bc40f378aafb068cb9abaa96423e37728c2bc
4
- data.tar.gz: df3bab7e8bd1f9289a74d16789f5959bc41ec59e6145e140a429227ce08738d6
3
+ metadata.gz: be9face4692cbc6653b2085a056679aab5eca490517f3fa17e4f53ac5c9b4028
4
+ data.tar.gz: 47fa7091498f304b8aba324637218b7ca0af09e8370084337f1e588be677ac3e
5
5
  SHA512:
6
- metadata.gz: 5131067d67769795f6a156828b6c8e16a963c030cc1f3aae108bd966feda71e6bbd8fbb4c39fbcd04c91c6542bf5c0b227060b9440b2b8b279ed714b5f282fb1
7
- data.tar.gz: 5b2cbd109c3fced2bf922dedee36e1f291206be662ca47bb55bc2dc18c7288f09a3ce481a4eead218cd7369c8217e21ead6c712466d3382aaab898f4362f99c3
6
+ metadata.gz: 9e57f2b3f8a82aa92cbc68c17a465226631af6bc1c89150a65e58a0b87d37698418799f812083de22590fdf11a18e1991d847f6273c7d9d03c66fd4139c10cc7
7
+ data.tar.gz: 66f079b74f387266446c5293b17dd1a534de422c1b7a7149ff562a4a9a05fefd7857216d2e6a782d92eecb7595ec5e1ba7b2e94c2a56350caef63c2d768a3901
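The new digests above can be checked locally. The following is a minimal sketch, assuming the `.gem` archive has already been unpacked into a directory; `checksums_match?` is a hypothetical helper, not part of RubyGems or Selma.

```ruby
require "digest"

# Hypothetical helper: returns true when every file named in `expected`
# hashes to the SHA256 hex digest listed for it.
# `dir` is assumed to hold the unpacked members of the .gem archive.
def checksums_match?(dir, expected)
  expected.all? do |name, sha256|
    Digest::SHA256.file(File.join(dir, name)).hexdigest == sha256
  end
end
```

Passing a hash such as `{ "data.tar.gz" => "47fa7091..." }` (with the full digest from the diff) compares the file on disk against the published value. The same approach works for the SHA512 entries with `Digest::SHA512`.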
data/README.md CHANGED
@@ -24,23 +24,27 @@ Or install it yourself as:
24
24
 
25
25
  ## Usage
26
26
 
27
- Selma can perform two different actions:
27
+ Selma can perform two different actions, either independently or together:
28
28
 
29
29
  - Sanitize HTML, through a [Sanitize](https://github.com/rgrove/sanitize)-like allowlist syntax; and
30
- - Select HTML using CSS rules, and manipulate elements and text
30
+ - Select HTML using CSS rules, and manipulate elements and text nodes along the way.
31
31
 
32
- The basic API for Selma looks like this:
32
+ It does this through two kwargs: `sanitizer` and `handlers`. The basic API for Selma looks like this:
33
33
 
34
34
  ```ruby
35
- rewriter = Selma::Rewriter.new(sanitizer: sanitizer_config, handlers: [MatchAttribute.new, TextRewrite.new])
35
+ sanitizer_config = {
36
+ elements: ["b", "em", "i", "strong", "u"],
37
+ }
38
+ sanitizer = Selma::Sanitizer.new(sanitizer_config)
39
+ rewriter = Selma::Rewriter.new(sanitizer: sanitizer, handlers: [MatchElementRewrite.new, MatchTextRewrite.new])
36
40
  rewriter.rewrite(html)
37
41
  ```
38
42
 
39
- Let's take a look at each part individually.
43
+ Here's a look at each individual part.
40
44
 
41
45
  ### Sanitization config
42
46
 
43
- Selma sanitizes by default. That is, even if the `sanitizer` kwarg is not passed in, sanitization occurs. If you want to disable HTML sanitization (for some reason), pass `nil`:
47
+ Selma sanitizes by default. That is, even if the `sanitizer` kwarg is not passed in, sanitization occurs. If you truly want to disable HTML sanitization (for some reason), pass `nil`:
44
48
 
45
49
  ```ruby
46
50
  Selma::Rewriter.new(sanitizer: nil) # dangerous and ill-advised
@@ -87,22 +91,22 @@ whitespace_elements: ["blockquote", "h1", "h2", "h3", "h4", "h5", "h6", ]
87
91
 
88
92
  ### Defining handlers
89
93
 
90
- The real power in Selma comes in its use of handlers. A handler is simply an object with various methods:
94
+ The real power in Selma comes in its use of handlers. A handler is simply an object with various methods defined:
91
95
 
92
96
  - `selector`, a method which MUST return an instance of `Selma::Selector` which defines the CSS selectors to match
93
97
  - `handle_element`, a method that's called on each matched element
94
- - `handle_text_chunk`, a method that's called on each matched text node; this MUST return a string
98
+ - `handle_text_chunk`, a method that's called on each matched text node
95
99
 
96
100
  Here's an example which rewrites the `href` attribute on `a` and the `src` attribute on `img` to be `https` rather than `http`.
97
101
 
98
102
  ```ruby
99
103
  class MatchAttribute
100
- SELECTOR = Selma::Selector(match_element: "a, img")
104
+ SELECTOR = Selma::Selector.new(match_element: %(a[href^="http:"], img[src^="http:"]))
101
105
 
102
106
  def handle_element(element)
103
- if element.tag_name == "a" && element["href"] =~ /^http:/
107
+ if element.tag_name == "a"
104
108
  element["href"] = rename_http(element["href"])
105
- elsif element.tag_name == "img" && element["src"] =~ /^http:/
109
+ elsif element.tag_name == "img"
106
110
  element["src"] = rename_http(element["src"])
107
111
  end
108
112
  end
@@ -118,10 +122,10 @@ rewriter = Selma::Rewriter.new(handlers: [MatchAttribute.new])
118
122
  The `Selma::Selector` object has three possible kwargs:
119
123
 
120
124
  - `match_element`: any element which matches this CSS rule will be passed on to `handle_element`
121
- - `match_text_within`: any element which matches this CSS rule will be passed on to `handle_text_chunk`
125
+ - `match_text_within`: the text of any element which matches this CSS rule will be passed on to `handle_text_chunk`
122
126
  - `ignore_text_within`: this is an array of element names whose text contents will be ignored
123
127
 
124
- You've seen an example of `match_element`; here's one for `match_text` which changes strings in various elements which are _not_ `pre` or `code`:
128
+ Here's an example for `handle_text_chunk` which changes strings in various elements which are _not_ `pre` or `code`:
125
129
 
126
130
  ```ruby
127
131
 
@@ -144,20 +148,63 @@ rewriter = Selma::Rewriter.new(handlers: [MatchText.new])
144
148
 
145
149
  The `element` argument in `handle_element` has the following methods:
146
150
 
147
- - `tag_name`: The element's name
148
- - `[]`: get an attribute
149
- - `[]=`: set an attribute
150
- - `remove_attribute`: remove an attribute
151
- - `attributes`: list all the attributes
152
- - `ancestors`: list all the ancestors
153
- - `append(content, as: content_type)`: appends `content` to the element's inner content, i.e. inserts content right before the element's end tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
151
+ - `tag_name`: Gets the element's name
152
+ - `tag_name=`: Sets the element's name
153
+ - `self_closing?`: A bool which identifies whether or not the element is self-closing
154
+ - `[]`: Get an attribute
155
+ - `[]=`: Set an attribute
156
+ - `remove_attribute`: Remove an attribute
157
+ - `has_attribute?`: A bool which identifies whether or not the element has an attribute
158
+ - `attributes`: List all the attributes
159
+ - `ancestors`: List all of an element's ancestors as an array of strings
154
160
  - `before(content, as: content_type)`: Inserts `content` before the element. `content_type` is either `:text` or `:html` and determines how the content will be applied.
155
161
  - `after(content, as: content_type)`: Inserts `content` after the element. `content_type` is either `:text` or `:html` and determines how the content will be applied.
156
- - `set_inner_content`: replaces inner content of the element with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
162
+ - `prepend(content, as: content_type)`: Prepends `content` to the element's inner content, i.e. inserts content right after the element's start tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
163
+ - `append(content, as: content_type)`: Appends `content` to the element's inner content, i.e. inserts content right before the element's end tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
164
+ - `set_inner_content(content, as: content_type)`: Replaces the inner content of the element with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
165
+
166
+ #### `text_chunk` methods
167
+
168
+ - `to_s` / `content`: Gets the text node's content
169
+ - `text_type`: Identifies the type of text in the text node
170
+ - `before(content, as: content_type)`: Inserts `content` before the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
171
+ - `after(content, as: content_type)`: Inserts `content` after the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
172
+ - `replace(content, as: content_type)`: Replaces the text node with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
157
173
 
158
174
  ## Benchmarks
159
175
 
160
- TBD
176
+ <details>
177
+ <pre>
178
+ ruby test/benchmark.rb
179
+ ruby test/benchmark.rb
180
+ Warming up --------------------------------------
181
+ sanitize-document-huge
182
+ 1.000 i/100ms
183
+ selma-document-huge 1.000 i/100ms
184
+ Calculating -------------------------------------
185
+ sanitize-document-huge
186
+ 0.257 (± 0.0%) i/s - 2.000 in 7.783398s
187
+ selma-document-huge 4.602 (± 0.0%) i/s - 23.000 in 5.002870s
188
+ Warming up --------------------------------------
189
+ sanitize-document-medium
190
+ 2.000 i/100ms
191
+ selma-document-medium
192
+ 22.000 i/100ms
193
+ Calculating -------------------------------------
194
+ sanitize-document-medium
195
+ 28.676 (± 3.5%) i/s - 144.000 in 5.024669s
196
+ selma-document-medium
197
+ 121.500 (±22.2%) i/s - 594.000 in 5.135410s
198
+ Warming up --------------------------------------
199
+ sanitize-document-small
200
+ 10.000 i/100ms
201
+ selma-document-small 20.000 i/100ms
202
+ Calculating -------------------------------------
203
+ sanitize-document-small
204
+ 107.280 (± 0.9%) i/s - 540.000 in 5.033850s
205
+ selma-document-small 118.867 (±31.1%) i/s - 540.000 in 5.080726s
206
+ </pre>
207
+ </details>
161
208
 
162
209
  ## Contributing
163
210
 
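The handler contract described in the README changes above can be exercised without the gem. This is a rough sketch under stated assumptions: `FakeElement` is a hypothetical stand-in that mimics only `tag_name`, `[]` and `[]=` from the element method list, and `HttpsUpgrade` is an illustrative handler name. A real handler would also define `selector` returning a `Selma::Selector`, which is omitted here so the block runs on its own.

```ruby
# Hypothetical stand-in for the element Selma passes to `handle_element`.
# It implements only the three methods this sketch needs.
class FakeElement
  attr_reader :tag_name

  def initialize(tag_name, attributes)
    @tag_name = tag_name
    @attributes = attributes
  end

  # Get an attribute by name.
  def [](name)
    @attributes[name]
  end

  # Set an attribute by name.
  def []=(name, value)
    @attributes[name] = value
  end
end

# Illustrative handler: upgrades http: URLs to https: on links and images.
class HttpsUpgrade
  # Attribute that holds the URL for each tag this handler cares about.
  URL_ATTRIBUTE = { "a" => "href", "img" => "src" }.freeze

  def handle_element(element)
    attribute = URL_ATTRIBUTE[element.tag_name]
    return if attribute.nil?

    url = element[attribute]
    # Guard kept because the stand-in does no CSS filtering of its own.
    return unless url&.start_with?("http:")

    element[attribute] = url.sub(/\Ahttp:/, "https:")
  end
end

link = FakeElement.new("a", { "href" => "http://example.com" })
HttpsUpgrade.new.handle_element(link)
puts link["href"]
```

With the narrower selector introduced in 0.0.6 (`a[href^="http:"], img[src^="http:"]`), the prefix check inside `handle_element` would presumably be redundant, which is why the diff drops the regex test from the README example.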
data/lib/selma/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Selma
4
- VERSION = "0.0.5"
4
+ VERSION = "0.0.6"
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: selma
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.5
4
+ version: 0.0.6
5
5
  platform: x86_64-linux
6
6
  authors:
7
7
  - Garen J. Torikian
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2022-12-27 00:00:00.000000000 Z
11
+ date: 2022-12-28 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rb_sys