selma 0.0.5-x86_64-darwin → 0.0.6-x86_64-darwin

Files changed (4)
  1. checksums.yaml +4 -4
  2. data/README.md +69 -22
  3. data/lib/selma/version.rb +1 -1
  4. metadata +2 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 2505abb2e18c7f001866cfa03b52a27854b264b173f763e068f62d0e359bdd0f
-   data.tar.gz: 53a68d13f36649a832ae93bf77d9ababa3b44e46a9cd264cb065d2e60d1a12c6
+   metadata.gz: ff06e894a94d71484aeb8cf8c8acab9ef7452e3a3f1b3297bdf7919cb6b63dad
+   data.tar.gz: 7197ff50dabf6dfa947a67969eed4c82cde327e5b9315cd12b5da792602ccdb6
  SHA512:
-   metadata.gz: dc42caa9767caa33f8754d38017f3997f5e820c2788dd8b2192d429a86ad0b41da067266fa064497d61583f8d7cc96b41f9da5869f8eaf102407a226680c025d
-   data.tar.gz: 6656140ed6d6c9d10f88727f247909e3db6fd1b2772c3eae471a2614ef210d7b2d09e4da46b8dc7db6adbd50b7e73a1040e2cebe618bf406e549aa804748b08b
+   metadata.gz: 2bee3472093aee2400b434df359258a844c151799965f713b0b12d7af52f14124d7e1c9b05ab1d730c972167f207af8fc154eb81c5c54459ae1e6debd45b07cf
+   data.tar.gz: c7cc0d4fdebf4da12a4ee603478bc972ffd5abebd6cdf1a9778df140b7e5e282c877f06e4404da0a0c1d3945f20718cc09688a6096b5324608953e2cae8571f4
data/README.md CHANGED
@@ -24,23 +24,27 @@ Or install it yourself as:
 
  ## Usage
 
- Selma can perform two different actions:
+ Selma can perform two different actions, either independently or together:
 
  - Sanitize HTML, through a [Sanitize](https://github.com/rgrove/sanitize)-like allowlist syntax; and
- - Select HTML using CSS rules, and manipulate elements and text
+ - Select HTML using CSS rules, and manipulate elements and text nodes along the way.
 
- The basic API for Selma looks like this:
+ It does this through two kwargs: `sanitizer` and `handlers`. The basic API for Selma looks like this:
 
  ```ruby
- rewriter = Selma::Rewriter.new(sanitizer: sanitizer_config, handlers: [MatchAttribute.new, TextRewrite.new])
+ sanitizer_config = {
+   elements: ["b", "em", "i", "strong", "u"],
+ }
+ sanitizer = Selma::Sanitizer.new(sanitizer_config)
+ rewriter = Selma::Rewriter.new(sanitizer: sanitizer, handlers: [MatchElementRewrite.new, MatchTextRewrite.new])
  rewriter.rewrite(html)
  ```
 
- Let's take a look at each part individually.
+ Here's a look at each individual part.
 
  ### Sanitization config
 
- Selma sanitizes by default. That is, even if the `sanitizer` kwarg is not passed in, sanitization occurs. If you want to disable HTML sanitization (for some reason), pass `nil`:
+ Selma sanitizes by default. That is, even if the `sanitizer` kwarg is not passed in, sanitization occurs. If you truly want to disable HTML sanitization (for some reason), pass `nil`:
 
  ```ruby
  Selma::Rewriter.new(sanitizer: nil) # dangerous and ill-advised
@@ -87,22 +91,22 @@ whitespace_elements: ["blockquote", "h1", "h2", "h3", "h4", "h5", "h6", ]
 
  ### Defining handlers
 
- The real power in Selma comes in its use of handlers. A handler is simply an object with various methods:
+ The real power in Selma comes in its use of handlers. A handler is simply an object with various methods defined:
 
  - `selector`, a method which MUST return an instance of `Selma::Selector` which defines the CSS rules to match
  - `handle_element`, a method that's called on each matched element
- - `handle_text_chunk`, a method that's called on each matched text node; this MUST return a string
+ - `handle_text_chunk`, a method that's called on each matched text node
 
  Here's an example which rewrites the `href` attribute on `a` and the `src` attribute on `img` to be `https` rather than `http`.
 
  ```ruby
  class MatchAttribute
-   SELECTOR = Selma::Selector.new(match_element: "a, img")
+   SELECTOR = Selma::Selector.new(match_element: %(a[href^="http:"], img[src^="http:"]))
 
    def handle_element(element)
-     if element.tag_name == "a" && element["href"] =~ /^http:/
+     if element.tag_name == "a"
        element["href"] = rename_http(element["href"])
-     elsif element.tag_name == "img" && element["src"] =~ /^http:/
+     elsif element.tag_name == "img"
        element["src"] = rename_http(element["src"])
      end
    end
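
Because the new selector already restricts matches to `http:` URLs, the handler in this hunk no longer needs its own regex checks. The following is a minimal sketch of that handler, exercised without the gem: `StubElement` is a hypothetical stand-in that only mimics `tag_name`, `[]`, and `[]=`, and the body of `rename_http` is an assumption, since the diff does not show it.

```ruby
# Hypothetical stand-in for the element Selma passes to `handle_element`;
# it only mimics `tag_name`, `[]`, and `[]=`.
class StubElement
  attr_reader :tag_name

  def initialize(tag_name, attributes)
    @tag_name = tag_name
    @attributes = attributes
  end

  def [](name)
    @attributes[name]
  end

  def []=(name, value)
    @attributes[name] = value
  end
end

class MatchAttribute
  # Built lazily so this sketch loads even when the selma gem is absent.
  def selector
    Selma::Selector.new(match_element: %(a[href^="http:"], img[src^="http:"]))
  end

  def handle_element(element)
    if element.tag_name == "a"
      element["href"] = rename_http(element["href"])
    elsif element.tag_name == "img"
      element["src"] = rename_http(element["src"])
    end
  end

  private

  # Assumed helper: the diff does not show its body.
  def rename_http(url)
    url.sub(/\Ahttp:/, "https:")
  end
end

link = StubElement.new("a", { "href" => "http://example.com/" })
image = StubElement.new("img", { "src" => "http://example.com/cat.png" })

handler = MatchAttribute.new
handler.handle_element(link)
handler.handle_element(image)

puts link["href"]
puts image["src"]
```

With the real gem, the handler would instead be passed to `Selma::Rewriter.new(handlers: [MatchAttribute.new])`, as the next hunk header shows.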
@@ -118,10 +122,10 @@ rewriter = Selma::Rewriter.new(handlers: [MatchAttribute.new])
  The `Selma::Selector` object has three possible kwargs:
 
  - `match_element`: any element which matches this CSS rule will be passed on to `handle_element`
- - `match_text_within`: any element which matches this CSS rule will be passed on to `handle_text_chunk`
+ - `match_text_within`: any text_chunk within an element which matches this CSS rule will be passed on to `handle_text_chunk`
  - `ignore_text_within`: this is an array of element names whose text contents will be ignored
 
- You've seen an example of `match_element`; here's one for `match_text` which changes strings in various elements which are _not_ `pre` or `code`:
+ Here's an example for `handle_text_chunk` which changes strings in various elements which are _not_ `pre` or `code`:
 
  ```ruby
 
@@ -144,20 +148,63 @@ rewriter = Selma::Rewriter.new(handlers: [MatchText.new])
 
  The `element` argument in `handle_element` has the following methods:
 
- - `tag_name`: The element's name
- - `[]`: get an attribute
- - `[]=`: set an attribute
- - `remove_attribute`: remove an attribute
- - `attributes`: list all the attributes
- - `ancestors`: list all the ancestors
- - `append(content, as: content_type)`: appends `content` to the element's inner content, i.e. inserts content right before the element's end tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+ - `tag_name`: Gets the element's name
+ - `tag_name=`: Sets the element's name
+ - `self_closing?`: A bool which identifies whether or not the element is self-closing
+ - `[]`: Get an attribute
+ - `[]=`: Set an attribute
+ - `remove_attribute`: Remove an attribute
+ - `has_attribute?`: A bool which identifies whether or not the element has an attribute
+ - `attributes`: List all the attributes
+ - `ancestors`: List all of an element's ancestors as an array of strings
  - `before(content, as: content_type)`: Inserts `content` before the element. `content_type` is either `:text` or `:html` and determines how the content will be applied.
  - `after(content, as: content_type)`: Inserts `content` after the element. `content_type` is either `:text` or `:html` and determines how the content will be applied.
- - `set_inner_content`: replaces inner content of the element with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+ - `prepend(content, as: content_type)`: Prepends `content` to the element's inner content, i.e. inserts content right after the element's start tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+ - `append(content, as: content_type)`: Appends `content` to the element's inner content, i.e. inserts content right before the element's end tag. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+ - `set_inner_content(content, as: content_type)`: Replaces inner content of the element with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+
+ #### `text_chunk` methods
+
+ - `to_s` / `.content`: Gets the text node's content
+ - `text_type`: Identifies the type of text in the text node
+ - `before(content, as: content_type)`: Inserts `content` before the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+ - `after(content, as: content_type)`: Inserts `content` after the text. `content_type` is either `:text` or `:html` and determines how the content will be applied.
+ - `replace(content, as: content_type)`: Replaces the text node with `content`. `content_type` is either `:text` or `:html` and determines how the content will be applied.
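
Since `handle_text_chunk` is no longer required to return a string, a text handler presumably changes the node through the `text_chunk` methods listed above. The following is a rough sketch under that assumption, runnable without the gem: `StubTextChunk` is a hypothetical stand-in that only mimics `to_s` and `replace(content, as:)`, and the selector values and the `@selma` substitution are purely illustrative.

```ruby
# Hypothetical stand-in for the text chunk Selma passes to `handle_text_chunk`;
# it only mimics `to_s` and `replace(content, as:)`.
class StubTextChunk
  attr_reader :content_type

  def initialize(text)
    @text = text
    @content_type = :text
  end

  def to_s
    @text
  end

  def replace(content, as:)
    @text = content
    @content_type = as
  end
end

class MatchText
  # Built lazily so this sketch loads even when the selma gem is absent.
  # The selector values are illustrative assumptions.
  def selector
    Selma::Selector.new(match_text_within: "p", ignore_text_within: ["pre", "code"])
  end

  def handle_text_chunk(text_chunk)
    # Rewrite the text, then hand it back to the chunk as plain text.
    updated = text_chunk.to_s.gsub("@selma", "Selma")
    text_chunk.replace(updated, as: :text)
  end
end

chunk = StubTextChunk.new("Built with @selma.")
MatchText.new.handle_text_chunk(chunk)
puts chunk.to_s
```

Passing `as: :html` instead would ask for the replacement to be applied as markup, per the `content_type` description above.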
 
  ## Benchmarks
 
- TBD
+ <details>
+ <pre>
+ ruby test/benchmark.rb
+ ruby test/benchmark.rb
+ Warming up --------------------------------------
+ sanitize-document-huge
+ 1.000 i/100ms
+ selma-document-huge 1.000 i/100ms
+ Calculating -------------------------------------
+ sanitize-document-huge
+ 0.257 (± 0.0%) i/s - 2.000 in 7.783398s
+ selma-document-huge 4.602 (± 0.0%) i/s - 23.000 in 5.002870s
+ Warming up --------------------------------------
+ sanitize-document-medium
+ 2.000 i/100ms
+ selma-document-medium
+ 22.000 i/100ms
+ Calculating -------------------------------------
+ sanitize-document-medium
+ 28.676 (± 3.5%) i/s - 144.000 in 5.024669s
+ selma-document-medium
+ 121.500 (±22.2%) i/s - 594.000 in 5.135410s
+ Warming up --------------------------------------
+ sanitize-document-small
+ 10.000 i/100ms
+ selma-document-small 20.000 i/100ms
+ Calculating -------------------------------------
+ sanitize-document-small
+ 107.280 (± 0.9%) i/s - 540.000 in 5.033850s
+ selma-document-small 118.867 (±31.1%) i/s - 540.000 in 5.080726s
+ </pre>
+ </details>
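
To make the benchmark output above easier to compare, the short sketch below divides Selma's iterations per second by Sanitize's for each document size. The figures are copied from the output; the ratios are only rough, because the medium and small Selma runs report ±22.2% and ±31.1% variation.

```ruby
# Iterations per second copied from the benchmark output above.
RESULTS = {
  "huge"   => { sanitize: 0.257,   selma: 4.602 },
  "medium" => { sanitize: 28.676,  selma: 121.500 },
  "small"  => { sanitize: 107.280, selma: 118.867 },
}.freeze

# Ratio of Selma throughput to Sanitize throughput, rounded to one decimal.
SPEEDUPS = RESULTS.transform_values do |ips|
  (ips[:selma] / ips[:sanitize]).round(1)
end

SPEEDUPS.each do |document, ratio|
  puts "#{document}: selma is about #{ratio}x sanitize"
end
```

On these numbers the gap is widest for the huge document and nearly closes for the small one.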
 
  ## Contributing
 
data/lib/selma/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module Selma
-   VERSION = "0.0.5"
+   VERSION = "0.0.6"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: selma
  version: !ruby/object:Gem::Version
-   version: 0.0.5
+   version: 0.0.6
  platform: x86_64-darwin
  authors:
  - Garen J. Torikian
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2022-12-27 00:00:00.000000000 Z
+ date: 2022-12-28 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rb_sys