dandruff 0.8.0
- checksums.yaml +7 -0
- data/.rubocop.yml +23 -0
- data/CHANGELOG.md +69 -0
- data/COMPARISON.md +175 -0
- data/Gemfile +5 -0
- data/Gemfile.lock +142 -0
- data/LICENSE.txt +21 -0
- data/Makefile +41 -0
- data/README.md +1196 -0
- data/Rakefile +12 -0
- data/examples/basic_usage.rb +84 -0
- data/examples/email_sanitization_example.md +268 -0
- data/failed-expectations.md +192 -0
- data/lib/dandruff/attributes.rb +223 -0
- data/lib/dandruff/config.rb +500 -0
- data/lib/dandruff/expressions.rb +103 -0
- data/lib/dandruff/tags.rb +160 -0
- data/lib/dandruff/utils.rb +27 -0
- data/lib/dandruff/version.rb +5 -0
- data/lib/dandruff.rb +1095 -0
- metadata +134 -0
data/README.md
ADDED
|
@@ -0,0 +1,1196 @@
|
|
|
1
|
+
# Dandruff
|
|
2
|
+
|
|
3
|
+
**Because your markup deserves a good shampoo.**
|
|
4
|
+
|
|
5
|
+
If you're scratching your head because of XSS and your HTML is flaking with `<script>alert('gotcha')</script>`, it's time to wash that mess out!
|
|
6
|
+
|
|
7
|
+
|
|
8
|
+
```ruby
|
|
9
|
+
clean_html = Dandruff.scrub(dirty_html)
|
|
10
|
+
```
|
|
11
|
+
|
|
12
|
+
## Introduction
|
|
13
|
+
|
|
14
|
+
Dandruff is a Ruby HTML sanitizer providing comprehensive XSS protection with an idiomatic, developer-friendly API. It's built on the battle-tested security foundations of [DOMPurify](https://github.com/cure53/DOMPurify), bringing proven XSS defense to the Ruby ecosystem. Whether you're sanitizing user comments, rendering rich content, or processing HTML emails, Dandruff can help you keep your markup clean and secure.
|
|
15
|
+
|
|
16
|
+
### Key Features
|
|
17
|
+
|
|
18
|
+
- **Comprehensive XSS Protection** - Defends against XSS, mXSS, DOM clobbering, and protocol injection
|
|
19
|
+
- **Flexible Configuration** - Fine-grained control over tags, attributes, and sanitization behavior
|
|
20
|
+
- **Content Type Profiles** - Pre-configured settings for HTML, SVG, MathML, and HTML email
|
|
21
|
+
- **Hook System** - Extend sanitization with custom processing logic
|
|
22
|
+
- **Developer-Friendly API** - Intuitive Ruby idioms with block-based configuration
|
|
23
|
+
- **Performance Optimized** - Efficient multi-pass sanitization with configurable limits
|
|
24
|
+
- **Battle-Tested** - Based on DOMPurify's proven security model
|
|
25
|
+
|
|
26
|
+
## 🚀 Quickstart
|
|
27
|
+
|
|
28
|
+
Install the gem:
|
|
29
|
+
|
|
30
|
+
```bash
|
|
31
|
+
gem install dandruff
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
or with Bundler:
|
|
35
|
+
|
|
36
|
+
```ruby
|
|
37
|
+
# in your Gemfile
|
|
38
|
+
gem 'dandruff'
|
|
39
|
+
|
|
40
|
+
# then run
|
|
41
|
+
bundle install
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
Then in your controller or wherever you need to sanitize HTML:
|
|
45
|
+
|
|
46
|
+
```ruby
|
|
47
|
+
safe_comment = Dandruff.scrub(params[:comment], allowed_tags: ['p', 'strong', 'em'])
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
## ⚙️ Configuration
|
|
51
|
+
|
|
52
|
+
Dandruff offers several ways to configure sanitization: block-based configuration, direct configuration, per-call options, and options passed to the `Dandruff.scrub` class method.
|
|
53
|
+
|
|
54
|
+
### Configuration Styles
|
|
55
|
+
|
|
56
|
+
```ruby
|
|
57
|
+
# 1. Block-based configuration (recommended for instances)
|
|
58
|
+
dandruff = Dandruff.new do |config|
|
|
59
|
+
config.allowed_tags = ['p', 'strong', 'em']
|
|
60
|
+
config.allowed_attributes = ['class', 'href']
|
|
61
|
+
end
|
|
62
|
+
|
|
63
|
+
# 2. Direct configuration
|
|
64
|
+
dandruff = Dandruff.new
|
|
65
|
+
dandruff.set_config(
|
|
66
|
+
allowed_tags: ['p', 'strong'],
|
|
67
|
+
allowed_attributes: ['class']
|
|
68
|
+
)
|
|
69
|
+
|
|
70
|
+
# 3. Per-call configuration
|
|
71
|
+
dandruff = Dandruff.new
|
|
72
|
+
dandruff.scrub(html, allowed_tags: ['p'], allowed_attributes: ['class'])
|
|
73
|
+
|
|
74
|
+
# 4. Class method with configuration
|
|
75
|
+
clean = Dandruff.scrub(html, allowed_tags: ['p', 'strong'])
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
### Common Configuration Patterns
|
|
79
|
+
|
|
80
|
+
#### Restrict to specific tags and attributes
|
|
81
|
+
|
|
82
|
+
```ruby
|
|
83
|
+
dandruff = Dandruff.new do |config|
|
|
84
|
+
config.allowed_tags = ['p', 'strong', 'em', 'a']
|
|
85
|
+
config.allowed_attributes = ['href', 'title']
|
|
86
|
+
end
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
#### Extend defaults instead of replacing
|
|
90
|
+
|
|
91
|
+
```ruby
|
|
92
|
+
dandruff = Dandruff.new do |config|
|
|
93
|
+
config.additional_tags = ['custom-element']
|
|
94
|
+
config.additional_attributes = ['data-custom-id']
|
|
95
|
+
end
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
#### Block specific tags or attributes
|
|
99
|
+
|
|
100
|
+
```ruby
|
|
101
|
+
dandruff = Dandruff.new do |config|
|
|
102
|
+
config.forbidden_tags = ['script', 'iframe']
|
|
103
|
+
config.forbidden_attributes = ['onclick', 'onerror']
|
|
104
|
+
end
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
→ See [Configuration Reference](#configuration-reference) for all available options.
|
|
108
|
+
|
|
109
|
+
## 📖 Usage
|
|
110
|
+
|
|
111
|
+
### Simple Use Cases
|
|
112
|
+
|
|
113
|
+
#### Sanitize User Comments
|
|
114
|
+
|
|
115
|
+
```ruby
|
|
116
|
+
# Basic text formatting only
|
|
117
|
+
dandruff = Dandruff.new do |config|
|
|
118
|
+
config.allowed_tags = ['p', 'br', 'strong', 'em', 'a']
|
|
119
|
+
config.allowed_attributes = ['href']
|
|
120
|
+
config.forbidden_attributes = ['onclick', 'onerror']
|
|
121
|
+
end
|
|
122
|
+
|
|
123
|
+
comment = params[:comment]
|
|
124
|
+
safe_comment = dandruff.scrub(comment)
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
#### Sanitize Markdown-Generated HTML
|
|
128
|
+
|
|
129
|
+
```ruby
|
|
130
|
+
# Allow rich formatting from Markdown
|
|
131
|
+
dandruff = Dandruff.new do |config|
|
|
132
|
+
config.allowed_tags = [
|
|
133
|
+
'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
|
|
134
|
+
'p', 'br', 'strong', 'em', 'code', 'pre',
|
|
135
|
+
'ul', 'ol', 'li', 'blockquote', 'a'
|
|
136
|
+
]
|
|
137
|
+
config.allowed_attributes = ['href', 'title']
|
|
138
|
+
end
|
|
139
|
+
|
|
140
|
+
html = markdown_renderer.render(params[:content])
|
|
141
|
+
safe_html = dandruff.scrub(html)
|
|
142
|
+
```
|
|
143
|
+
|
|
144
|
+
#### Sanitize Blog Post Content
|
|
145
|
+
|
|
146
|
+
```ruby
|
|
147
|
+
# Rich content with images
|
|
148
|
+
dandruff = Dandruff.new do |config|
|
|
149
|
+
config.allowed_tags = [
|
|
150
|
+
'p', 'br', 'strong', 'em', 'ul', 'ol', 'li',
|
|
151
|
+
'h2', 'h3', 'blockquote', 'code', 'pre',
|
|
152
|
+
'a', 'img'
|
|
153
|
+
]
|
|
154
|
+
config.allowed_attributes = ['href', 'title', 'src', 'alt', 'class']
|
|
155
|
+
config.allow_data_uri = false # Block data URIs for images
|
|
156
|
+
end
|
|
157
|
+
|
|
158
|
+
post_html = params[:post][:content]
|
|
159
|
+
safe_html = dandruff.scrub(post_html)
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
### Intermediate Use Cases
|
|
163
|
+
|
|
164
|
+
#### Using Profiles for Content Types
|
|
165
|
+
|
|
166
|
+
Profiles are pre-configured sets of tags and attributes for common content types:
|
|
167
|
+
|
|
168
|
+
```ruby
|
|
169
|
+
# HTML content profile
|
|
170
|
+
dandruff = Dandruff.new do |config|
|
|
171
|
+
config.use_profiles = { html: true }
|
|
172
|
+
end
|
|
173
|
+
|
|
174
|
+
# SVG graphics
|
|
175
|
+
dandruff = Dandruff.new do |config|
|
|
176
|
+
config.use_profiles = { html: true, svg: true }
|
|
177
|
+
end
|
|
178
|
+
|
|
179
|
+
# Mathematical content
|
|
180
|
+
dandruff = Dandruff.new do |config|
|
|
181
|
+
config.use_profiles = { html: true, math_ml: true }
|
|
182
|
+
end
|
|
183
|
+
|
|
184
|
+
# Combine multiple profiles
|
|
185
|
+
dandruff = Dandruff.new do |config|
|
|
186
|
+
config.use_profiles = { html: true, svg: true, math_ml: true }
|
|
187
|
+
end
|
|
188
|
+
```
|
|
189
|
+
|
|
190
|
+
#### HTML Email Sanitization
|
|
191
|
+
|
|
192
|
+
HTML emails rely on legacy presentation attributes and need special handling:
|
|
193
|
+
|
|
194
|
+
```ruby
|
|
195
|
+
dandruff = Dandruff.new do |config|
|
|
196
|
+
config.use_profiles = { html_email: true }
|
|
197
|
+
end
|
|
198
|
+
|
|
199
|
+
email_html = message.html_part.body.to_s
|
|
200
|
+
safe_email = dandruff.scrub(email_html)
|
|
201
|
+
```
|
|
202
|
+
|
|
203
|
+
The `html_email` profile:
|
|
204
|
+
- Allows document structure tags (`head`, `meta`, `style`)
|
|
205
|
+
- Permits legacy presentation attributes (`bgcolor`, `cellpadding`, `align`, etc.)
|
|
206
|
+
- Uses per-tag attribute restrictions for security
|
|
207
|
+
- Allows style tags with content sanitization
|
|
208
|
+
- Excludes form elements and scripts
|
|
209
|
+
|
|
210
|
+
#### Per-Tag Attribute Control
|
|
211
|
+
|
|
212
|
+
Restrict which attributes are allowed on specific tags for maximum security:
|
|
213
|
+
|
|
214
|
+
```ruby
|
|
215
|
+
dandruff = Dandruff.new do |config|
|
|
216
|
+
config.allowed_attributes_per_tag = {
|
|
217
|
+
'a' => ['href', 'title', 'target'],
|
|
218
|
+
'img' => ['src', 'alt', 'width', 'height'],
|
|
219
|
+
'table' => ['border', 'cellpadding', 'cellspacing'],
|
|
220
|
+
'td' => ['colspan', 'rowspan'],
|
|
221
|
+
'th' => ['colspan', 'rowspan', 'scope']
|
|
222
|
+
}
|
|
223
|
+
end
|
|
224
|
+
|
|
225
|
+
# Only specified attributes allowed on each tag
|
|
226
|
+
html = '<a href="/page" onclick="alert()">Link</a>'
|
|
227
|
+
dandruff.scrub(html)
|
|
228
|
+
# => '<a href="/page">Link</a>' (onclick removed)
|
|
229
|
+
|
|
230
|
+
html = '<img src="pic.jpg" href="/bad">'
|
|
231
|
+
dandruff.scrub(html)
|
|
232
|
+
# => '<img src="pic.jpg">' (href removed from img)
|
|
233
|
+
```
|
|
234
|
+
|
|
235
|
+
**Security benefit:** Prevents attribute confusion attacks where dangerous attributes appear on unexpected elements.
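
The per-tag rule can be pictured as a lookup from tag name to permitted attribute names. The following is a minimal sketch in plain Ruby, not Dandruff's implementation; `RULES` and `filter_attributes` are hypothetical names used only for illustration, and attributes are modelled as a simple hash.

```ruby
# Hypothetical per-tag rules, mirroring the shape of allowed_attributes_per_tag.
RULES = {
  'a'   => %w[href title target],
  'img' => %w[src alt width height]
}.freeze

# Keep only the attributes listed for the given tag.
# Tags without an entry keep no attributes in this sketch.
def filter_attributes(tag, attributes, rules = RULES)
  permitted = rules.fetch(tag, [])
  attributes.select { |name, _value| permitted.include?(name) }
end

# onclick is dropped from the link; only href survives.
filter_attributes('a', { 'href' => '/page', 'onclick' => 'alert()' })

# href is dropped from the image; only src survives.
filter_attributes('img', { 'src' => 'pic.jpg', 'href' => '/bad' })
```

An allowlist keyed by tag means an attribute that is harmless on one element never has to be trusted on another.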
|
|
236
|
+
|
|
237
|
+
### Complex Use Cases
|
|
238
|
+
|
|
239
|
+
#### Custom URI Validation
|
|
240
|
+
|
|
241
|
+
```ruby
|
|
242
|
+
# Only allow HTTPS URLs from your domain
|
|
243
|
+
dandruff = Dandruff.new do |config|
|
|
244
|
+
config.allowed_uri_regexp = %r{\Ahttps://(www\.)?example\.com/}
|
|
245
|
+
end
|
|
246
|
+
|
|
247
|
+
html = '<a href="https://example.com/safe">OK</a><a href="http://evil.com">Bad</a>'
|
|
248
|
+
dandruff.scrub(html)
|
|
249
|
+
# => '<a href="https://example.com/safe">OK</a><a>Bad</a>'
|
|
250
|
+
```
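
One Ruby-specific detail matters when writing URI allowlist patterns: `^` and `$` match at line boundaries, while `\A` and `\z` match at the start and end of the whole string. The sketch below uses only core Ruby (no Dandruff calls) to show how a value containing a newline can slip past a `^`-anchored pattern. The constant names and URLs are made up for the example, and how Dandruff itself applies the pattern is not shown here.

```ruby
LINE_ANCHORED   = /^https:\/\/example\.com\//
STRING_ANCHORED = /\Ahttps:\/\/example\.com\//

# A value whose second line happens to look like an allowed URL.
SNEAKY = "javascript:alert(1)\nhttps://example.com/"

LINE_ANCHORED.match?(SNEAKY)    # => true  (the second line matches)
STRING_ANCHORED.match?(SNEAKY)  # => false (the string does not start with https)

# An honest URL still passes the stricter pattern.
STRING_ANCHORED.match?('https://example.com/safe') # => true
```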
|
|
251
|
+
|
|
252
|
+
#### Hook-Based Customization
|
|
253
|
+
|
|
254
|
+
Hooks allow you to extend Dandruff's behavior:
|
|
255
|
+
|
|
256
|
+
```ruby
|
|
257
|
+
dandruff = Dandruff.new
|
|
258
|
+
|
|
259
|
+
# Custom attribute handling
|
|
260
|
+
dandruff.add_hook(:upon_sanitize_attribute) do |node, data, config|
|
|
261
|
+
tag_name = data[:tag_name]
|
|
262
|
+
attr_name = data[:attr_name]
|
|
263
|
+
|
|
264
|
+
# Allow specific custom data attributes
|
|
265
|
+
if attr_name.start_with?('data-safe-')
|
|
266
|
+
data[:keep_attr] = true
|
|
267
|
+
end
|
|
268
|
+
|
|
269
|
+
# Force lowercase on certain attributes
|
|
270
|
+
if attr_name == 'id'
|
|
271
|
+
node[attr_name] = node[attr_name].downcase
|
|
272
|
+
end
|
|
273
|
+
end
|
|
274
|
+
|
|
275
|
+
# Element processing
|
|
276
|
+
dandruff.add_hook(:upon_sanitize_element) do |node, data, config|
|
|
277
|
+
# Log each element as it is processed
|
|
278
|
+
puts "Processing #{data[:tag_name]} element"
|
|
279
|
+
end
|
|
280
|
+
|
|
281
|
+
html = '<div data-safe-user-id="123" DATA-KEY="ABC" id="MyID">Content</div>'
|
|
282
|
+
dandruff.scrub(html)
|
|
283
|
+
# => '<div data-safe-user-id="123" id="myid">Content</div>'
|
|
284
|
+
```
|
|
285
|
+
|
|
286
|
+
Available hooks:
|
|
287
|
+
- `:before_sanitize_elements` - Before processing elements
|
|
288
|
+
- `:after_sanitize_elements` - After processing elements
|
|
289
|
+
- `:before_sanitize_attributes` - Before processing attributes on an element
|
|
290
|
+
- `:after_sanitize_attributes` - After processing attributes on an element
|
|
291
|
+
- `:upon_sanitize_element` - When processing each element
|
|
292
|
+
- `:upon_sanitize_attribute` - When processing each attribute
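
To make the hook lifecycle concrete, here is a small, self-contained sketch of a hook registry in plain Ruby. It is an assumption about how such a registry could work, not Dandruff's source; `HookRegistry`, `add`, `remove`, and `run` are hypothetical names.

```ruby
# A hypothetical registry that stores blocks per entry point and runs them in order.
class HookRegistry
  def initialize
    @hooks = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a block for an entry point such as :upon_sanitize_attribute.
  def add(entry_point, &block)
    @hooks[entry_point] << block
  end

  # Remove a specific hook, or the most recently added one when none is given.
  def remove(entry_point, hook = nil)
    return @hooks[entry_point].pop if hook.nil?

    @hooks[entry_point].delete(hook)
  end

  # Call every hook for the entry point with the same (node, data, config) arguments.
  def run(entry_point, node, data, config = {})
    @hooks[entry_point].each { |hook| hook.call(node, data, config) }
    data
  end
end

registry = HookRegistry.new
registry.add(:upon_sanitize_attribute) do |_node, data, _config|
  data[:keep_attr] = true if data[:attr_name].start_with?('data-safe-')
end

result = registry.run(:upon_sanitize_attribute, nil, { attr_name: 'data-safe-id', keep_attr: false })
result[:keep_attr] # => true
```

Passing one mutable `data` hash through every hook is what lets a hook flip a flag such as `keep_attr` for the code that runs after it.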
|
|
293
|
+
|
|
294
|
+
#### Template Safety
|
|
295
|
+
|
|
296
|
+
Remove template expressions when sanitizing user-submitted content:
|
|
297
|
+
|
|
298
|
+
```ruby
|
|
299
|
+
dandruff = Dandruff.new do |config|
|
|
300
|
+
config.safe_for_templates = true
|
|
301
|
+
end
|
|
302
|
+
|
|
303
|
+
html = '<div>{{user.name}} - <%= admin_link %> - ${secret}</div>'
|
|
304
|
+
dandruff.scrub(html)
|
|
305
|
+
# => '<div> - - </div>' (template expressions removed)
|
|
306
|
+
```
|
|
307
|
+
|
|
308
|
+
Removes:
|
|
309
|
+
- Mustache/Handlebars: `{{ }}`
|
|
310
|
+
- ERB: `<% %>`, `<%= %>`
|
|
311
|
+
- Template literals: `${ }`
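
As a rough illustration of what stripping these three syntaxes involves, the sketch below removes them from a plain string with regular expressions. It is not Dandruff's implementation; `TEMPLATE_PATTERNS` and `strip_template_expressions` are hypothetical names, and a real sanitizer works on parsed nodes rather than raw text.

```ruby
# Hypothetical patterns for the three expression styles listed above.
TEMPLATE_PATTERNS = [
  /\{\{.*?\}\}/m, # Mustache/Handlebars: {{ ... }}
  /<%.*?%>/m,     # ERB: <% ... %> and <%= ... %>
  /\$\{.*?\}/m    # Template literals: ${ ... }
].freeze

# Remove every template expression from the given text.
def strip_template_expressions(text)
  TEMPLATE_PATTERNS.reduce(text) { |result, pattern| result.gsub(pattern, '') }
end

strip_template_expressions('Hi {{name}}!')         # => "Hi !"
strip_template_expressions('<p><%= secret %></p>') # => "<p></p>"
strip_template_expressions('cost: ${price}.')      # => "cost: ."
```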
|
|
312
|
+
|
|
313
|
+
#### Multi-Pass Sanitization
|
|
314
|
+
|
|
315
|
+
Protect against mutation-based XSS (mXSS):
|
|
316
|
+
|
|
317
|
+
```ruby
|
|
318
|
+
dandruff = Dandruff.new do |config|
|
|
319
|
+
config.scrub_until_stable = true # default
|
|
320
|
+
config.mutation_max_passes = 2 # default
|
|
321
|
+
end
|
|
322
|
+
|
|
323
|
+
# Dandruff will re-sanitize until output is stable
|
|
324
|
+
# or max passes reached, preventing mXSS attacks
|
|
325
|
+
```
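
The idea of re-sanitizing until the output stops changing can be sketched as a small fixed-point loop. The code below is an illustration in plain Ruby, not Dandruff's implementation: `stabilize` is a hypothetical helper, and the naive string filter is a simplified stand-in for the parser mutations that real mXSS relies on.

```ruby
# Re-apply a sanitizing block until its output stops changing
# or the pass limit is reached, whichever comes first.
def stabilize(input, max_passes: 2)
  current = input
  max_passes.times do
    result = yield(current)
    return result if result == current # stable: another pass changes nothing

    current = result
  end
  current
end

# A deliberately naive single-pass filter, used only to show the effect.
naive_filter = ->(html) { html.gsub(/<script>/i, '') }

nested = '<scr<script>ipt>alert(1)'

stabilize(nested, max_passes: 1) { |html| naive_filter.call(html) }
# => "<script>alert(1)"  (one pass re-assembles the tag)

stabilize(nested, max_passes: 2) { |html| naive_filter.call(html) }
# => "alert(1)"
```

The pass limit bounds the work per document, which is the trade-off a setting like `mutation_max_passes` controls.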
|
|
326
|
+
|
|
327
|
+
#### Return DOM Instead of String
|
|
328
|
+
|
|
329
|
+
For further processing with Nokogiri:
|
|
330
|
+
|
|
331
|
+
```ruby
|
|
332
|
+
dandruff = Dandruff.new do |config|
|
|
333
|
+
config.return_dom = true
|
|
334
|
+
end
|
|
335
|
+
|
|
336
|
+
doc = dandruff.scrub(html)
|
|
337
|
+
# => Nokogiri::HTML::Document
|
|
338
|
+
|
|
339
|
+
# Or return a fragment
|
|
340
|
+
dandruff = Dandruff.new do |config|
|
|
341
|
+
config.return_dom_fragment = true
|
|
342
|
+
end
|
|
343
|
+
|
|
344
|
+
fragment = dandruff.scrub(html)
|
|
345
|
+
# => Nokogiri::HTML::DocumentFragment
|
|
346
|
+
```
|
|
347
|
+
|
|
348
|
+
## 📚 Reference
|
|
349
|
+
|
|
350
|
+
### Configuration Reference
|
|
351
|
+
|
|
352
|
+
Complete list of configuration options with defaults and security implications:
|
|
353
|
+
|
|
354
|
+
| Option | Type | Default | Description |
|
|
355
|
+
|--------|------|---------|-------------|
|
|
356
|
+
| `allowed_tags` | `Array<String>` | `nil` (use defaults) | Exact allowlist of elements. When set, only these tags pass. |
|
|
357
|
+
| `additional_tags` | `Array<String>` | `[]` | Extends default safe set. |
|
|
358
|
+
| `forbidden_tags` | `Array<String>` | `['base','link','meta','annotation-xml','noscript']` | Always removed even if allowed elsewhere. |
|
|
359
|
+
| `allowed_attributes` | `Array<String>` | `nil` (use defaults) | Exact allowlist of attributes. |
|
|
360
|
+
| `allowed_attributes_per_tag` | `Hash<String, Array<String>>` | `nil` | Per-tag attribute restrictions. Takes precedence over `allowed_attributes`. |
|
|
361
|
+
| `additional_attributes` | `Array<String>` | `[]` | Extends default safe attributes. |
|
|
362
|
+
| `forbidden_attributes` | `Array<String>` | `nil` | Attributes always removed. |
|
|
363
|
+
| `allow_data_attributes` | `Boolean` | `true` | Controls `data-*` attributes. |
|
|
364
|
+
| `allow_aria_attributes` | `Boolean` | `true` | Controls `aria-*` attributes for accessibility. |
|
|
365
|
+
| `allow_data_uri` | `Boolean` | `true` | Controls `data:` URIs. Set to `false` to block them for untrusted content. |
|
|
366
|
+
| `allow_unknown_protocols` | `Boolean` | `false` | If true, permits non-standard schemes (⚠️ security risk). |
|
|
367
|
+
| `allowed_uri_regexp` | `Regexp` | `nil` | Custom regexp to validate URI attributes. |
|
|
368
|
+
| `additional_uri_safe_attributes` | `Array<String>` | `[]` | Extra attributes treated as URI-like. |
|
|
369
|
+
| `allow_style_tags` | `Boolean` | `true` | Allows `<style>` tags, with their content scanned for unsafe patterns. |
|
|
370
|
+
| `sanitize_dom` | `Boolean` | `true` | Removes `id`/`name` values that could clobber DOM properties. |
|
|
371
|
+
| `safe_for_templates` | `Boolean` | `false` | Strips template expressions (`{{ }}`, `<%= %>`, `${ }`). |
|
|
372
|
+
| `safe_for_xml` | `Boolean` | `true` | Removes comments and processing instructions in XML-like content. |
|
|
373
|
+
| `whole_document` | `Boolean` | `false` | Parse as full document instead of fragment. |
|
|
374
|
+
| `allow_document_elements` | `Boolean` | `false` | Retain `html/head/body` tags. |
|
|
375
|
+
| `minimal_profile` | `Boolean` | `false` | Use smaller HTML-only allowlist (no SVG/MathML). |
|
|
376
|
+
| `force_body` | `Boolean` | `false` | Forces body context when parsing fragments. |
|
|
377
|
+
| `return_dom` | `Boolean` | `false` | Return Nokogiri DOM instead of string. |
|
|
378
|
+
| `return_dom_fragment` | `Boolean` | `false` | Return Nokogiri fragment instead of string. |
|
|
379
|
+
| `sanitize_until_stable` | `Boolean` | `true` | Re-sanitize until stable to mitigate mXSS. |
|
|
380
|
+
| `mutation_max_passes` | `Integer` | `2` | Max passes for stabilization. Higher = more secure, slower. |
|
|
381
|
+
| `keep_content` | `Boolean` | `true` | If false, removes contents of stripped elements. |
|
|
382
|
+
| `in_place` | `Boolean` | `false` | Attempts to sanitize in place (experimental). |
|
|
383
|
+
| `use_profiles` | `Hash` | `{}` | Enable content type profiles: `:html`, `:svg`, `:svg_filters`, `:math_ml`, `:html_email`. |
|
|
384
|
+
| `namespace` | `String` | `'http://www.w3.org/1999/xhtml'` | Namespace for XHTML handling. |
|
|
385
|
+
| `parser_media_type` | `String` | `'text/html'` | Parser media type; set to `application/xhtml+xml` for XHTML. |
|
|
386
|
+
|
|
387
|
+
### API Reference
|
|
388
|
+
|
|
389
|
+
#### Core Methods
|
|
390
|
+
|
|
391
|
+
##### `Dandruff.new(config = nil, &block)` → `Sanitizer`
|
|
392
|
+
|
|
393
|
+
Creates a new Dandruff instance.
|
|
394
|
+
|
|
395
|
+
**Parameters:**
|
|
396
|
+
- `config` (Hash, Config) - Optional configuration hash or Config object
|
|
397
|
+
- `block` - Optional block for configuration
|
|
398
|
+
|
|
399
|
+
**Returns:** Sanitizer instance
|
|
400
|
+
|
|
401
|
+
**Example:**
|
|
402
|
+
```ruby
|
|
403
|
+
dandruff = Dandruff.new do |config|
|
|
404
|
+
config.allowed_tags = ['p', 'strong']
|
|
405
|
+
end
|
|
406
|
+
```
|
|
407
|
+
|
|
408
|
+
##### `dandruff.scrub(dirty_html, config = {})` → `String` or `Nokogiri::XML::Document`
|
|
409
|
+
|
|
410
|
+
Sanitizes an HTML string or a Nokogiri node.
|
|
411
|
+
|
|
412
|
+
**Parameters:**
|
|
413
|
+
- `dirty_html` (String, Nokogiri::XML::Node) - Input to sanitize
|
|
414
|
+
- `config` (Hash) - Optional configuration override
|
|
415
|
+
|
|
416
|
+
**Returns:** Sanitized HTML string or Nokogiri document (based on config)
|
|
417
|
+
|
|
418
|
+
**Example:**
|
|
419
|
+
```ruby
|
|
420
|
+
clean = dandruff.scrub('<script>xss</script><p>Safe</p>')
|
|
421
|
+
# => "<p>Safe</p>"
|
|
422
|
+
```
|
|
423
|
+
|
|
424
|
+
##### `Dandruff.scrub(dirty_html, config = {})` → `String`
|
|
425
|
+
|
|
426
|
+
Class method for one-off sanitization.
|
|
427
|
+
|
|
428
|
+
**Parameters:**
|
|
429
|
+
- `dirty_html` (String) - Input to sanitize
|
|
430
|
+
- `config` (Hash) - Configuration options
|
|
431
|
+
|
|
432
|
+
**Returns:** Sanitized HTML string
|
|
433
|
+
|
|
434
|
+
**Example:**
|
|
435
|
+
```ruby
|
|
436
|
+
clean = Dandruff.scrub(html, allowed_tags: ['p'])
|
|
437
|
+
```
|
|
438
|
+
|
|
439
|
+
#### Configuration Methods
|
|
440
|
+
|
|
441
|
+
##### `dandruff.configure { |config| ... }` → `Sanitizer`
|
|
442
|
+
|
|
443
|
+
Configures the dandruff instance using a block.
|
|
444
|
+
|
|
445
|
+
**Example:**
|
|
446
|
+
```ruby
|
|
447
|
+
dandruff.configure do |config|
|
|
448
|
+
config.allowed_tags = ['p', 'strong']
|
|
449
|
+
end
|
|
450
|
+
```
|
|
451
|
+
|
|
452
|
+
##### `dandruff.set_config(config_hash)` → `Config`
|
|
453
|
+
|
|
454
|
+
Sets configuration directly with a hash.
|
|
455
|
+
|
|
456
|
+
**Example:**
|
|
457
|
+
```ruby
|
|
458
|
+
dandruff.set_config(allowed_tags: ['p'], allowed_attributes: ['class'])
|
|
459
|
+
```
|
|
460
|
+
|
|
461
|
+
##### `dandruff.clear_config` → `Config`
|
|
462
|
+
|
|
463
|
+
Resets to default configuration.
|
|
464
|
+
|
|
465
|
+
#### Hook Methods
|
|
466
|
+
|
|
467
|
+
##### `dandruff.add_hook(entry_point, &block)` → `void`
|
|
468
|
+
|
|
469
|
+
Adds a hook function.
|
|
470
|
+
|
|
471
|
+
**Parameters:**
|
|
472
|
+
- `entry_point` (Symbol) - Hook name (`:before_sanitize_elements`, `:upon_sanitize_attribute`, etc.)
|
|
473
|
+
- `block` (Proc) - Hook function receiving `(node, data, config)`
|
|
474
|
+
|
|
475
|
+
**Example:**
|
|
476
|
+
```ruby
|
|
477
|
+
dandruff.add_hook(:upon_sanitize_attribute) do |node, data, config|
|
|
478
|
+
data[:keep_attr] = true if data[:attr_name] == 'data-safe'
|
|
479
|
+
end
|
|
480
|
+
```
|
|
481
|
+
|
|
482
|
+
##### `dandruff.remove_hook(entry_point, hook_function = nil)` → `Proc` or `nil`
|
|
483
|
+
|
|
484
|
+
Removes a specific hook, or the last hook registered for an entry point when no hook is given.
|
|
485
|
+
|
|
486
|
+
##### `dandruff.remove_all_hooks` → `Hash`
|
|
487
|
+
|
|
488
|
+
Removes all hooks.
|
|
489
|
+
|
|
490
|
+
#### Utility Methods
|
|
491
|
+
|
|
492
|
+
##### `dandruff.supported?` → `Boolean`
|
|
493
|
+
|
|
494
|
+
Checks if required dependencies (Nokogiri) are available.
|
|
495
|
+
|
|
496
|
+
##### `dandruff.removed` → `Array`
|
|
497
|
+
|
|
498
|
+
Returns the list of elements and attributes removed during the last sanitization.
|
|
499
|
+
|
|
500
|
+
**Returns:** Array of removal records
|
|
501
|
+
|
|
502
|
+
### Profiles Reference
|
|
503
|
+
|
|
504
|
+
#### HTML Profile
|
|
505
|
+
|
|
506
|
+
**Enable:** `use_profiles: { html: true }`
|
|
507
|
+
|
|
508
|
+
**Includes:** All standard HTML5 semantic elements, media elements, form controls, and text formatting.
|
|
509
|
+
|
|
510
|
+
**Use for:** Standard web content, blog posts, documentation
|
|
511
|
+
|
|
512
|
+
```ruby
|
|
513
|
+
dandruff = Dandruff.new do |config|
|
|
514
|
+
config.use_profiles = { html: true }
|
|
515
|
+
end
|
|
516
|
+
```
|
|
517
|
+
|
|
518
|
+
#### SVG Profile
|
|
519
|
+
|
|
520
|
+
**Enable:** `use_profiles: { svg: true }`
|
|
521
|
+
|
|
522
|
+
**Includes:** SVG elements for vector graphics (shapes, paths, gradients, basic filters)
|
|
523
|
+
|
|
524
|
+
**Use for:** Inline SVG graphics, icons, diagrams
|
|
525
|
+
|
|
526
|
+
```ruby
|
|
527
|
+
dandruff = Dandruff.new do |config|
|
|
528
|
+
config.use_profiles = { html: true, svg: true }
|
|
529
|
+
end
|
|
530
|
+
```
|
|
531
|
+
|
|
532
|
+
#### SVG Filters Profile
|
|
533
|
+
|
|
534
|
+
**Enable:** `use_profiles: { svg_filters: true }`
|
|
535
|
+
|
|
536
|
+
**Includes:** Advanced SVG filter primitives (blur, color manipulation, lighting)
|
|
537
|
+
|
|
538
|
+
**Use for:** SVG with visual effects
|
|
539
|
+
|
|
540
|
+
```ruby
|
|
541
|
+
dandruff = Dandruff.new do |config|
|
|
542
|
+
config.use_profiles = { svg: true, svg_filters: true }
|
|
543
|
+
end
|
|
544
|
+
```
|
|
545
|
+
|
|
546
|
+
#### MathML Profile
|
|
547
|
+
|
|
548
|
+
**Enable:** `use_profiles: { math_ml: true }`
|
|
549
|
+
|
|
550
|
+
**Includes:** MathML elements for mathematical notation
|
|
551
|
+
|
|
552
|
+
**Use for:** Scientific documents, mathematical content
|
|
553
|
+
|
|
554
|
+
```ruby
|
|
555
|
+
dandruff = Dandruff.new do |config|
|
|
556
|
+
config.use_profiles = { html: true, math_ml: true }
|
|
557
|
+
end
|
|
558
|
+
```
|
|
559
|
+
|
|
560
|
+
#### HTML Email Profile
|
|
561
|
+
|
|
562
|
+
**Enable:** `use_profiles: { html_email: true }`
|
|
563
|
+
|
|
564
|
+
**Includes:**
|
|
565
|
+
- HTML elements + document structure (`head`, `meta`, `style`)
|
|
566
|
+
- Legacy presentation tags (`font`, `center`)
|
|
567
|
+
- Legacy attributes (`bgcolor`, `cellpadding`, `valign`, etc.)
|
|
568
|
+
- Per-tag attribute restrictions (automatic)
|
|
569
|
+
|
|
570
|
+
**Excludes:** Forms, scripts, interactive elements
|
|
571
|
+
|
|
572
|
+
**Special settings:**
|
|
573
|
+
- Allows style tags (required for email)
|
|
574
|
+
- Disables DOM clobbering protection (emails are sandboxed)
|
|
575
|
+
- Parses as whole document
|
|
576
|
+
|
|
577
|
+
**Use for:** HTML email rendering
|
|
578
|
+
|
|
579
|
+
```ruby
|
|
580
|
+
dandruff = Dandruff.new do |config|
|
|
581
|
+
config.use_profiles = { html_email: true }
|
|
582
|
+
end
|
|
583
|
+
```
|
|
584
|
+
|
|
585
|
+
## 🔒 Security
|
|
586
|
+
|
|
587
|
+
### Threat Model
|
|
588
|
+
|
|
589
|
+
Dandruff defends against multiple attack vectors:
|
|
590
|
+
|
|
591
|
+
#### XSS (Cross-Site Scripting)
|
|
592
|
+
|
|
593
|
+
**Attack:** Injecting scripts via HTML tags or attributes
|
|
594
|
+
|
|
595
|
+
**Protection:**
|
|
596
|
+
- Removes `<script>`, `<iframe>`, `<object>`, `<embed>` tags
|
|
597
|
+
- Blocks event handlers (`onclick`, `onerror`, `onload`, etc.)
|
|
598
|
+
- Validates URI attributes to prevent `javascript:` and `vbscript:` protocols
|
|
599
|
+
|
|
600
|
+
```ruby
|
|
601
|
+
# Attack blocked
|
|
602
|
+
dandruff.scrub('<script>alert("xss")</script>')
|
|
603
|
+
# => ""
|
|
604
|
+
|
|
605
|
+
dandruff.scrub('<img src="javascript:alert(1)">')
|
|
606
|
+
# => "<img>"
|
|
607
|
+
|
|
608
|
+
dandruff.scrub('<a onclick="alert(1)">Click</a>')
|
|
609
|
+
# => "<a>Click</a>"
|
|
610
|
+
```
|
|
611
|
+
|
|
612
|
+
#### mXSS (Mutation-Based XSS)
|
|
613
|
+
|
|
614
|
+
**Attack:** HTML mutations during parsing that create XSS
|
|
615
|
+
|
|
616
|
+
**Protection:**
|
|
617
|
+
- Multi-pass sanitization (validates output is stable)
|
|
618
|
+
- Namespace confusion prevention (SVG/MathML)
|
|
619
|
+
- Proper HTML5 parsing
|
|
620
|
+
|
|
621
|
+
```ruby
|
|
622
|
+
# mXSS prevented through multi-pass sanitization
|
|
623
|
+
dandruff = Dandruff.new do |config|
|
|
624
|
+
config.scrub_until_stable = true # default
|
|
625
|
+
config.mutation_max_passes = 2
|
|
626
|
+
end
|
|
627
|
+
```
|
|
628
|
+
|
|
629
|
+
#### DOM Clobbering
|
|
630
|
+
|
|
631
|
+
**Attack:** Using `id`/`name` attributes to override built-in DOM properties
|
|
632
|
+
|
|
633
|
+
**Protection:**
|
|
634
|
+
- Blocks dangerous id/name values (`document`, `location`, `alert`, `window`, etc.)
|
|
635
|
+
- Can be disabled for sandboxed contexts like email
|
|
636
|
+
|
|
637
|
+
```ruby
|
|
638
|
+
# DOM clobbering blocked
|
|
639
|
+
dandruff.scrub('<form name="document">')
|
|
640
|
+
# => "<form></form>" (name removed)
|
|
641
|
+
|
|
642
|
+
dandruff.scrub('<img id="location">')
|
|
643
|
+
# => "<img>" (id removed)
|
|
644
|
+
```
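
The check behind this protection can be sketched as a lookup against a list of reserved names. The code below is illustrative plain Ruby, not Dandruff's implementation: `CLOBBERING_NAMES` and `clobbering_value?` are hypothetical, and the list contains only the four names mentioned above.

```ruby
require 'set'

# Illustrative subset of names that shadow DOM globals; a real list would be longer.
CLOBBERING_NAMES = Set.new(%w[document location alert window]).freeze

# True when an id or name value would collide with one of the listed globals.
# The comparison ignores case and surrounding whitespace to stay on the strict side.
def clobbering_value?(value)
  CLOBBERING_NAMES.include?(value.to_s.strip.downcase)
end

clobbering_value?('document') # => true
clobbering_value?('Location') # => true
clobbering_value?('sidebar')  # => false
```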
|
|
645
|
+
|
|
646
|
+
#### Protocol Injection
|
|
647
|
+
|
|
648
|
+
**Attack:** Using dangerous URI protocols to execute code
|
|
649
|
+
|
|
650
|
+
**Protection:**
|
|
651
|
+
- Blocks `javascript:`, `vbscript:`, `data:text/html` protocols
|
|
652
|
+
- Validates against allowlist of safe protocols
|
|
653
|
+
- Custom protocol validation with `allowed_uri_regexp`
|
|
654
|
+
|
|
655
|
+
```ruby
|
|
656
|
+
dandruff.scrub('<a href="javascript:alert(1)">Click</a>')
|
|
657
|
+
# => "<a>Click</a>"
|
|
658
|
+
|
|
659
|
+
dandruff.scrub('<link href="vbscript:msgbox(1)">')
|
|
660
|
+
# => (link removed)
|
|
661
|
+
```
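
A scheme allowlist of the kind described above can be sketched in a few lines of plain Ruby. This is an assumption-laden illustration rather than Dandruff's code: `SAFE_SCHEMES` and `safe_uri?` are hypothetical, and the list of schemes is an example, not the library's default.

```ruby
# Example allowlist; adjust to the schemes your application actually needs.
SAFE_SCHEMES = %w[http https mailto tel].freeze

# Decide whether a URI attribute value uses an allowed scheme.
def safe_uri?(value)
  # Browsers strip tabs and newlines from URLs and ignore leading control
  # characters, so remove all of them (and spaces) before reading the scheme.
  normalized = value.gsub(/[\x00-\x20]/, '')
  scheme = normalized[/\A([a-zA-Z][a-zA-Z0-9+.\-]*):/, 1]
  return true if scheme.nil? # no scheme: a relative URL

  SAFE_SCHEMES.include?(scheme.downcase)
end

safe_uri?('https://example.com/page') # => true
safe_uri?('/relative/path')           # => true
safe_uri?('JaVaScRiPt:alert(1)')      # => false
safe_uri?("java\tscript:alert(1)")    # => false
safe_uri?('data:text/html,<b>x</b>')  # => false
```

Checking against an allowlist, instead of searching for `javascript:`, means obfuscated or newly invented schemes are rejected by default.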
|
|
662
|
+
|
|
663
|
+
#### CSS Injection
|
|
664
|
+
|
|
665
|
+
**Attack:** Using CSS to execute code or exfiltrate data
|
|
666
|
+
|
|
667
|
+
**Protection:**
|
|
668
|
+
- Parses and validates inline `style` attributes
|
|
669
|
+
- Removes dangerous CSS properties and values
|
|
670
|
+
- Scans `<style>` tag content for unsafe patterns
|
|
671
|
+
|
|
672
|
+
```ruby
|
|
673
|
+
dandruff.scrub('<div style="expression(alert(1))"></div>')
|
|
674
|
+
# => "<div></div>" (dangerous style removed)
|
|
675
|
+
|
|
676
|
+
dandruff.scrub('<div style="background: url(javascript:alert(1))"></div>')
|
|
677
|
+
# => "<div></div>" (dangerous style removed)
|
|
678
|
+
```
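
To show the general shape of inline-style filtering, the sketch below drops declarations that match a few unsafe patterns. It is a simplified illustration, not Dandruff's implementation: `UNSAFE_CSS` and `filter_style` are hypothetical, the pattern list is far from complete, and splitting on `;` is only adequate for simple declarations.

```ruby
# Illustrative patterns only; a production filter needs a proper CSS parser.
UNSAFE_CSS = [/expression\s*\(/i, /javascript\s*:/i, /@import/i].freeze

# Split an inline style into declarations and drop the ones that look unsafe.
def filter_style(style)
  style.split(';')
       .map(&:strip)
       .reject(&:empty?)
       .reject { |declaration| UNSAFE_CSS.any? { |pattern| declaration.match?(pattern) } }
       .join('; ')
end

filter_style('color: red; background: url(javascript:alert(1))') # => "color: red"
filter_style('width: expression(alert(1))')                      # => ""
filter_style('color: red; font-weight: bold')                    # => "color: red; font-weight: bold"
```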
|
|
679
|
+
|
|
680
|
+
### Security Best Practices
|
|
681
|
+
|
|
682
|
+
#### 1. Use Allowlists, Not Blocklists
|
|
683
|
+
|
|
684
|
+
```ruby
|
|
685
|
+
# ✅ Good - explicitly allow safe tags
|
|
686
|
+
config.allowed_tags = ['p', 'strong', 'em', 'a']
|
|
687
|
+
|
|
688
|
+
# ❌ Avoid - trying to block everything dangerous is error-prone
|
|
689
|
+
config.forbidden_tags = ['script', 'iframe', ...] # incomplete!
|
|
690
|
+
```
|
|
691
|
+
|
|
692
|
+
#### 2. Restrict URI Protocols
|
|
693
|
+
|
|
694
|
+
```ruby
|
|
695
|
+
# ✅ Good - only allow HTTPS
|
|
696
|
+
config.allowed_uri_regexp = /\Ahttps:/
|
|
697
|
+
|
|
698
|
+
# ⚠️ Caution - allowing unknown protocols is risky
|
|
699
|
+
config.allow_unknown_protocols = true # avoid unless necessary
|
|
700
|
+
```
|
|
701
|
+
|
|
702
|
+
#### 3. Disable Data URIs Unless Needed
|
|
703
|
+
|
|
704
|
+
```ruby
|
|
705
|
+
# ✅ Good for user-generated content
|
|
706
|
+
config.allow_data_uri = false
|
|
707
|
+
|
|
708
|
+
# ⚠️ Only enable for trusted content
|
|
709
|
+
config.allow_data_uri = true # only if you need it
|
|
710
|
+
```
|
|
711
|
+
|
|
712
|
+
#### 4. Remove Event Handlers
|
|
713
|
+
|
|
714
|
+
```ruby
|
|
715
|
+
# ✅ Good - block common event handlers
|
|
716
|
+
config.forbidden_attributes = [
|
|
717
|
+
'onclick', 'onload', 'onerror', 'onmouseover',
|
|
718
|
+
'onfocus', 'onblur', 'onchange', 'onsubmit'
|
|
719
|
+
]
|
|
720
|
+
```
|
|
721
|
+
|
|
722
|
+
#### 5. Keep DOM Sanitization Enabled
|
|
723
|
+
|
|
724
|
+
```ruby
|
|
725
|
+
# ✅ Good - default setting
|
|
726
|
+
config.scrub_dom = true
|
|
727
|
+
|
|
728
|
+
# ⚠️ Only disable for sandboxed contexts (e.g., email rendering)
|
|
729
|
+
config.scrub_dom = false # use with caution
|
|
730
|
+
```
|
|
731
|
+
|
|
732
|
+
#### 6. Use Per-Tag Attribute Control
|
|
733
|
+
|
|
734
|
+
```ruby
|
|
735
|
+
# ✅ Good - prevents attribute confusion
|
|
736
|
+
config.allowed_attributes_per_tag = {
|
|
737
|
+
'a' => ['href', 'title'], # no 'src' on links
|
|
738
|
+
'img' => ['src', 'alt'], # no 'href' on images
|
|
739
|
+
'form' => ['action', 'method'] # only form-specific attrs
|
|
740
|
+
}
|
|
741
|
+
```
|
|
742
|
+
|
|
743
|
+
#### 7. Keep Dandruff Updated
|
|
744
|
+
|
|
745
|
+
```bash
|
|
746
|
+
# Check your Gemfile.lock regularly
|
|
747
|
+
bundle outdated dandruff
|
|
748
|
+
|
|
749
|
+
# Update to latest version
|
|
750
|
+
bundle update dandruff
|
|
751
|
+
```
|
|
752
|
+
|
|
753
|
+
### Recommended Configurations
|
|
754
|
+
|
|
755
|
+
#### Maximum Security (User Comments)
|
|
756
|
+
|
|
757
|
+
```ruby
|
|
758
|
+
dandruff = Dandruff.new do |config|
|
|
759
|
+
config.allowed_tags = ['p', 'br', 'strong', 'em', 'a']
|
|
760
|
+
config.allowed_attributes = ['href']
|
|
761
|
+
config.forbidden_attributes = ['onclick', 'onerror', 'onload', 'style']
|
|
762
|
+
config.allow_data_uri = false
|
|
763
|
+
config.keep_content = false
|
|
764
|
+
end
|
|
765
|
+
```
|
|
766
|
+
|
|
767
|
+
#### Content Management System
|
|
768
|
+
|
|
769
|
+
```ruby
|
|
770
|
+
dandruff = Dandruff.new do |config|
|
|
771
|
+
config.allowed_tags = [
|
|
772
|
+
'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
|
|
773
|
+
'p', 'br', 'strong', 'em', 'ul', 'ol', 'li',
|
|
774
|
+
'blockquote', 'code', 'pre', 'a', 'img',
|
|
775
|
+
'div', 'span', 'table', 'tr', 'td', 'th'
|
|
776
|
+
]
|
|
777
|
+
config.allowed_attributes = ['href', 'src', 'alt', 'title', 'class', 'id']
|
|
778
|
+
config.allow_data_uri = true # for embedded images
|
|
779
|
+
config.allowed_uri_regexp = /\A(?:https?:|\/)/ # only http(s) and root-relative URLs
|
|
780
|
+
end
|
|
781
|
+
```
|
|
782
|
+
|
|
783
|
+
#### Rich Text Editor
|
|
784
|
+
|
|
785
|
+
```ruby
|
|
786
|
+
dandruff = Dandruff.new do |config|
|
|
787
|
+
config.use_profiles = { html: true }
|
|
788
|
+
config.forbidden_tags = ['script', 'iframe', 'object', 'embed']
|
|
789
|
+
config.forbidden_attributes = ['on*'] # remove all event handlers
|
|
790
|
+
config.allowed_attributes_per_tag = {
|
|
791
|
+
'img' => ['src', 'alt', 'width', 'height'],
|
|
792
|
+
'a' => ['href', 'title']
|
|
793
|
+
}
|
|
794
|
+
end
|
|
795
|
+
```
|
|
796
|
+
|
|
797
|
+
## ⚡ Performance
|
|
798
|
+
|
|
799
|
+
Dandruff is optimized for performance while maintaining security.
|
|
800
|
+
|
|
801
|
+
### Benchmarks
|
|
802
|
+
|
|
803
|
+
Executed on Apple M1 Max via `ruby spec/dandruff_performance_spec.rb`:
|
|
804
|
+
|
|
805
|
+
| Input Size | Default Config | Strict Config | Throughput (Default) | Throughput (Strict) |
|
|
806
|
+
|------------|----------------|---------------|----------------------|---------------------|
|
|
807
|
+
| 1KB | ~3.3ms | ~0.3ms | ~300 KB/s | ~3,300 KB/s |
|
|
808
|
+
| 10KB | ~31ms | ~3.3ms | ~320 KB/s | ~3,000 KB/s |
|
|
809
|
+
| 50KB | ~160ms | ~16ms | ~310 KB/s | ~3,100 KB/s |
|
|
810
|
+
| 100KB | ~340ms | ~31ms | ~290 KB/s | ~3,200 KB/s |
|
|
811
|
+
| 500KB | ~1.8s | ~170ms | ~280 KB/s | ~2,900 KB/s |
|
|
812
|
+
|
|
813
|
+
**Stress tests:**
|
|
814
|
+
- 1,000 small docs: ~0.40s total (~2,450 docs/sec)
|
|
815
|
+
- Deep nesting (100 levels): <5s
|
|
816
|
+
- Memory growth: <50k objects over 100 iterations
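The throughput columns in the benchmark table are input size divided by elapsed time. As a quick sanity check, the sketch below recomputes the default-config column from the listed timings. The helper name `throughput_kb_per_s` is made up for this illustration.

```ruby
# Timings copied from the benchmark table (default config), in seconds,
# keyed by input size in KB.
default_timings = { 1 => 0.0033, 10 => 0.031, 50 => 0.160, 100 => 0.340, 500 => 1.8 }

# Throughput in KB/s is input size (KB) divided by elapsed time (s).
def throughput_kb_per_s(size_kb, seconds)
  size_kb / seconds
end

default_timings.each do |size_kb, seconds|
  rate = throughput_kb_per_s(size_kb, seconds).round
  puts format('%4d KB in %.4fs: ~%d KB/s', size_kb, seconds, rate)
end
```

The recomputed values land within rounding distance of the table's figures, which suggests throughput stays roughly flat as input grows, so processing time scales about linearly with input size.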
|
|
817
|
+
|
|
818
|
+
### Performance Tips
|
|
819
|
+
|
|
820
|
+
#### 1. Reuse Configurations
|
|
821
|
+
|
|
822
|
+
```ruby
|
|
823
|
+
# ✅ Good - reuse configuration
|
|
824
|
+
dandruff = Dandruff.new do |config|
|
|
825
|
+
config.allowed_tags = ['p', 'strong', 'em']
|
|
826
|
+
end
|
|
827
|
+
|
|
828
|
+
documents.each do |doc|
|
|
829
|
+
clean = dandruff.scrub(doc) # fast - config already set
|
|
830
|
+
end
|
|
831
|
+
|
|
832
|
+
# ❌ Slower - new config each time
|
|
833
|
+
documents.each do |doc|
|
|
834
|
+
clean = Dandruff.scrub(doc, allowed_tags: ['p', 'strong', 'em'])
|
|
835
|
+
end
|
|
836
|
+
```
|
|
837
|
+
|
|
838
|
+
#### 2. Use Strict Configurations
|
|
839
|
+
|
|
840
|
+
More restrictive configurations are faster:
|
|
841
|
+
|
|
842
|
+
```ruby
|
|
843
|
+
# Faster - small allowlist
|
|
844
|
+
config.allowed_tags = ['p', 'strong', 'em']
|
|
845
|
+
|
|
846
|
+
# Slower - large allowlist or nil (uses defaults)
|
|
847
|
+
config.allowed_tags = nil
|
|
848
|
+
```
|
|
849
|
+
|
|
850
|
+
#### 3. Batch Processing
|
|
851
|
+
|
|
852
|
+
Process multiple documents with the same instance:
|
|
853
|
+
|
|
854
|
+
```ruby
|
|
855
|
+
dandruff = Dandruff.new do |config|
|
|
856
|
+
# ... configuration
|
|
857
|
+
end
|
|
858
|
+
|
|
859
|
+
cleaned_docs = documents.map { |doc| dandruff.scrub(doc) }
|
|
860
|
+
```
|
|
861
|
+
|
|
862
|
+
#### 4. Adjust Multi-Pass Limit
|
|
863
|
+
|
|
864
|
+
For trusted content, you can reduce passes:
|
|
865
|
+
|
|
866
|
+
```ruby
|
|
867
|
+
# Faster but less secure - use only for pre-validated content
|
|
868
|
+
config.scrub_until_stable = false
|
|
869
|
+
|
|
870
|
+
# Or reduce max passes
|
|
871
|
+
config.mutation_max_passes = 1 # default is 2
|
|
872
|
+
```
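Why more than one pass can matter: removing one piece of markup may leave behind text that reassembles into something dangerous. The sketch below illustrates the fixed-point idea with a deliberately naive string filter. It is a toy, not Dandruff's implementation, and `scrub_until_stable` is a hypothetical helper that borrows the option's name for readability.

```ruby
# Apply `pass` repeatedly until the output stops changing or the pass budget runs out.
def scrub_until_stable(input, max_passes:, &pass)
  current = input
  max_passes.times do
    result = pass.call(current)
    return result if result == current # stable: another pass would change nothing
    current = result
  end
  current
end

# Naive single pass: delete literal "<script>" tags (toy filter, not a real sanitizer).
naive_pass = ->(html) { html.gsub(/<script>/i, '') }

# Nested payload: deleting the inner tag reassembles an outer one.
payload = '<scr<script>ipt>x'

one_pass    = scrub_until_stable(payload, max_passes: 1, &naive_pass)
many_passes = scrub_until_stable(payload, max_passes: 5, &naive_pass)

puts "1 pass:  #{one_pass.inspect}"
puts "5 max:   #{many_passes.inspect}"
```

With a single pass the toy filter leaves a reassembled `<script>` tag in the output; allowing further passes removes it. That is the trade-off behind lowering the pass limit.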
|
|
873
|
+
|
|
874
|
+
#### 5. Return DOM for Further Processing
|
|
875
|
+
|
|
876
|
+
If you plan to post-process the sanitized output with Nokogiri, return the document instead of a string:
|
|
877
|
+
|
|
878
|
+
```ruby
|
|
879
|
+
config.return_dom = true
|
|
880
|
+
doc = dandruff.scrub(html) # Returns Nokogiri document
|
|
881
|
+
# ... further processing with Nokogiri
|
|
882
|
+
```
|
|
883
|
+
|
|
884
|
+
## 🔄 Migration Guides
|
|
885
|
+
|
|
886
|
+
### From Rails Sanitizer
|
|
887
|
+
|
|
888
|
+
```ruby
|
|
889
|
+
# Before (Rails)
|
|
890
|
+
ActionController::Base.helpers.sanitize(html, tags: ['p', 'strong'])
|
|
891
|
+
|
|
892
|
+
# After (Dandruff)
|
|
893
|
+
Dandruff.scrub(html, allowed_tags: ['p', 'strong'])
|
|
894
|
+
|
|
895
|
+
# Or create reusable instance
|
|
896
|
+
@dandruff = Dandruff.new do |config|
|
|
897
|
+
config.allowed_tags = ['p', 'strong', 'em', 'a']
|
|
898
|
+
config.allowed_attributes = ['href']
|
|
899
|
+
end
|
|
900
|
+
|
|
901
|
+
@dandruff.scrub(html)
|
|
902
|
+
```
|
|
903
|
+
|
|
904
|
+
### From Loofah
|
|
905
|
+
|
|
906
|
+
```ruby
|
|
907
|
+
# Before (Loofah)
|
|
908
|
+
Loofah.fragment(html).scrub!(:prune).to_s
|
|
909
|
+
|
|
910
|
+
# After (Dandruff)
|
|
911
|
+
Dandruff.scrub(html, keep_content: false)
|
|
912
|
+
|
|
913
|
+
# Strip unsafe tags but keep their contents
|
|
914
|
+
Loofah.fragment(html).scrub!(:strip).to_s
|
|
915
|
+
|
|
916
|
+
# Dandruff equivalent
|
|
917
|
+
Dandruff.scrub(html, allowed_tags: ['p', 'strong'])
|
|
918
|
+
```
|
|
919
|
+
|
|
920
|
+
### From Sanitize Gem
|
|
921
|
+
|
|
922
|
+
```ruby
|
|
923
|
+
# Before (Sanitize)
|
|
924
|
+
Sanitize.fragment(html, elements: ['p', 'strong'])
|
|
925
|
+
|
|
926
|
+
# After (Dandruff)
|
|
927
|
+
Dandruff.scrub(html, allowed_tags: ['p', 'strong'])
|
|
928
|
+
|
|
929
|
+
# Custom config
|
|
930
|
+
Sanitize.fragment(html, Sanitize::Config::RELAXED)
|
|
931
|
+
|
|
932
|
+
# Dandruff profiles
|
|
933
|
+
Dandruff.scrub(html) do |config|
|
|
934
|
+
config.use_profiles = { html: true }
|
|
935
|
+
end
|
|
936
|
+
```
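When migrating an existing Sanitize config, it can help to translate the option names mechanically and list whatever has no obvious counterpart. The sketch below maps a small subset of a Sanitize-style hash onto the option names used in this README (`allowed_tags`, `allowed_attributes_per_tag`). The helper `sanitize_to_dandruff_options` is hypothetical and belongs to neither gem, and the mapping is an assumption, so review it against your own rules before relying on it.

```ruby
# Hypothetical helper: translate a subset of a Sanitize-style config hash
# into Dandruff option names. Returns [options, unmapped_keys].
def sanitize_to_dandruff_options(sanitize_config)
  options = {}
  unmapped = []

  sanitize_config.each do |key, value|
    case key
    when :elements
      options[:allowed_tags] = value.dup
    when :attributes
      # Keep per-element entries (String keys); Sanitize's :all key is left unmapped.
      per_tag = value.select { |tag, _| tag.is_a?(String) }
      options[:allowed_attributes_per_tag] = per_tag unless per_tag.empty?
      unmapped << :'attributes[:all]' if value.key?(:all)
    else
      unmapped << key # e.g. :protocols needs a hand-written URI rule
    end
  end

  [options, unmapped]
end

options, unmapped = sanitize_to_dandruff_options({
  elements: %w[p strong a],
  attributes: { 'a' => %w[href title], :all => %w[class] },
  protocols: { 'a' => { 'href' => %w[http https] } }
})

puts "options:  #{options.inspect}"
puts "unmapped: #{unmapped.inspect}"
```

Anything reported as unmapped, such as protocol rules, would need to be expressed by hand, for example through `allowed_uri_regexp`.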
|
|
937
|
+
|
|
938
|
+
## 🆚 Comparison
|
|
939
|
+
|
|
940
|
+
See [COMPARISON.md](COMPARISON.md) for a detailed comparison with other Ruby HTML sanitization libraries:
|
|
941
|
+
|
|
942
|
+
- Rails' built-in sanitizer
|
|
943
|
+
- Loofah
|
|
944
|
+
- Sanitize gem
|
|
945
|
+
|
|
946
|
+
**Key differentiators:**
|
|
947
|
+
- Based on DOMPurify's proven security model
|
|
948
|
+
- Protection against mXSS attacks
|
|
949
|
+
- DOM clobbering prevention
|
|
950
|
+
- Per-tag attribute control
|
|
951
|
+
- Hook system for extensibility
|
|
952
|
+
- HTML email support with per-tag restrictions
|
|
953
|
+
|
|
954
|
+
## ❓ FAQ
|
|
955
|
+
|
|
956
|
+
### How is Dandruff different from other sanitizers?
|
|
957
|
+
|
|
958
|
+
Dandruff brings DOMPurify's battle-tested security model to Ruby, with specific defenses against mXSS, DOM clobbering, and protocol injection that other Ruby sanitizers may not provide. It also offers per-tag attribute control and an extensible hook system.
|
|
959
|
+
|
|
960
|
+
### Is Dandruff safe for user-generated content?
|
|
961
|
+
|
|
962
|
+
Yes! Dandruff is specifically designed for sanitizing untrusted user input. Use restrictive configurations for maximum security (see [Recommended Configurations](#recommended-configurations)).
|
|
963
|
+
|
|
964
|
+
### Can I use Dandruff with Rails?
|
|
965
|
+
|
|
966
|
+
Absolutely! Dandruff works great with Rails:
|
|
967
|
+
|
|
968
|
+
```ruby
|
|
969
|
+
# In your helper
|
|
970
|
+
def sanitize_user_content(html)
|
|
971
|
+
@dandruff ||= Dandruff.new do |config|
|
|
972
|
+
config.allowed_tags = ['p', 'strong', 'em', 'a']
|
|
973
|
+
config.allowed_attributes = ['href']
|
|
974
|
+
end
|
|
975
|
+
@dandruff.scrub(html)
|
|
976
|
+
end
|
|
977
|
+
```
|
|
978
|
+
|
|
979
|
+
### Does Dandruff work with HTML emails?
|
|
980
|
+
|
|
981
|
+
Yes! Use the `html_email` profile:
|
|
982
|
+
|
|
983
|
+
```ruby
|
|
984
|
+
dandruff = Dandruff.new do |config|
|
|
985
|
+
config.use_profiles = { html_email: true }
|
|
986
|
+
end
|
|
987
|
+
```
|
|
988
|
+
|
|
989
|
+
This includes legacy attributes and per-tag restrictions needed for email clients.
|
|
990
|
+
|
|
991
|
+
### What about performance?
|
|
992
|
+
|
|
993
|
+
In the benchmarks above (Apple M1 Max), Dandruff processes ~300 KB/s with the default config and ~3,000 KB/s with a strict config. Reuse configured instances for best performance. See the [Performance](#performance) section for details.
|
|
994
|
+
|
|
995
|
+
### How do I allow custom elements?
|
|
996
|
+
|
|
997
|
+
```ruby
|
|
998
|
+
dandruff = Dandruff.new do |config|
|
|
999
|
+
config.additional_tags = ['my-custom-element', 'web-component']
|
|
1000
|
+
end
|
|
1001
|
+
```
|
|
1002
|
+
|
|
1003
|
+
Elements with hyphens are treated as custom elements by default.
|
|
1004
|
+
|
|
1005
|
+
### Can I allow inline styles?
|
|
1006
|
+
|
|
1007
|
+
Yes, but they're sanitized for safety:
|
|
1008
|
+
|
|
1009
|
+
```ruby
|
|
1010
|
+
dandruff = Dandruff.new do |config|
|
|
1011
|
+
config.allowed_attributes = ['style'] # style is allowed by default
|
|
1012
|
+
end
|
|
1013
|
+
|
|
1014
|
+
# Safe styles pass through
|
|
1015
|
+
dandruff.scrub('<div style="color: red;">Text</div>')
|
|
1016
|
+
# => '<div style="color:red;">Text</div>'
|
|
1017
|
+
|
|
1018
|
+
# Dangerous styles are removed
|
|
1019
|
+
dandruff.scrub('<div style="expression(alert(1))">Text</div>')
|
|
1020
|
+
# => '<div>Text</div>'
|
|
1021
|
+
```
|
|
1022
|
+
|
|
1023
|
+
### How do I debug what's being removed?
|
|
1024
|
+
|
|
1025
|
+
```ruby
|
|
1026
|
+
dandruff = Dandruff.new
|
|
1027
|
+
dandruff.scrub(html)
|
|
1028
|
+
|
|
1029
|
+
# Check what was removed
|
|
1030
|
+
removed = dandruff.removed
|
|
1031
|
+
removed.each do |item|
|
|
1032
|
+
if item[:element]
|
|
1033
|
+
puts "Removed element: #{item[:element].name}"
|
|
1034
|
+
elsif item[:attribute]
|
|
1035
|
+
puts "Removed attribute: #{item[:attribute].name} from #{item[:from].name}"
|
|
1036
|
+
end
|
|
1037
|
+
end
|
|
1038
|
+
```
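For larger inputs, a tally is often easier to read than one line per removal. The sketch below counts removals by kind, assuming entries shaped like the ones iterated over above (`:element`, or `:attribute` plus `:from`). The `node` struct is a stand-in for the parsed nodes so the snippet runs without the gem, and the sample data is made up.

```ruby
# Stand-in for a parsed node; only #name is needed here.
node = Struct.new(:name)

# Made-up sample data in the assumed shape of the removal entries.
removed = [
  { element: node.new('script') },
  { attribute: node.new('onclick'), from: node.new('a') },
  { attribute: node.new('onclick'), from: node.new('div') },
  { element: node.new('iframe') }
]

# Count removals per element name and per attribute name.
summary = removed.each_with_object(Hash.new(0)) do |item, counts|
  if item[:element]
    counts["element:#{item[:element].name}"] += 1
  elsif item[:attribute]
    counts["attribute:#{item[:attribute].name}"] += 1
  end
end

# Most frequent removals first.
summary.sort_by { |_, count| -count }.each do |label, count|
  puts "#{label} removed #{count}x"
end
```

A summary like this makes it easier to spot an allowlist that is too narrow, for instance one attribute being stripped from many elements.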
|
|
1039
|
+
|
|
1040
|
+
## 🛠️ Troubleshooting
|
|
1041
|
+
|
|
1042
|
+
### Content is being removed unexpectedly
|
|
1043
|
+
|
|
1044
|
+
**Check your configuration:**
|
|
1045
|
+
|
|
1046
|
+
```ruby
|
|
1047
|
+
# Enable keep_content to preserve text
|
|
1048
|
+
config.keep_content = true
|
|
1049
|
+
|
|
1050
|
+
# Check if tags are in your allowlist
|
|
1051
|
+
puts dandruff.config.allowed_tags
|
|
1052
|
+
|
|
1053
|
+
# Use additional_tags instead of allowed_tags to extend defaults
|
|
1054
|
+
config.additional_tags = ['custom-tag'] # instead of replacing all
|
|
1055
|
+
```
|
|
1056
|
+
|
|
1057
|
+
### Attributes are being stripped
|
|
1058
|
+
|
|
1059
|
+
**Verify attribute configuration:**
|
|
1060
|
+
|
|
1061
|
+
```ruby
|
|
1062
|
+
# Check which attributes are allowed
|
|
1063
|
+
puts dandruff.config.allowed_attributes
|
|
1064
|
+
|
|
1065
|
+
# Use additional_attributes to extend
|
|
1066
|
+
config.additional_attributes = ['data-custom']
|
|
1067
|
+
|
|
1068
|
+
# Or use per-tag control
|
|
1069
|
+
config.allowed_attributes_per_tag = {
|
|
1070
|
+
'div' => ['class', 'id', 'data-custom']
|
|
1071
|
+
}
|
|
1072
|
+
```
|
|
1073
|
+
|
|
1074
|
+
### Style tags are removed
|
|
1075
|
+
|
|
1076
|
+
**Enable style tags:**
|
|
1077
|
+
|
|
1078
|
+
```ruby
|
|
1079
|
+
config.allow_style_tags = true
|
|
1080
|
+
|
|
1081
|
+
# For whole documents (like emails)
|
|
1082
|
+
config.whole_document = true
|
|
1083
|
+
```
|
|
1084
|
+
|
|
1085
|
+
### URI validation is too strict
|
|
1086
|
+
|
|
1087
|
+
**Customize URI validation:**
|
|
1088
|
+
|
|
1089
|
+
```ruby
|
|
1090
|
+
# Allow more protocols
|
|
1091
|
+
config.allowed_uri_regexp = /\A(?:https?|ftp|mailto):/
|
|
1092
|
+
|
|
1093
|
+
# Or allow unknown protocols (⚠️ less secure)
|
|
1094
|
+
config.allow_unknown_protocols = true
|
|
1095
|
+
```
|
|
1096
|
+
|
|
1097
|
+
### Performance is slow
|
|
1098
|
+
|
|
1099
|
+
**Optimize configuration:**
|
|
1100
|
+
|
|
1101
|
+
```ruby
|
|
1102
|
+
# Use specific allowlists
|
|
1103
|
+
config.allowed_tags = ['p', 'strong', 'em'] # faster than nil/defaults
|
|
1104
|
+
|
|
1105
|
+
# Reduce multi-pass iterations for trusted content
|
|
1106
|
+
config.mutation_max_passes = 1 # default is 2
|
|
1107
|
+
|
|
1108
|
+
# Disable multi-pass for pre-validated content
|
|
1109
|
+
config.scrub_until_stable = false # use with caution
|
|
1110
|
+
```
|
|
1111
|
+
|
|
1112
|
+
## 🤝 Contributing
|
|
1113
|
+
|
|
1114
|
+
We welcome contributions! Here's how to get involved:
|
|
1115
|
+
|
|
1116
|
+
### Development Setup
|
|
1117
|
+
|
|
1118
|
+
```bash
|
|
1119
|
+
# Clone the repository
|
|
1120
|
+
git clone https://github.com/kuyio/dandruff.git
|
|
1121
|
+
cd dandruff
|
|
1122
|
+
|
|
1123
|
+
# Install dependencies
|
|
1124
|
+
bundle install
|
|
1125
|
+
|
|
1126
|
+
# Run tests
|
|
1127
|
+
make test
|
|
1128
|
+
|
|
1129
|
+
# Run linter
|
|
1130
|
+
make lint
|
|
1131
|
+
|
|
1132
|
+
# Open console
|
|
1133
|
+
bin/console
|
|
1134
|
+
```
|
|
1135
|
+
|
|
1136
|
+
### Running Tests
|
|
1137
|
+
|
|
1138
|
+
```bash
|
|
1139
|
+
# All tests
|
|
1140
|
+
rake spec
|
|
1141
|
+
|
|
1142
|
+
# With coverage
|
|
1143
|
+
COVERAGE=true rake spec
|
|
1144
|
+
|
|
1145
|
+
# Specific test file
|
|
1146
|
+
rspec spec/basic_sanitization_spec.rb
|
|
1147
|
+
|
|
1148
|
+
# Performance tests
|
|
1149
|
+
ruby spec/dandruff_performance_spec.rb
|
|
1150
|
+
```
|
|
1151
|
+
|
|
1152
|
+
### Contribution Guidelines
|
|
1153
|
+
|
|
1154
|
+
1. **Fork** the repository
|
|
1155
|
+
2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)
|
|
1156
|
+
3. **Write** tests for your changes
|
|
1157
|
+
4. **Ensure** all tests pass (`rake spec`)
|
|
1158
|
+
5. **Update** documentation as needed
|
|
1159
|
+
6. **Commit** your changes (`git commit -am 'Add amazing feature'`)
|
|
1160
|
+
7. **Push** to the branch (`git push origin feature/amazing-feature`)
|
|
1161
|
+
8. **Open** a Pull Request
|
|
1162
|
+
|
|
1163
|
+
### Development Guidelines
|
|
1164
|
+
|
|
1165
|
+
- **Security First**: All changes must maintain or improve security
|
|
1166
|
+
- **Backward Compatibility**: Avoid breaking changes when possible
|
|
1167
|
+
- **Comprehensive Tests**: New features need full test coverage (aim for 100%)
|
|
1168
|
+
- **Documentation**: Update README and inline YARD docs for API changes
|
|
1169
|
+
- **Performance**: Consider performance impact of changes
|
|
1170
|
+
- **Code Quality**: Follow Ruby best practices and existing code style
|
|
1171
|
+
|
|
1172
|
+
### Reporting Issues
|
|
1173
|
+
|
|
1174
|
+
Found a bug or have a feature request?
|
|
1175
|
+
|
|
1176
|
+
1. **Search** existing issues to avoid duplicates
|
|
1177
|
+
2. **Include** relevant details:
|
|
1178
|
+
- Ruby version
|
|
1179
|
+
- Dandruff version
|
|
1180
|
+
- Minimal reproduction code
|
|
1181
|
+
- Expected vs. actual behavior
|
|
1182
|
+
3. **Security issues**: Email security@kuyio.com instead of filing public issues
|
|
1183
|
+
|
|
1184
|
+
## 📄 License
|
|
1185
|
+
|
|
1186
|
+
This gem is available as open source under the terms of the **MIT License**.
|
|
1187
|
+
|
|
1188
|
+
## 🙏 Acknowledgments
|
|
1189
|
+
|
|
1190
|
+
Originally inspired by the excellent [DOMPurify](https://github.com/cure53/DOMPurify) JavaScript library by Cure53 and contributors. Dandruff brings DOMPurify's battle-tested security model to the Ruby ecosystem with an idiomatic Ruby API.
|
|
1191
|
+
|
|
1192
|
+
Special thanks to all [contributors](https://github.com/kuyio/dandruff/graphs/contributors) who have helped make Dandruff better!
|
|
1193
|
+
|
|
1194
|
+
---
|
|
1195
|
+
|
|
1196
|
+
**Made with ❤️ in Ottawa, Canada 🇨🇦** • [GitHub](https://github.com/kuyio/dandruff) • [Documentation](https://rubydoc.info/gems/dandruff)
|