rails-html-sanitizer 1.4.4 → 1.6.0.rc1
- checksums.yaml +4 -4
- data/CHANGELOG.md +66 -0
- data/MIT-LICENSE +1 -1
- data/README.md +150 -30
- data/lib/rails/html/sanitizer/version.rb +4 -2
- data/lib/rails/html/sanitizer.rb +367 -104
- data/lib/rails/html/scrubbers.rb +74 -72
- data/lib/rails-html-sanitizer.rb +7 -23
- data/test/rails_api_test.rb +74 -0
- data/test/sanitizer_test.rb +901 -585
- data/test/scrubbers_test.rb +57 -30
- metadata +21 -65
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 369872075a1b555eb1dbcdf744e8d9f01aa4ba4c8f29449ba61668da5c4063ff
+  data.tar.gz: 1ae0e8e36e37c51687c965c33d55c1a1eaaab9d4e71d089378ee62fc340e0cd1
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f8c948ee3f76bb85018a3491d97f89b2957247f2cae35b650ee8d1682d482377e76e2150bbf8a81a9a1aaea4384af321c36c9a621c0c1a71a5dd079cb482a144
+  data.tar.gz: 070f318bcdfb024310b59fc8ceec848c937e0d7e5c4824c40cbb80a9b783e96d98b3f8f67a19630f6fe26aaee35769df84e24aefb198b58a0b06f825a18259a4
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,62 @@
+## 1.6.0.rc1 / 2023-05-24
+
+* Sanitizers that use an HTML5 parser are now available on platforms supported by
+  Nokogiri::HTML5. These are available as:
+
+  - `Rails::HTML5::FullSanitizer`
+  - `Rails::HTML5::LinkSanitizer`
+  - `Rails::HTML5::SafeListSanitizer`
+
+  And a new "vendor" is provided at `Rails::HTML5::Sanitizer` that can be used in a future version
+  of Rails.
+
+  Note that for symmetry `Rails::HTML4::Sanitizer` is also added, though its behavior is identical
+  to the vendor class methods on `Rails::HTML::Sanitizer`.
+
+  *Mike Dalessio*
+
+* Module namespaces have changed, but backwards compatibility is provided by aliases.
+
+  The library defines three additional modules:
+
+  - `Rails::HTML` for general functionality (replacing `Rails::Html`)
+  - `Rails::HTML4` containing sanitizers that parse content as HTML4
+  - `Rails::HTML5` containing sanitizers that parse content as HTML5
+
+  The following aliases are maintained for backwards compatibility:
+
+  - `Rails::Html` points to `Rails::HTML`
+  - `Rails::HTML::FullSanitizer` points to `Rails::HTML4::FullSanitizer`
+  - `Rails::HTML::LinkSanitizer` points to `Rails::HTML4::LinkSanitizer`
+  - `Rails::HTML::SafeListSanitizer` points to `Rails::HTML4::SafeListSanitizer`
+
+  *Mike Dalessio*
+
+* `LinkSanitizer` always returns UTF-8 encoded strings. `SafeListSanitizer` and `FullSanitizer`
+  already ensured this encoding.
+
+  *Mike Dalessio*
+
+* `SafeListSanitizer` allows `time` tag and `lang` attribute by default.
+
+  *Mike Dalessio*
+
+* The constant `Rails::Html::XPATHS_TO_REMOVE` has been removed. It's not necessary with the
+  existing sanitizers, and should have been a private constant all along anyway.
+
+  *Mike Dalessio*
+
+
+## 1.5.0 / 2023-01-20
+
+* `SafeListSanitizer`, `PermitScrubber`, and `TargetScrubber` now all support pruning of unsafe tags.
+
+  By default, unsafe tags are still stripped, but this behavior can be changed to prune the element
+  and its children from the document by passing `prune: true` to any of these classes' constructors.
+
+  *seyerian*
+
+
 ## 1.4.4 / 2022-12-13
 
 * Address inefficient regular expression complexity with certain configurations of Rails::Html::Sanitizer.
@@ -52,6 +111,7 @@
 
   *Mike Dalessio*
 
+
 ## 1.4.1 / 2021-08-18
 
 * Fix regression in v1.4.0 that did not pass comment nodes to the scrubber.
@@ -64,6 +124,7 @@
 
   *Mike Dalessio*
 
+
 ## 1.4.0 / 2021-08-18
 
 * Processing Instructions are no longer allowed by Rails::Html::PermitScrubber
@@ -76,12 +137,14 @@
 
   *Mike Dalessio*
 
+
 ## 1.3.0
 
 * Address deprecations in Loofah 2.3.0.
 
   *Josh Goodall*
 
+
 ## 1.2.0
 
 * Remove needless `white_list_sanitizer` deprecation.
@@ -96,6 +159,7 @@
 
   *Kasper Timm Hansen*
 
+
 ## 1.1.0
 
 * Add `safe_list_sanitizer` and deprecate `white_list_sanitizer` to be removed
@@ -113,10 +177,12 @@
 
   *Kasper Timm Hansen*
 
+
 ## 1.0.1
 
 * Added support for Rails 4.2.0.beta2 and above
 
+
 ## 1.0.0
 
 * First release.
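The backwards-compatible aliases described in the 1.6.0.rc1 changelog entry rely on ordinary Ruby constant assignment. The following is a minimal sketch, assuming a hypothetical `Demo` namespace with a placeholder `sanitize` method (none of this is the gem's actual source). It shows why code written against the old names should keep working: an alias is a second name for the same module or class object, not a copy.

```ruby
# Hypothetical stand-in for the gem's namespaces; everything under Demo is illustrative only.
module Demo
  module HTML4
    class FullSanitizer
      def sanitize(html)
        # Placeholder behavior for this sketch; a real sanitizer parses and scrubs HTML.
        html.to_s
      end
    end
  end

  module HTML
    # Alias: the old class name refers to the HTML4 implementation.
    FullSanitizer = HTML4::FullSanitizer
  end

  # Alias: the old module name refers to the new module.
  Html = HTML
end

legacy  = Demo::Html::FullSanitizer
current = Demo::HTML4::FullSanitizer

puts legacy.equal?(current) # true: both constants reference the same class object
puts legacy.name            # Demo::HTML4::FullSanitizer
```

Because the aliased constant is the same object, subclasses and `is_a?` checks written against either name behave identically in this sketch.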
data/MIT-LICENSE
CHANGED
data/README.md
CHANGED
@@ -1,61 +1,121 @@
-# Rails
+# Rails HTML Sanitizers
 
-
-applications, i.e. in the `sanitize`, `sanitize_css`, `strip_tags` and `strip_links` methods.
+This gem is responsible for sanitizing HTML fragments in Rails applications. Specifically, this is the set of sanitizers used to implement the Action View `SanitizerHelper` methods `sanitize`, `sanitize_css`, `strip_tags` and `strip_links`.
 
-Rails
+Rails HTML Sanitizer is only intended to be used with Rails applications. If you need similar functionality but aren't using Rails, consider using the underlying sanitization library [Loofah](https://github.com/flavorjones/loofah) directly.
 
-## Installation
 
-
+## Usage
 
-
+### A note on HTML entities
 
-
+__Rails HTML sanitizers are intended to be used by the view layer, at page-render time. They are *not* intended to sanitize persisted strings that will be sanitized *again* at page-render time.__
 
-
+Proper HTML sanitization will replace some characters with HTML entities. For example, text containing a `<` character will be updated to contain `&lt;` to ensure that the markup is well-formed.
 
-
+This is important to keep in mind because __HTML entities will render improperly if they are sanitized twice.__
 
-    $ gem install rails-html-sanitizer
 
-
+#### A concrete example showing the problem that can arise
+
+Imagine the user is asked to enter their employer's name, which will appear on their public profile page. Then imagine they enter `JPMorgan Chase & Co.`.
+
+If you sanitize this before persisting it in the database, the stored string will be `JPMorgan Chase &amp; Co.`
+
+When the page is rendered, if this string is sanitized a second time by the view layer, the HTML will contain `JPMorgan Chase &amp;amp; Co.` which will render as "JPMorgan Chase &amp; Co.".
+
+Another problem that can arise is rendering the sanitized string in a non-HTML context (for example, if it ends up being part of an SMS message). In this case, it may contain inappropriate HTML entities.
+
+
+#### Suggested alternatives
+
+You might simply choose to persist the untrusted string as-is (the raw input), and then ensure that the string will be properly sanitized by the view layer.
+
+That raw string, if rendered in an non-HTML context (like SMS), must also be sanitized by a method appropriate for that context. You may wish to look into using [Loofah](https://github.com/flavorjones/loofah) or [Sanitize](https://github.com/rgrove/sanitize) to customize how this sanitization works, including omitting HTML entities in the final string.
+
+If you really want to sanitize the string that's stored in your database, you may wish to look into [Loofah::ActiveRecord](https://github.com/flavorjones/loofah-activerecord) rather than use the Rails HTML sanitizers.
+
+
+### A note on module names
+
+In versions < 1.6, the only module defined by this library was `Rails::Html`. Starting in 1.6, we define three additional modules:
+
+- `Rails::HTML` for general functionality (replacing `Rails::Html`)
+- `Rails::HTML4` containing sanitizers that parse content as HTML4
+- `Rails::HTML5` containing sanitizers that parse content as HTML5 (if supported)
+
+The following aliases are maintained for backwards compatibility:
+
+- `Rails::Html` points to `Rails::HTML`
+- `Rails::HTML::FullSanitizer` points to `Rails::HTML4::FullSanitizer`
+- `Rails::HTML::LinkSanitizer` points to `Rails::HTML4::LinkSanitizer`
+- `Rails::HTML::SafeListSanitizer` points to `Rails::HTML4::SafeListSanitizer`
+
 
 ### Sanitizers
 
-All sanitizers respond to `sanitize
+All sanitizers respond to `sanitize`, and are available in variants that use either HTML4 or HTML5 parsing, under the `Rails::HTML4` and `Rails::HTML5` namespaces, respectively.
+
+NOTE: The HTML5 sanitizers are not supported on JRuby. Users may programmatically check for support by calling `Rails::HTML::Sanitizer.html5_support?`.
+
 
 #### FullSanitizer
 
 ```ruby
-full_sanitizer = Rails::
+full_sanitizer = Rails::HTML5::FullSanitizer.new
 full_sanitizer.sanitize("<b>Bold</b> no more! <a href='more.html'>See more here</a>...")
 # => Bold no more! See more here...
 ```
 
+or, if you insist on parsing the content as HTML4:
+
+```ruby
+full_sanitizer = Rails::HTML4::FullSanitizer.new
+full_sanitizer.sanitize("<b>Bold</b> no more! <a href='more.html'>See more here</a>...")
+# => Bold no more! See more here...
+```
+
+HTML5 version:
+
+
+
 #### LinkSanitizer
 
 ```ruby
-link_sanitizer = Rails::
+link_sanitizer = Rails::HTML5::LinkSanitizer.new
 link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
 # => Only the link text will be kept.
 ```
 
+or, if you insist on parsing the content as HTML4:
+
+```ruby
+link_sanitizer = Rails::HTML4::LinkSanitizer.new
+link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
+# => Only the link text will be kept.
+```
+
+
 #### SafeListSanitizer
 
+This sanitizer is also available as an HTML4 variant, but for simplicity we'll document only the HTML5 variant below.
+
 ```ruby
-safe_list_sanitizer = Rails::
+safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new
 
 # sanitize via an extensive safe list of allowed elements
 safe_list_sanitizer.sanitize(@article.body)
 
-#
+# sanitize only the supplied tags and attributes
 safe_list_sanitizer.sanitize(@article.body, tags: %w(table tr td), attributes: %w(id class style))
 
-#
+# sanitize via a custom scrubber
 safe_list_sanitizer.sanitize(@article.body, scrubber: ArticleScrubber.new)
 
-#
+# prune nodes from the tree instead of stripping tags and leaving inner content
+safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new(prune: true)
+
+# the sanitizer can also sanitize css
 safe_list_sanitizer.sanitize_css('background-color: #000;')
 ```
 
@@ -63,14 +123,14 @@ safe_list_sanitizer.sanitize_css('background-color: #000;')
 
 Scrubbers are objects responsible for removing nodes or attributes you don't want in your HTML document.
 
-This gem includes two scrubbers `Rails::
+This gem includes two scrubbers `Rails::HTML::PermitScrubber` and `Rails::HTML::TargetScrubber`.
 
-#### `Rails::
+#### `Rails::HTML::PermitScrubber`
 
 This scrubber allows you to permit only the tags and attributes you want.
 
 ```ruby
-scrubber = Rails::
+scrubber = Rails::HTML::PermitScrubber.new
 scrubber.tags = ['a']
 
 html_fragment = Loofah.fragment('<a><img/ ></a>')
@@ -78,16 +138,34 @@ html_fragment.scrub!(scrubber)
 html_fragment.to_s # => "<a></a>"
 ```
 
-
+By default, inner content is left, but it can be removed as well.
+
+```ruby
+scrubber = Rails::HTML::PermitScrubber.new
+scrubber.tags = ['a']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a>text</a>"
+
+scrubber = Rails::HTML::PermitScrubber.new(prune: true)
+scrubber.tags = ['a']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a></a>"
+```
+
+#### `Rails::HTML::TargetScrubber`
 
 Where `PermitScrubber` picks out tags and attributes to permit in sanitization,
-`Rails::
+`Rails::HTML::TargetScrubber` targets them for removal. See https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb for the tag list.
 
 **Note:** by default, it will scrub anything that is not part of the permitted tags from
 loofah `HTML5::Scrub.allowed_element?`.
 
 ```ruby
-scrubber = Rails::
+scrubber = Rails::HTML::TargetScrubber.new
 scrubber.tags = ['img']
 
 html_fragment = Loofah.fragment('<a><img/ ></a>')
@@ -95,12 +173,30 @@ html_fragment.scrub!(scrubber)
 html_fragment.to_s # => "<a></a>"
 ```
 
+Similarly to `PermitScrubber`, nodes can be fully pruned.
+
+```ruby
+scrubber = Rails::HTML::TargetScrubber.new
+scrubber.tags = ['span']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a>text</a>"
+
+scrubber = Rails::HTML::TargetScrubber.new(prune: true)
+scrubber.tags = ['span']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a></a>"
+```
+
 #### Custom Scrubbers
 
 You can also create custom scrubbers in your application if you want to.
 
 ```ruby
-class CommentScrubber < Rails::
+class CommentScrubber < Rails::HTML::PermitScrubber
   def initialize
     super
     self.tags = %w( form script comment blockquote )
@@ -113,7 +209,7 @@ class CommentScrubber < Rails::Html::PermitScrubber
 end
 ```
 
-See `Rails::
+See `Rails::HTML::PermitScrubber` documentation to learn more about which methods can be overridden.
 
 #### Custom Scrubber in a Rails app
 
@@ -123,20 +219,44 @@ Using the `CommentScrubber` from above, you can use this in a Rails view like so
 <%= sanitize @comment, scrubber: CommentScrubber.new %>
 ```
 
+## Installation
+
+Add this line to your application's Gemfile:
+
+    gem 'rails-html-sanitizer'
+
+And then execute:
+
+    $ bundle
+
+Or install it yourself as:
+
+    $ gem install rails-html-sanitizer
+
+
 ## Read more
 
 Loofah is what underlies the sanitizers and scrubbers of rails-html-sanitizer.
+
 - [Loofah and Loofah Scrubbers](https://github.com/flavorjones/loofah)
 
 The `node` argument passed to some methods in a custom scrubber is an instance of `Nokogiri::XML::Node`.
+
 - [`Nokogiri::XML::Node`](https://nokogiri.org/rdoc/Nokogiri/XML/Node.html)
 - [Nokogiri](http://nokogiri.org)
 
-## Contributing to Rails Html Sanitizers
 
-
+## Contributing to Rails HTML Sanitizers
+
+Rails HTML Sanitizers is work of many contributors. You're encouraged to submit pull requests, propose features and discuss issues.
 
 See [CONTRIBUTING](CONTRIBUTING.md).
 
+### Security reports
+
+Trying to report a possible security vulnerability in this project? Please check out the [Rails project's security policy](https://rubyonrails.org/security) for instructions.
+
+
 ## License
-
+
+Rails HTML Sanitizers is released under the [MIT License](MIT-LICENSE).
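The double-sanitization problem described in the README's note on HTML entities can be reproduced without the gem. The sketch below uses `ERB::Util.html_escape` from Ruby's standard library as a stand-in for the entity-encoding step of a sanitizer. This is an assumption made for illustration only, since the Rails sanitizers do considerably more than escape characters, and the variable names are hypothetical.

```ruby
require "erb"

# Raw, untrusted input exactly as the user typed it.
employer = "JPMorgan Chase & Co."

# Encoding once, at render time, produces well-formed markup.
escaped_once = ERB::Util.html_escape(employer)

# Encoding a string that was already encoded (for example, before being
# persisted) encodes the ampersand of the entity a second time.
escaped_twice = ERB::Util.html_escape(escaped_once)

puts escaped_once  # JPMorgan Chase &amp; Co.
puts escaped_twice # JPMorgan Chase &amp;amp; Co.
```

A browser would display the second string as the literal text "JPMorgan Chase &amp;amp; Co." with one level of encoding removed, which is why the README suggests persisting the raw input and sanitizing only in the view layer.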