html-pipeline 3.0.0.pre1 → 3.0.0.pre2
- checksums.yaml +4 -4
- data/CHANGELOG.md +22 -0
- data/Gemfile +3 -1
- data/README.md +19 -10
- data/UPGRADING.md +0 -1
- data/lib/html_pipeline/node_filter/syntax_highlight_filter.rb +62 -0
- data/lib/html_pipeline/node_filter/table_of_contents_filter.rb +4 -2
- data/lib/html_pipeline/sanitization_filter.rb +135 -12
- data/lib/html_pipeline/version.rb +1 -1
- data/lib/html_pipeline.rb +15 -6
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 786f70829bb579116d522e399faeb85c5e47354255015a7ad4f302d062151c18
+  data.tar.gz: d8dfaeb90583a9644c9a1d31447122953ca34f6225e9720fb8d83cc7c7848faa
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0b04a86725107c550b252af0dee5b6647f7cce1db0ccd3349eae1b867e1b10910a256006457e8be4dab66e8e01dd7c891e7736b0c4e5294751fab45b0ed58a37
+  data.tar.gz: 468721f150845e39d69942259f02e0ff79e35730678c080ac9470ed92ae24e0b917bb14ba5bda82565e052f47057c0d1d3703069faaa9a0e319c939e27260751
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,27 @@
 # Changelog
 
+## [v3.0.0.pre2](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre2) (2023-01-26)
+
+[Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre1...v3.0.0.pre2)
+
+**Closed issues:**
+
+- Indicate a version for activesupport that has support/receives security patches \(\>= 6?\) [\#367](https://github.com/gjtorikian/html-pipeline/issues/367)
+- Add MathML elements to whitelist [\#336](https://github.com/gjtorikian/html-pipeline/issues/336)
+- Feature request: add safe semantic HTML tags to default whitelist [\#312](https://github.com/gjtorikian/html-pipeline/issues/312)
+- Allow float: left|right and clear: left|right|both in sanitation [\#302](https://github.com/gjtorikian/html-pipeline/issues/302)
+- Open link in new tab option [\#266](https://github.com/gjtorikian/html-pipeline/issues/266)
+- Consider allowing LINK and META elements in HTML [\#261](https://github.com/gjtorikian/html-pipeline/issues/261)
+- Allow SVG elements in whitelist [\#251](https://github.com/gjtorikian/html-pipeline/issues/251)
+- Allow RDFa 1.1 \(Lite\) attributes [\#249](https://github.com/gjtorikian/html-pipeline/issues/249)
+- What’s the point of allowing the accept-charset attribute in the sanitization filter ? [\#218](https://github.com/gjtorikian/html-pipeline/issues/218)
+- Use gemojione instead of gemoji [\#200](https://github.com/gjtorikian/html-pipeline/issues/200)
+- Link schema not used with \<a\> markup : print them directly [\#194](https://github.com/gjtorikian/html-pipeline/issues/194)
+
+**Merged pull requests:**
+
+- Use emoji from commonmarker [\#373](https://github.com/gjtorikian/html-pipeline/pull/373) ([gjtorikian](https://github.com/gjtorikian))
+
 ## [v3.0.0.pre1](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre1) (2022-12-30)
 
 [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v2.14.3...v3.0.0.pre1)
data/Gemfile
CHANGED
@@ -27,9 +27,10 @@ group :development do
 end
 
 group :test do
-  gem "commonmarker", "~> 1.0.0.
+  gem "commonmarker", "~> 1.0.0.pre7", require: false
   gem "gemoji", "~> 3.0", require: false
   gem "gemojione", "~> 4.3", require: false
+
   gem "minitest"
 
   gem "minitest-bisect", "~> 1.6"
@@ -37,4 +38,5 @@ group :test do
   gem "nokogiri", "~> 1.13"
 
   gem "minitest-focus", "~> 1.1"
+  gem "rouge", "~> 3.1", require: false
 end
data/README.md
CHANGED
@@ -230,19 +230,28 @@ end
 
 For more information on how to write effective `NodeFilter`s, refer to the provided filters, and see the underlying lib, [Selma](https://www.github.com/gjtorikian/selma) for more information.
 
-- `AbsoluteSourceFilter
-- `EmojiFilter
--
-- `
-- `
-- `
-- `
-- `
+- `AbsoluteSourceFilter`: replace relative image urls with fully qualified versions
+- `EmojiFilter`: converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)
+  - (Note: the included `MarkdownFilter` will already convert emoji)
+- `HttpsFilter`: Replacing http urls with https versions
+- `ImageMaxWidthFilter`: link to full size image for large images
+- `MentionFilter`: replace `@user` mentions with links
+- `SanitizationFilter`: allow sanitize user markup
+- `SyntaxHighlightFilter`: applies syntax highlighting to `pre` blocks
+  - (Note: the included `MarkdownFilter` will already apply highlighting)
+- `TableOfContentsFilter`: anchor headings with name attributes and generate Table of Contents html unordered list linking headings
+- `TeamMentionFilter`: replace `@org/team` mentions with links
 
 ## Dependencies
 
-Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem
-
+Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem dependencies yourself.
+
+For example, `SyntaxHighlightFilter` uses [rouge](https://github.com/jneen/rouge)
+to detect and highlight languages; to use the `SyntaxHighlightFilter`, you must add the following to your Gemfile:
+
+```ruby
+gem "rouge"
+```
 
 > **Note**
 > See the [Gemfile](/Gemfile) `:test` group for any version requirements.
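The README's dependency note pairs with the `HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")` call at the top of the new filter file. The body of that helper is not part of this diff, so the following is only a minimal sketch of how a guarded require of this kind might behave. The names `MissingDependencyError` and `require_filter_dependency` are hypothetical and chosen for illustration.

```ruby
# Hypothetical error class; the gem's actual error type is not shown in this diff.
class MissingDependencyError < StandardError; end

# Sketch of a guarded require: load `name`, or fail with a message that
# says which filter needs which gem. Not the gem's real implementation.
def require_filter_dependency(name, requirer)
  require name
rescue LoadError => e
  raise MissingDependencyError,
    "Missing dependency '#{name}' for #{requirer}: #{e.message}"
end

# A library that is available loads without raising.
require_filter_dependency("set", "ExampleFilter")

# A gem that is not installed produces the descriptive error instead
# of a bare LoadError.
begin
  require_filter_dependency("surely_not_an_installed_gem_xyz", "ExampleFilter")
rescue MissingDependencyError => e
  puts e.message
end
```

Failing at require time with the filter's name in the message makes it clear which Gemfile entry is missing, which matters when dependencies are deliberately left unbundled.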
data/UPGRADING.md
CHANGED
@@ -13,7 +13,6 @@ This project is now under a module called `HTMLPipeline`, not `HTML::Pipeline`.
 The following filters were removed:
 
 - `AutolinkFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
-- `SyntaxHighlightFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
 - `SanitizationFilter`: this is handled by [Selma](https://www.github.com/gjtorikian/selma); configuration can be done through the `sanitization_config` hash
 
 - `EmailReplyFilter`
data/lib/html_pipeline/node_filter/syntax_highlight_filter.rb
ADDED
@@ -0,0 +1,62 @@
+# frozen_string_literal: true
+
+HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")
+
+class HTMLPipeline
+  class NodeFilter
+    # HTML Filter that syntax highlights text inside code blocks.
+    #
+    # Context options:
+    #
+    # :highlight => String represents the language to pick lexer. Defaults to empty string.
+    # :scope => String represents the class attribute adds to pre element after.
+    # Defaults to "highlight highlight-css" if highlights a css code block.
+    #
+    # This filter does not write any additional information to the context hash.
+    class SyntaxHighlightFilter < NodeFilter
+      def initialize(context: {}, result: {})
+        super(context: context, result: result)
+        # TODO: test the optionality of this
+        @formatter = context[:formatter] || Rouge::Formatters::HTML.new
+      end
+
+      SELECTOR = Selma::Selector.new(match_element: "pre", match_text_within: "pre")
+
+      def selector
+        SELECTOR
+      end
+
+      def handle_element(element)
+        default = context[:highlight]&.to_s
+        @lang = element["lang"] || default
+
+        scope = context.fetch(:scope, "highlight")
+
+        element["class"] = "#{scope} #{scope}-#{@lang}" if include_lang?
+      end
+
+      def handle_text_chunk(text)
+        return if @lang.nil?
+        return if (lexer = lexer_for(@lang)).nil?
+
+        content = text.to_s
+
+        text.replace(highlight_with_timeout_handling(content, lexer), as: :html)
+      end
+
+      def highlight_with_timeout_handling(text, lexer)
+        Rouge.highlight(text, lexer, @formatter)
+      rescue Timeout::Error => _e
+        text
+      end
+
+      def lexer_for(lang)
+        Rouge::Lexer.find(lang)
+      end
+
+      def include_lang?
+        !@lang.nil? && !@lang.empty?
+      end
+    end
+  end
+end
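The `handle_element` method above picks a language from the element's `lang` attribute, falls back to `context[:highlight]`, and only sets a `class` when a language is present. A small sketch of just that decision, using plain hashes in place of Selma elements so it runs without `selma` or `rouge` installed; the helper name `highlight_class` is hypothetical.

```ruby
# Mirrors the class-attribute decision in handle_element. Plain hashes stand
# in for the Selma element and the filter context (an assumption made to
# keep the example self-contained).
def highlight_class(element_attrs, context)
  lang = element_attrs["lang"] || context[:highlight]&.to_s
  scope = context.fetch(:scope, "highlight")

  # Same condition as include_lang?: skip nil and empty languages.
  return nil if lang.nil? || lang.empty?

  "#{scope} #{scope}-#{lang}"
end

puts highlight_class({ "lang" => "css" }, {})               # element attribute wins
puts highlight_class({}, { highlight: :ruby, scope: "hl" }) # context fallback, custom scope
p highlight_class({}, {})                                   # no language, no class
```

With no `lang` attribute and no `:highlight` default, the element is left without a class, and `handle_text_chunk` likewise returns early, so the `pre` content passes through unhighlighted.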
data/lib/html_pipeline/node_filter/table_of_contents_filter.rb
CHANGED
@@ -24,8 +24,10 @@ class HTMLPipeline
     # result[:output].to_s
     # # => "<h1>\n<a id=\"ice-cube\" class=\"anchor\" href=\"#ice-cube\">..."
     class TableOfContentsFilter < NodeFilter
-      SELECTOR = Selma::Selector.new(
-
+      SELECTOR = Selma::Selector.new(
+        match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
+        match_text_within: "h1, h2, h3, h4, h5, h6",
+      )
 
       def selector
         SELECTOR
data/lib/html_pipeline/sanitization_filter.rb
CHANGED
@@ -16,11 +16,70 @@ class HTMLPipeline
     # The main sanitization allowlist. Only these elements and attributes are
     # allowed through by default.
     DEFAULT_CONFIG = Selma::Sanitizer::Config.freeze_config({
-      elements: [
-
-
-
-
+      elements: [
+        "h1",
+        "h2",
+        "h3",
+        "h4",
+        "h5",
+        "h6",
+        "br",
+        "b",
+        "i",
+        "strong",
+        "em",
+        "a",
+        "pre",
+        "code",
+        "img",
+        "tt",
+        "div",
+        "ins",
+        "del",
+        "sup",
+        "sub",
+        "p",
+        "picture",
+        "ol",
+        "ul",
+        "table",
+        "thead",
+        "tbody",
+        "tfoot",
+        "blockquote",
+        "dl",
+        "dt",
+        "dd",
+        "kbd",
+        "q",
+        "samp",
+        "var",
+        "hr",
+        "ruby",
+        "rt",
+        "rp",
+        "li",
+        "tr",
+        "td",
+        "th",
+        "s",
+        "strike",
+        "summary",
+        "details",
+        "caption",
+        "figure",
+        "figcaption",
+        "abbr",
+        "bdo",
+        "cite",
+        "dfn",
+        "mark",
+        "small",
+        "source",
+        "span",
+        "time",
+        "wbr",
+      ],
 
       attributes: {
         "a" => ["href"],
@@ -31,13 +90,77 @@ class HTMLPipeline
         "ins" => ["cite"],
         "q" => ["cite"],
         "source" => ["srcset"],
-        all: [
-
-
-
-
-
-
+        all: [
+          "abbr",
+          "accept",
+          "accept-charset",
+          "accesskey",
+          "action",
+          "align",
+          "alt",
+          "aria-describedby",
+          "aria-hidden",
+          "aria-label",
+          "aria-labelledby",
+          "axis",
+          "border",
+          "char",
+          "charoff",
+          "charset",
+          "checked",
+          "clear",
+          "cols",
+          "colspan",
+          "compact",
+          "coords",
+          "datetime",
+          "dir",
+          "disabled",
+          "enctype",
+          "for",
+          "frame",
+          "headers",
+          "height",
+          "hreflang",
+          "hspace",
+          "id",
+          "ismap",
+          "label",
+          "lang",
+          "maxlength",
+          "media",
+          "method",
+          "multiple",
+          "name",
+          "nohref",
+          "noshade",
+          "nowrap",
+          "open",
+          "progress",
+          "prompt",
+          "readonly",
+          "rel",
+          "rev",
+          "role",
+          "rows",
+          "rowspan",
+          "rules",
+          "scope",
+          "selected",
+          "shape",
+          "size",
+          "span",
+          "start",
+          "summary",
+          "tabindex",
+          "title",
+          "type",
+          "usemap",
+          "valign",
+          "value",
+          "width",
+          "itemprop",
+        ],
       },
       protocols: {
         "a" => { "href" => Selma::Sanitizer::Config::VALID_PROTOCOLS }.freeze,
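The allowlist above has two layers for attributes: per-element lists such as `"a" => ["href"]`, and an `all:` list that applies to every allowed element. How Selma evaluates this config is not shown in the diff, so the following is only an illustrative sketch of that two-layer lookup in plain Ruby. `EXAMPLE_CONFIG` and `attribute_allowed?` are hypothetical names, and the config is a tiny subset written for the example.

```ruby
# Hypothetical, much-reduced config in the same shape as DEFAULT_CONFIG.
EXAMPLE_CONFIG = {
  elements: ["a", "p", "q"],
  attributes: {
    "a" => ["href"],
    "q" => ["cite"],
    all: ["id", "lang", "title"],
  },
}.freeze

# Illustrative lookup only; this is not Selma's implementation.
def attribute_allowed?(config, element, attribute)
  # Attributes on a disallowed element are moot.
  return false unless config[:elements].include?(element)

  per_element = config[:attributes].fetch(element, [])
  everywhere = config[:attributes].fetch(:all, [])
  per_element.include?(attribute) || everywhere.include?(attribute)
end

p attribute_allowed?(EXAMPLE_CONFIG, "a", "href")    # listed for "a"
p attribute_allowed?(EXAMPLE_CONFIG, "p", "href")    # not listed for "p" or :all
p attribute_allowed?(EXAMPLE_CONFIG, "p", "title")   # allowed via :all
p attribute_allowed?(EXAMPLE_CONFIG, "script", "id") # element itself not allowed
```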
data/lib/html_pipeline.rb
CHANGED
@@ -145,8 +145,11 @@ end
     context = context.freeze
     result ||= {}
 
-    payload = default_payload({
-
+    payload = default_payload({
+      text_filters: @text_filters.map(&:name),
+      context: context,
+      result: result,
+    })
     instrument("call_text_filters.html_pipeline", payload) do
       result[:output] =
         @text_filters.inject(text) do |doc, filter|
@@ -159,8 +162,11 @@ end
     html = @convert_filter.call(text) unless @convert_filter.nil?
 
     unless @node_filters.empty?
-      payload = default_payload({
-
+      payload = default_payload({
+        node_filters: @node_filters.map { |f| f.class.name },
+        context: context,
+        result: result,
+      })
       instrument("call_node_filters.html_pipeline", payload) do
         result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)
       end
@@ -178,8 +184,11 @@ end
   #
   # Returns the result of the filter.
   def perform_filter(filter, doc, context: {}, result: {})
-    payload = default_payload({
-
+    payload = default_payload({
+      filter: filter.name,
+      context: context,
+      result: result,
+    })
     instrument("call_filter.html_pipeline", payload) do
       filter.call(doc, context: context, result: result)
     end
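The three hunks above all build a payload hash (filter names, `context`, `result`) and pass it to `instrument` around the filter work. A minimal sketch of that pattern, assuming only that the instrumentation service responds to `instrument(name, payload)` and yields to a block; `RecordingInstrumenter` and `UpcaseFilter` are hypothetical stand-ins, and `default_payload` is omitted because its body is not shown in this diff.

```ruby
# Hypothetical instrumentation service that records each event and then
# runs the wrapped block, returning the block's value.
class RecordingInstrumenter
  attr_reader :events

  def initialize
    @events = []
  end

  def instrument(name, payload)
    @events << [name, payload]
    yield payload
  end
end

# Hypothetical text filter, used only so there is a class name to report.
class UpcaseFilter
  def self.call(text, context: {}, result: {})
    text.upcase
  end
end

service = RecordingInstrumenter.new
filters = [UpcaseFilter]
context = { pipeline: "demo" }.freeze
result = {}

# Same payload shape as the text-filter hunk: names, context, result.
payload = { text_filters: filters.map(&:name), context: context, result: result }

result[:output] = service.instrument("call_text_filters.html_pipeline", payload) do
  filters.inject("hello") { |doc, filter| filter.call(doc, context: context, result: result) }
end

puts result[:output]
p service.events.first[1][:text_filters]
```

Reporting names rather than filter objects keeps the payload cheap to log. The node-filter hunk uses `f.class.name` because node filters are passed as instances, whereas text filters are referenced by `name` directly.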
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: html-pipeline
 version: !ruby/object:Gem::Version
-  version: 3.0.0.
+  version: 3.0.0.pre2
 platform: ruby
 authors:
 - Garen J. Torikian
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2023-01-26 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: selma
@@ -71,6 +71,7 @@ files:
 - lib/html_pipeline/node_filter/https_filter.rb
 - lib/html_pipeline/node_filter/image_max_width_filter.rb
 - lib/html_pipeline/node_filter/mention_filter.rb
+- lib/html_pipeline/node_filter/syntax_highlight_filter.rb
 - lib/html_pipeline/node_filter/table_of_contents_filter.rb
 - lib/html_pipeline/node_filter/team_mention_filter.rb
 - lib/html_pipeline/sanitization_filter.rb