html-pipeline 3.0.0.pre1 → 3.0.0.pre3
- checksums.yaml +4 -4
- data/CHANGELOG.md +38 -0
- data/Gemfile +3 -1
- data/README.md +19 -10
- data/UPGRADING.md +0 -1
- data/lib/html_pipeline/filter.rb +1 -1
- data/lib/html_pipeline/node_filter/syntax_highlight_filter.rb +62 -0
- data/lib/html_pipeline/node_filter/table_of_contents_filter.rb +4 -2
- data/lib/html_pipeline/sanitization_filter.rb +135 -12
- data/lib/html_pipeline/version.rb +1 -1
- data/lib/html_pipeline.rb +18 -8
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 613a099c0290d87932d2bbaf78052d8c472518bb911d82fe651ad78a3cc32ee9
+  data.tar.gz: d03826dc83afb689e87395d3a365a24fd5b3608e3a29b632a79a5928acb84a2b
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 35ce289f0ca10929fba016a5dd0d7c0a6bf0d8f8d4390385590c0937190df67605b96b474a477a54feb5192a54fc5b299940d264483700a9c5deac04cff1b9e1
+  data.tar.gz: dbddce89afcb5f0291d28c9b24b4a2e4f25ea847d8960dc06a11f0c6cfd6cf2feb169f406fd69ca5f25a3eabc4338ce0cf7f2b94fa2147c682135bb6dee70728
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,43 @@
|
|
1
1
|
# Changelog
|
2
2
|
|
3
|
+
## [Unreleased](https://github.com/gjtorikian/html-pipeline/tree/HEAD)
|
4
|
+
|
5
|
+
[Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre3...HEAD)
|
6
|
+
|
7
|
+
**Closed issues:**
|
8
|
+
|
9
|
+
- v3: Question regarding requiring a ConvertFilter if there are NodeFilters [\#374](https://github.com/gjtorikian/html-pipeline/issues/374)
|
10
|
+
|
11
|
+
## [v3.0.0.pre3](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre3) (2023-02-15)
|
12
|
+
|
13
|
+
[Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre2...v3.0.0.pre3)
|
14
|
+
|
15
|
+
**Merged pull requests:**
|
16
|
+
|
17
|
+
- req convert\_filter if `text/node`filter present [\#375](https://github.com/gjtorikian/html-pipeline/pull/375) ([gjtorikian](https://github.com/gjtorikian))
|
18
|
+
|
19
|
+
## [v3.0.0.pre2](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre2) (2023-01-26)
|
20
|
+
|
21
|
+
[Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre1...v3.0.0.pre2)
|
22
|
+
|
23
|
+
**Closed issues:**
|
24
|
+
|
25
|
+
- Indicate a version for activesupport that has support/receives security patches \(\>= 6?\) [\#367](https://github.com/gjtorikian/html-pipeline/issues/367)
|
26
|
+
- Add MathML elements to whitelist [\#336](https://github.com/gjtorikian/html-pipeline/issues/336)
|
27
|
+
- Feature request: add safe semantic HTML tags to default whitelist [\#312](https://github.com/gjtorikian/html-pipeline/issues/312)
|
28
|
+
- Allow float: left|right and clear: left|right|both in sanitation [\#302](https://github.com/gjtorikian/html-pipeline/issues/302)
|
29
|
+
- Open link in new tab option [\#266](https://github.com/gjtorikian/html-pipeline/issues/266)
|
30
|
+
- Consider allowing LINK and META elements in HTML [\#261](https://github.com/gjtorikian/html-pipeline/issues/261)
|
31
|
+
- Allow SVG elements in whitelist [\#251](https://github.com/gjtorikian/html-pipeline/issues/251)
|
32
|
+
- Allow RDFa 1.1 \(Lite\) attributes [\#249](https://github.com/gjtorikian/html-pipeline/issues/249)
|
33
|
+
- What’s the point of allowing the accept-charset attribute in the sanitization filter ? [\#218](https://github.com/gjtorikian/html-pipeline/issues/218)
|
34
|
+
- Use gemojione instead of gemoji [\#200](https://github.com/gjtorikian/html-pipeline/issues/200)
|
35
|
+
- Link schema not used with \<a\> markup : print them directly [\#194](https://github.com/gjtorikian/html-pipeline/issues/194)
|
36
|
+
|
37
|
+
**Merged pull requests:**
|
38
|
+
|
39
|
+
- Use emoji from commonmarker [\#373](https://github.com/gjtorikian/html-pipeline/pull/373) ([gjtorikian](https://github.com/gjtorikian))
|
40
|
+
|
3
41
|
## [v3.0.0.pre1](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre1) (2022-12-30)
|
4
42
|
|
5
43
|
[Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v2.14.3...v3.0.0.pre1)
|
data/Gemfile
CHANGED
@@ -27,9 +27,10 @@ group :development do
 end

 group :test do
-  gem "commonmarker", "~> 1.0.0.
+  gem "commonmarker", "~> 1.0.0.pre7", require: false
   gem "gemoji", "~> 3.0", require: false
   gem "gemojione", "~> 4.3", require: false
+
   gem "minitest"

   gem "minitest-bisect", "~> 1.6"
@@ -37,4 +38,5 @@ group :test do
   gem "nokogiri", "~> 1.13"

   gem "minitest-focus", "~> 1.1"
+  gem "rouge", "~> 3.1", require: false
 end
data/README.md
CHANGED
@@ -230,19 +230,28 @@ end
|
|
230
230
|
|
231
231
|
For more information on how to write effective `NodeFilter`s, refer to the provided filters, and see the underlying lib, [Selma](https://www.github.com/gjtorikian/selma) for more information.
|
232
232
|
|
233
|
-
- `AbsoluteSourceFilter
|
234
|
-
- `EmojiFilter
|
235
|
-
-
|
236
|
-
- `
|
237
|
-
- `
|
238
|
-
- `
|
239
|
-
- `
|
240
|
-
- `
|
233
|
+
- `AbsoluteSourceFilter`: replace relative image urls with fully qualified versions
|
234
|
+
- `EmojiFilter`: converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)
|
235
|
+
- (Note: the included `MarkdownFilter` will already convert emoji)
|
236
|
+
- `HttpsFilter`: Replacing http urls with https versions
|
237
|
+
- `ImageMaxWidthFilter`: link to full size image for large images
|
238
|
+
- `MentionFilter`: replace `@user` mentions with links
|
239
|
+
- `SanitizationFilter`: allow sanitize user markup
|
240
|
+
- `SyntaxHighlightFilter`: applies syntax highlighting to `pre` blocks
|
241
|
+
- (Note: the included `MarkdownFilter` will already apply highlighting)
|
242
|
+
- `TableOfContentsFilter`: anchor headings with name attributes and generate Table of Contents html unordered list linking headings
|
243
|
+
- `TeamMentionFilter`: replace `@org/team` mentions with links
|
241
244
|
|
242
245
|
## Dependencies
|
243
246
|
|
244
|
-
Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem
|
245
|
-
|
247
|
+
Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem dependencies yourself.
|
248
|
+
|
249
|
+
For example, `SyntaxHighlightFilter` uses [rouge](https://github.com/jneen/rouge)
|
250
|
+
to detect and highlight languages; to use the `SyntaxHighlightFilter`, you must add the following to your Gemfile:
|
251
|
+
|
252
|
+
```ruby
|
253
|
+
gem "rouge"
|
254
|
+
```
|
246
255
|
|
247
256
|
> **Note**
|
248
257
|
> See the [Gemfile](/Gemfile) `:test` group for any version requirements.
|
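The opt-in dependency model described in the README hunk above can be sketched roughly as follows. This is a minimal illustration, not the gem's implementation: `require_optional` and `MissingDependencyError` are hypothetical names, and the only behavior assumed is that a filter tries to load its gem and fails with a readable message when the gem is not installed.

```ruby
# Hypothetical error class for this sketch; not taken from html-pipeline.
class MissingDependencyError < StandardError; end

# Try to load an optional gem on behalf of a filter.
# Raises MissingDependencyError with a hint when the gem cannot be loaded.
def require_optional(gem_name, filter_name)
  require gem_name
  true
rescue LoadError => e
  raise MissingDependencyError,
    "Missing dependency '#{gem_name}' for #{filter_name}; add it to your Gemfile (#{e.message})"
end

# "set" ships with Ruby, so this call succeeds.
require_optional("set", "ExampleFilter")

begin
  # Hypothetical gem name chosen so that the load fails.
  require_optional("no_such_gem_for_this_sketch", "SyntaxHighlightFilter")
rescue MissingDependencyError => e
  puts e.message
end
```

Failing at load time, with the filter's name in the message, points the user at the missing Gemfile entry instead of surfacing a bare `LoadError` later.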
data/UPGRADING.md
CHANGED
@@ -13,7 +13,6 @@ This project is now under a module called `HTMLPipeline`, not `HTML::Pipeline`.
 The following filters were removed:

 - `AutolinkFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
-- `SyntaxHighlightFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
 - `SanitizationFilter`: this is handled by [Selma](https://www.github.com/gjtorikian/selma); configuration can be done through the `sanitization_config` hash

 - `EmailReplyFilter`
data/lib/html_pipeline/filter.rb
CHANGED
@@ -80,7 +80,7 @@ class HTMLPipeline
     def needs(*keys)
       missing = keys.reject { |key| context.include?(key) }

-      return
+      return if missing.none?

       raise ArgumentError,
         "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join(", ")}"
|
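To make the effect of the `needs` guard concrete, here is a small self-contained sketch. `ExampleFilter` is a hypothetical stand-in that copies only the method shown in the hunk above; the context keys are made up for illustration.

```ruby
# Minimal stand-in for a filter that validates its context hash.
# ExampleFilter is hypothetical and not part of html-pipeline.
class ExampleFilter
  attr_reader :context

  def initialize(context: {})
    @context = context
  end

  # Raise unless every required key is present in the context.
  def needs(*keys)
    missing = keys.reject { |key| context.include?(key) }

    return if missing.none?

    raise ArgumentError,
      "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join(", ")}"
  end
end

filter = ExampleFilter.new(context: { base_url: "https://example.com" })
filter.needs(:base_url) # every key is present, so this returns nil

begin
  filter.needs(:base_url, :image_base_url)
rescue ArgumentError => e
  puts e.message # Missing context keys for ExampleFilter: :image_base_url
end
```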
data/lib/html_pipeline/node_filter/syntax_highlight_filter.rb
ADDED
@@ -0,0 +1,62 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")
|
4
|
+
|
5
|
+
class HTMLPipeline
|
6
|
+
class NodeFilter
|
7
|
+
# HTML Filter that syntax highlights text inside code blocks.
|
8
|
+
#
|
9
|
+
# Context options:
|
10
|
+
#
|
11
|
+
# :highlight => String represents the language to pick lexer. Defaults to empty string.
|
12
|
+
# :scope => String represents the class attribute adds to pre element after.
|
13
|
+
# Defaults to "highlight highlight-css" if highlights a css code block.
|
14
|
+
#
|
15
|
+
# This filter does not write any additional information to the context hash.
|
16
|
+
class SyntaxHighlightFilter < NodeFilter
|
17
|
+
def initialize(context: {}, result: {})
|
18
|
+
super(context: context, result: result)
|
19
|
+
# TODO: test the optionality of this
|
20
|
+
@formatter = context[:formatter] || Rouge::Formatters::HTML.new
|
21
|
+
end
|
22
|
+
|
23
|
+
SELECTOR = Selma::Selector.new(match_element: "pre", match_text_within: "pre")
|
24
|
+
|
25
|
+
def selector
|
26
|
+
SELECTOR
|
27
|
+
end
|
28
|
+
|
29
|
+
def handle_element(element)
|
30
|
+
default = context[:highlight]&.to_s
|
31
|
+
@lang = element["lang"] || default
|
32
|
+
|
33
|
+
scope = context.fetch(:scope, "highlight")
|
34
|
+
|
35
|
+
element["class"] = "#{scope} #{scope}-#{@lang}" if include_lang?
|
36
|
+
end
|
37
|
+
|
38
|
+
def handle_text_chunk(text)
|
39
|
+
return if @lang.nil?
|
40
|
+
return if (lexer = lexer_for(@lang)).nil?
|
41
|
+
|
42
|
+
content = text.to_s
|
43
|
+
|
44
|
+
text.replace(highlight_with_timeout_handling(content, lexer), as: :html)
|
45
|
+
end
|
46
|
+
|
47
|
+
def highlight_with_timeout_handling(text, lexer)
|
48
|
+
Rouge.highlight(text, lexer, @formatter)
|
49
|
+
rescue Timeout::Error => _e
|
50
|
+
text
|
51
|
+
end
|
52
|
+
|
53
|
+
def lexer_for(lang)
|
54
|
+
Rouge::Lexer.find(lang)
|
55
|
+
end
|
56
|
+
|
57
|
+
def include_lang?
|
58
|
+
!@lang.nil? && !@lang.empty?
|
59
|
+
end
|
60
|
+
end
|
61
|
+
end
|
62
|
+
end
|
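The class-attribute rule in `handle_element` above can be isolated into a small sketch. `highlight_class` is a hypothetical helper written for this illustration; it assumes only what the diff shows: the element's `lang` attribute takes precedence over `context[:highlight]`, and the scope defaults to `"highlight"`. Note that, as the code is written, an absent `:highlight` key yields `nil` rather than the empty string the comment mentions.

```ruby
# Compute the class attribute for a <pre> element, mirroring handle_element:
# the element's own lang attribute wins over context[:highlight].
# highlight_class is a hypothetical helper used only for this sketch.
def highlight_class(element_lang, context = {})
  lang = element_lang || context[:highlight]&.to_s
  scope = context.fetch(:scope, "highlight")

  # No language means no class is added (the include_lang? check).
  return nil if lang.nil? || lang.empty?

  "#{scope} #{scope}-#{lang}"
end

puts highlight_class("ruby")                   # highlight highlight-ruby
puts highlight_class(nil, { highlight: :css }) # highlight highlight-css
puts highlight_class("go", { scope: "code" })  # code code-go
p highlight_class(nil)                         # nil
```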
data/lib/html_pipeline/node_filter/table_of_contents_filter.rb
CHANGED
@@ -24,8 +24,10 @@ class HTMLPipeline
|
|
24
24
|
# result[:output].to_s
|
25
25
|
# # => "<h1>\n<a id=\"ice-cube\" class=\"anchor\" href=\"#ice-cube\">..."
|
26
26
|
class TableOfContentsFilter < NodeFilter
|
27
|
-
SELECTOR = Selma::Selector.new(
|
28
|
-
|
27
|
+
SELECTOR = Selma::Selector.new(
|
28
|
+
match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
|
29
|
+
match_text_within: "h1, h2, h3, h4, h5, h6",
|
30
|
+
)
|
29
31
|
|
30
32
|
def selector
|
31
33
|
SELECTOR
|
data/lib/html_pipeline/sanitization_filter.rb
CHANGED
@@ -16,11 +16,70 @@ class HTMLPipeline
|
|
16
16
|
# The main sanitization allowlist. Only these elements and attributes are
|
17
17
|
# allowed through by default.
|
18
18
|
DEFAULT_CONFIG = Selma::Sanitizer::Config.freeze_config({
|
19
|
-
elements: [
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
19
|
+
elements: [
|
20
|
+
"h1",
|
21
|
+
"h2",
|
22
|
+
"h3",
|
23
|
+
"h4",
|
24
|
+
"h5",
|
25
|
+
"h6",
|
26
|
+
"br",
|
27
|
+
"b",
|
28
|
+
"i",
|
29
|
+
"strong",
|
30
|
+
"em",
|
31
|
+
"a",
|
32
|
+
"pre",
|
33
|
+
"code",
|
34
|
+
"img",
|
35
|
+
"tt",
|
36
|
+
"div",
|
37
|
+
"ins",
|
38
|
+
"del",
|
39
|
+
"sup",
|
40
|
+
"sub",
|
41
|
+
"p",
|
42
|
+
"picture",
|
43
|
+
"ol",
|
44
|
+
"ul",
|
45
|
+
"table",
|
46
|
+
"thead",
|
47
|
+
"tbody",
|
48
|
+
"tfoot",
|
49
|
+
"blockquote",
|
50
|
+
"dl",
|
51
|
+
"dt",
|
52
|
+
"dd",
|
53
|
+
"kbd",
|
54
|
+
"q",
|
55
|
+
"samp",
|
56
|
+
"var",
|
57
|
+
"hr",
|
58
|
+
"ruby",
|
59
|
+
"rt",
|
60
|
+
"rp",
|
61
|
+
"li",
|
62
|
+
"tr",
|
63
|
+
"td",
|
64
|
+
"th",
|
65
|
+
"s",
|
66
|
+
"strike",
|
67
|
+
"summary",
|
68
|
+
"details",
|
69
|
+
"caption",
|
70
|
+
"figure",
|
71
|
+
"figcaption",
|
72
|
+
"abbr",
|
73
|
+
"bdo",
|
74
|
+
"cite",
|
75
|
+
"dfn",
|
76
|
+
"mark",
|
77
|
+
"small",
|
78
|
+
"source",
|
79
|
+
"span",
|
80
|
+
"time",
|
81
|
+
"wbr",
|
82
|
+
],
|
24
83
|
|
25
84
|
attributes: {
|
26
85
|
"a" => ["href"],
|
@@ -31,13 +90,77 @@ class HTMLPipeline
|
|
31
90
|
"ins" => ["cite"],
|
32
91
|
"q" => ["cite"],
|
33
92
|
"source" => ["srcset"],
|
34
|
-
all: [
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
|
39
|
-
|
40
|
-
|
93
|
+
all: [
|
94
|
+
"abbr",
|
95
|
+
"accept",
|
96
|
+
"accept-charset",
|
97
|
+
"accesskey",
|
98
|
+
"action",
|
99
|
+
"align",
|
100
|
+
"alt",
|
101
|
+
"aria-describedby",
|
102
|
+
"aria-hidden",
|
103
|
+
"aria-label",
|
104
|
+
"aria-labelledby",
|
105
|
+
"axis",
|
106
|
+
"border",
|
107
|
+
"char",
|
108
|
+
"charoff",
|
109
|
+
"charset",
|
110
|
+
"checked",
|
111
|
+
"clear",
|
112
|
+
"cols",
|
113
|
+
"colspan",
|
114
|
+
"compact",
|
115
|
+
"coords",
|
116
|
+
"datetime",
|
117
|
+
"dir",
|
118
|
+
"disabled",
|
119
|
+
"enctype",
|
120
|
+
"for",
|
121
|
+
"frame",
|
122
|
+
"headers",
|
123
|
+
"height",
|
124
|
+
"hreflang",
|
125
|
+
"hspace",
|
126
|
+
"id",
|
127
|
+
"ismap",
|
128
|
+
"label",
|
129
|
+
"lang",
|
130
|
+
"maxlength",
|
131
|
+
"media",
|
132
|
+
"method",
|
133
|
+
"multiple",
|
134
|
+
"name",
|
135
|
+
"nohref",
|
136
|
+
"noshade",
|
137
|
+
"nowrap",
|
138
|
+
"open",
|
139
|
+
"progress",
|
140
|
+
"prompt",
|
141
|
+
"readonly",
|
142
|
+
"rel",
|
143
|
+
"rev",
|
144
|
+
"role",
|
145
|
+
"rows",
|
146
|
+
"rowspan",
|
147
|
+
"rules",
|
148
|
+
"scope",
|
149
|
+
"selected",
|
150
|
+
"shape",
|
151
|
+
"size",
|
152
|
+
"span",
|
153
|
+
"start",
|
154
|
+
"summary",
|
155
|
+
"tabindex",
|
156
|
+
"title",
|
157
|
+
"type",
|
158
|
+
"usemap",
|
159
|
+
"valign",
|
160
|
+
"value",
|
161
|
+
"width",
|
162
|
+
"itemprop",
|
163
|
+
],
|
41
164
|
},
|
42
165
|
protocols: {
|
43
166
|
"a" => { "href" => Selma::Sanitizer::Config::VALID_PROTOCOLS }.freeze,
|
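The shape of the allowlist above (an `elements` list, per-element attribute lists, and an `all:` list that applies to every element) can be illustrated with a toy lookup. This is only a sketch of how such a structure might be read: `TOY_CONFIG` and `attribute_allowed?` are hypothetical, and Selma's actual matching rules are not shown in this diff and may differ.

```ruby
# A tiny allowlist in the same shape as the config in this diff.
# TOY_CONFIG and attribute_allowed? are hypothetical names for illustration.
TOY_CONFIG = {
  elements: ["a", "img", "p"],
  attributes: {
    "a" => ["href"],
    all: ["alt", "title", "id"],
  },
}.freeze

# An attribute passes if its element is allowed and the attribute appears
# either in that element's list or in the global :all list.
def attribute_allowed?(config, element, attribute)
  return false unless config[:elements].include?(element)

  per_element = config[:attributes].fetch(element, [])
  global = config[:attributes].fetch(:all, [])
  per_element.include?(attribute) || global.include?(attribute)
end

p attribute_allowed?(TOY_CONFIG, "a", "href")       # true: listed for "a"
p attribute_allowed?(TOY_CONFIG, "p", "href")       # false: "href" is only listed for "a"
p attribute_allowed?(TOY_CONFIG, "img", "alt")      # true: listed under :all
p attribute_allowed?(TOY_CONFIG, "script", "title") # false: element not allowed
```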
data/lib/html_pipeline.rb
CHANGED
@@ -116,8 +116,9 @@ end
|
|
116
116
|
validate_filters(@node_filters, HTMLPipeline::NodeFilter)
|
117
117
|
|
118
118
|
@convert_filter = convert_filter
|
119
|
-
|
120
|
-
|
119
|
+
|
120
|
+
if @convert_filter.nil? && (!@text_filters.empty? && !@node_filters.empty?)
|
121
|
+
raise InvalidFilterError, "Must provide `convert_filter` if `text_filters` and `node_filters` are also provided"
|
121
122
|
elsif !@convert_filter.nil?
|
122
123
|
validate_filter(@convert_filter, HTMLPipeline::ConvertFilter)
|
123
124
|
end
|
@@ -145,8 +146,11 @@ end
|
|
145
146
|
context = context.freeze
|
146
147
|
result ||= {}
|
147
148
|
|
148
|
-
payload = default_payload({
|
149
|
-
|
149
|
+
payload = default_payload({
|
150
|
+
text_filters: @text_filters.map(&:name),
|
151
|
+
context: context,
|
152
|
+
result: result,
|
153
|
+
})
|
150
154
|
instrument("call_text_filters.html_pipeline", payload) do
|
151
155
|
result[:output] =
|
152
156
|
@text_filters.inject(text) do |doc, filter|
|
@@ -159,8 +163,11 @@ end
|
|
159
163
|
html = @convert_filter.call(text) unless @convert_filter.nil?
|
160
164
|
|
161
165
|
unless @node_filters.empty?
|
162
|
-
payload = default_payload({
|
163
|
-
|
166
|
+
payload = default_payload({
|
167
|
+
node_filters: @node_filters.map { |f| f.class.name },
|
168
|
+
context: context,
|
169
|
+
result: result,
|
170
|
+
})
|
164
171
|
instrument("call_node_filters.html_pipeline", payload) do
|
165
172
|
result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)
|
166
173
|
end
|
@@ -178,8 +185,11 @@ end
|
|
178
185
|
#
|
179
186
|
# Returns the result of the filter.
|
180
187
|
def perform_filter(filter, doc, context: {}, result: {})
|
181
|
-
payload = default_payload({
|
182
|
-
|
188
|
+
payload = default_payload({
|
189
|
+
filter: filter.name,
|
190
|
+
context: context,
|
191
|
+
result: result,
|
192
|
+
})
|
183
193
|
instrument("call_filter.html_pipeline", payload) do
|
184
194
|
filter.call(doc, context: context, result: result)
|
185
195
|
end
|
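The constructor check in the first hunk of this file can be exercised in isolation. The sketch below copies only the condition and message from the diff; `check_convert_filter!` and the standalone `InvalidFilterError` class are hypothetical stand-ins, and the filter values are placeholder symbols. As written in the diff, the error is raised only when both `text_filters` and `node_filters` are non-empty.

```ruby
# Hypothetical stand-in for the error class referenced in the diff.
class InvalidFilterError < ArgumentError; end

# Mirror of the condition in the diff: a convert filter is required only
# when text filters and node filters are both present.
def check_convert_filter!(convert_filter:, text_filters: [], node_filters: [])
  if convert_filter.nil? && (!text_filters.empty? && !node_filters.empty?)
    raise InvalidFilterError,
      "Must provide `convert_filter` if `text_filters` and `node_filters` are also provided"
  end

  true
end

# Node filters alone pass the check.
check_convert_filter!(convert_filter: nil, node_filters: [:mention])

begin
  # Both kinds present and no convert filter: rejected.
  check_convert_filter!(convert_filter: nil, text_filters: [:plain], node_filters: [:mention])
rescue InvalidFilterError => e
  puts e.message
end
```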
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: html-pipeline
 version: !ruby/object:Gem::Version
-  version: 3.0.0.
+  version: 3.0.0.pre3
 platform: ruby
 authors:
 - Garen J. Torikian
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2023-02-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: selma
@@ -71,6 +71,7 @@ files:
 - lib/html_pipeline/node_filter/https_filter.rb
 - lib/html_pipeline/node_filter/image_max_width_filter.rb
 - lib/html_pipeline/node_filter/mention_filter.rb
+- lib/html_pipeline/node_filter/syntax_highlight_filter.rb
 - lib/html_pipeline/node_filter/table_of_contents_filter.rb
 - lib/html_pipeline/node_filter/team_mention_filter.rb
 - lib/html_pipeline/sanitization_filter.rb