html-pipeline 3.0.0.pre1 → 3.0.0.pre3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 4b8171e0970d8cbe50757505138bbea0f812c06583cfcdfd192686a8aff9cb53
-   data.tar.gz: 19a40ef93871f3d2b6411c526d6375a21d494a301d7a4abb2e92b6062de81b1c
+   metadata.gz: 613a099c0290d87932d2bbaf78052d8c472518bb911d82fe651ad78a3cc32ee9
+   data.tar.gz: d03826dc83afb689e87395d3a365a24fd5b3608e3a29b632a79a5928acb84a2b
  SHA512:
-   metadata.gz: 4c160dcf8463cc5009d397529087c9f035cb3ae5e998c12ace03810fd8146b031f5f28829e318456239cd410d41fbf3d1da7d0b0e8097ceb3206733a96484c5f
-   data.tar.gz: 72fa1ec6b3a2097dd0ce6108f55a10ac9b617e5fc607e67818876c5b8ececbc7c2bbd58e53ce530ba39f11f6ad0c292ec956e9e42fbed370213abc5b71bcf257
+   metadata.gz: 35ce289f0ca10929fba016a5dd0d7c0a6bf0d8f8d4390385590c0937190df67605b96b474a477a54feb5192a54fc5b299940d264483700a9c5deac04cff1b9e1
+   data.tar.gz: dbddce89afcb5f0291d28c9b24b4a2e4f25ea847d8960dc06a11f0c6cfd6cf2feb169f406fd69ca5f25a3eabc4338ce0cf7f2b94fa2147c682135bb6dee70728
data/CHANGELOG.md CHANGED
@@ -1,5 +1,43 @@
  # Changelog

+ ## [Unreleased](https://github.com/gjtorikian/html-pipeline/tree/HEAD)
+
+ [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre3...HEAD)
+
+ **Closed issues:**
+
+ - v3: Question regarding requiring a ConvertFilter if there are NodeFilters [\#374](https://github.com/gjtorikian/html-pipeline/issues/374)
+
+ ## [v3.0.0.pre3](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre3) (2023-02-15)
+
+ [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre2...v3.0.0.pre3)
+
+ **Merged pull requests:**
+
+ - req convert\_filter if `text/node`filter present [\#375](https://github.com/gjtorikian/html-pipeline/pull/375) ([gjtorikian](https://github.com/gjtorikian))
+
+ ## [v3.0.0.pre2](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre2) (2023-01-26)
+
+ [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v3.0.0.pre1...v3.0.0.pre2)
+
+ **Closed issues:**
+
+ - Indicate a version for activesupport that has support/receives security patches \(\>= 6?\) [\#367](https://github.com/gjtorikian/html-pipeline/issues/367)
+ - Add MathML elements to whitelist [\#336](https://github.com/gjtorikian/html-pipeline/issues/336)
+ - Feature request: add safe semantic HTML tags to default whitelist [\#312](https://github.com/gjtorikian/html-pipeline/issues/312)
+ - Allow float: left|right and clear: left|right|both in sanitation [\#302](https://github.com/gjtorikian/html-pipeline/issues/302)
+ - Open link in new tab option [\#266](https://github.com/gjtorikian/html-pipeline/issues/266)
+ - Consider allowing LINK and META elements in HTML [\#261](https://github.com/gjtorikian/html-pipeline/issues/261)
+ - Allow SVG elements in whitelist [\#251](https://github.com/gjtorikian/html-pipeline/issues/251)
+ - Allow RDFa 1.1 \(Lite\) attributes [\#249](https://github.com/gjtorikian/html-pipeline/issues/249)
+ - What’s the point of allowing the accept-charset attribute in the sanitization filter ? [\#218](https://github.com/gjtorikian/html-pipeline/issues/218)
+ - Use gemojione instead of gemoji [\#200](https://github.com/gjtorikian/html-pipeline/issues/200)
+ - Link schema not used with \<a\> markup : print them directly [\#194](https://github.com/gjtorikian/html-pipeline/issues/194)
+
+ **Merged pull requests:**
+
+ - Use emoji from commonmarker [\#373](https://github.com/gjtorikian/html-pipeline/pull/373) ([gjtorikian](https://github.com/gjtorikian))
+
  ## [v3.0.0.pre1](https://github.com/gjtorikian/html-pipeline/tree/v3.0.0.pre1) (2022-12-30)

  [Full Changelog](https://github.com/gjtorikian/html-pipeline/compare/v2.14.3...v3.0.0.pre1)
data/Gemfile CHANGED
@@ -27,9 +27,10 @@ group :development do
  end

  group :test do
-   gem "commonmarker", "~> 1.0.0.pre4", require: false
+   gem "commonmarker", "~> 1.0.0.pre7", require: false
    gem "gemoji", "~> 3.0", require: false
    gem "gemojione", "~> 4.3", require: false
+
    gem "minitest"

    gem "minitest-bisect", "~> 1.6"
@@ -37,4 +38,5 @@ group :test do
    gem "nokogiri", "~> 1.13"

    gem "minitest-focus", "~> 1.1"
+   gem "rouge", "~> 3.1", require: false
  end
data/README.md CHANGED
@@ -230,19 +230,28 @@ end
  
  For more information on how to write effective `NodeFilter`s, refer to the provided filters, and see the underlying lib, [Selma](https://www.github.com/gjtorikian/selma) for more information.

- - `AbsoluteSourceFilter` - replace relative image urls with fully qualified versions
- - `EmojiFilter` - converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)!
- - `HttpsFilter` - Replacing http urls with https versions
- - `ImageMaxWidthFilter` - link to full size image for large images
- - `MentionFilter` - replace `@user` mentions with links
- - `SanitizationFilter` - allow sanitize user markup
- - `TableOfContentsFilter` - anchor headings with name attributes and generate Table of Contents html unordered list linking headings
- - `TeamMentionFilter` - replace `@org/team` mentions with links
+ - `AbsoluteSourceFilter`: replace relative image urls with fully qualified versions
+ - `EmojiFilter`: converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)
+   - (Note: the included `MarkdownFilter` will already convert emoji)
+ - `HttpsFilter`: Replacing http urls with https versions
+ - `ImageMaxWidthFilter`: link to full size image for large images
+ - `MentionFilter`: replace `@user` mentions with links
+ - `SanitizationFilter`: allow sanitize user markup
+ - `SyntaxHighlightFilter`: applies syntax highlighting to `pre` blocks
+   - (Note: the included `MarkdownFilter` will already apply highlighting)
+ - `TableOfContentsFilter`: anchor headings with name attributes and generate Table of Contents html unordered list linking headings
+ - `TeamMentionFilter`: replace `@org/team` mentions with links

  ## Dependencies

- Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem
- dependencies yourself.
+ Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem dependencies yourself.
+
+ For example, `SyntaxHighlightFilter` uses [rouge](https://github.com/jneen/rouge)
+ to detect and highlight languages; to use the `SyntaxHighlightFilter`, you must add the following to your Gemfile:
+
+ ```ruby
+ gem "rouge"
+ ```

  > **Note**
  > See the [Gemfile](/Gemfile) `:test` group for any version requirements.
data/UPGRADING.md CHANGED
@@ -13,7 +13,6 @@ This project is now under a module called `HTMLPipeline`, not `HTML::Pipeline`.
  The following filters were removed:

  - `AutolinkFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
- - `SyntaxHighlightFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
  - `SanitizationFilter`: this is handled by [Selma](https://www.github.com/gjtorikian/selma); configuration can be done through the `sanitization_config` hash

  - `EmailReplyFilter`
@@ -80,7 +80,7 @@ class HTMLPipeline
      def needs(*keys)
        missing = keys.reject { |key| context.include?(key) }

-       return unless missing.any?
+       return if missing.none?

        raise ArgumentError,
          "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join(", ")}"
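Note on the hunk above: `Array#none?` without a block is the exact complement of `Array#any?`, so `return if missing.none?` behaves the same as `return unless missing.any?` and the change reads as stylistic. A minimal sketch of the guard in isolation, assuming a hypothetical `ExampleFilter` class that is not part of the gem; only the body of `needs` is taken from the diff:

```ruby
# Hypothetical stand-in for a filter class. Only the body of `needs`
# mirrors the hunk above; the class name and constructor are assumptions.
class ExampleFilter
  attr_reader :context

  def initialize(context: {})
    @context = context
  end

  # Raises ArgumentError when any of the given keys is absent from the context.
  def needs(*keys)
    missing = keys.reject { |key| context.include?(key) }

    return if missing.none?

    raise ArgumentError,
      "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join(", ")}"
  end
end

filter = ExampleFilter.new(context: { base_url: "https://example.com" })
filter.needs(:base_url) # nothing is missing, so this returns nil
```

Calling `needs` with a key that is not in the context raises `ArgumentError` and names every missing key in the message.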
@@ -0,0 +1,62 @@
+ # frozen_string_literal: true
+
+ HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")
+
+ class HTMLPipeline
+   class NodeFilter
+     # HTML Filter that syntax highlights text inside code blocks.
+     #
+     # Context options:
+     #
+     # :highlight => String represents the language to pick lexer. Defaults to empty string.
+     # :scope => String represents the class attribute adds to pre element after.
+     # Defaults to "highlight highlight-css" if highlights a css code block.
+     #
+     # This filter does not write any additional information to the context hash.
+     class SyntaxHighlightFilter < NodeFilter
+       def initialize(context: {}, result: {})
+         super(context: context, result: result)
+         # TODO: test the optionality of this
+         @formatter = context[:formatter] || Rouge::Formatters::HTML.new
+       end
+
+       SELECTOR = Selma::Selector.new(match_element: "pre", match_text_within: "pre")
+
+       def selector
+         SELECTOR
+       end
+
+       def handle_element(element)
+         default = context[:highlight]&.to_s
+         @lang = element["lang"] || default
+
+         scope = context.fetch(:scope, "highlight")
+
+         element["class"] = "#{scope} #{scope}-#{@lang}" if include_lang?
+       end
+
+       def handle_text_chunk(text)
+         return if @lang.nil?
+         return if (lexer = lexer_for(@lang)).nil?
+
+         content = text.to_s
+
+         text.replace(highlight_with_timeout_handling(content, lexer), as: :html)
+       end
+
+       def highlight_with_timeout_handling(text, lexer)
+         Rouge.highlight(text, lexer, @formatter)
+       rescue Timeout::Error => _e
+         text
+       end
+
+       def lexer_for(lang)
+         Rouge::Lexer.find(lang)
+       end
+
+       def include_lang?
+         !@lang.nil? && !@lang.empty?
+       end
+     end
+   end
+ end
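The class attribute that `handle_element` writes in the new file above can be traced with a small stand-alone sketch. `highlight_class` is a hypothetical helper, not part of the gem. It takes the value of the element's `lang` attribute (assumed here to be `nil` when the attribute is absent) and the context hash, and returns the string that would be assigned to `class`, or `nil` when no language is known.

```ruby
# Hypothetical helper mirroring the class-attribute logic of handle_element.
# `element_lang` stands in for element["lang"]; it is assumed to be nil
# when the <pre> element carries no lang attribute.
def highlight_class(element_lang, context = {})
  # The element's own lang wins; otherwise fall back to context[:highlight].
  lang = element_lang || context[:highlight]&.to_s
  scope = context.fetch(:scope, "highlight")

  # Same check as include_lang?: skip nil and empty languages.
  return nil if lang.nil? || lang.empty?

  "#{scope} #{scope}-#{lang}"
end

highlight_class("css")                    # "highlight highlight-css"
highlight_class(nil, { highlight: :ruby }) # falls back to the context default
highlight_class("go", { scope: "hl" })     # custom scope prefix
```

The sketch covers only the attribute string. The actual highlighting in the filter goes through Rouge and needs the `rouge` gem installed.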
@@ -24,8 +24,10 @@ class HTMLPipeline
      # result[:output].to_s
      # # => "<h1>\n<a id=\"ice-cube\" class=\"anchor\" href=\"#ice-cube\">..."
      class TableOfContentsFilter < NodeFilter
-       SELECTOR = Selma::Selector.new(match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
-         match_text_within: "h1, h2, h3, h4, h5, h6")
+       SELECTOR = Selma::Selector.new(
+         match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
+         match_text_within: "h1, h2, h3, h4, h5, h6",
+       )

        def selector
          SELECTOR
@@ -16,11 +16,70 @@ class HTMLPipeline
      # The main sanitization allowlist. Only these elements and attributes are
      # allowed through by default.
      DEFAULT_CONFIG = Selma::Sanitizer::Config.freeze_config({
-       elements: ["h1", "h2", "h3", "h4", "h5", "h6", "br", "b", "i", "strong", "em", "a", "pre", "code",
-         "img", "tt", "div", "ins", "del", "sup", "sub", "p", "picture", "ol", "ul", "table", "thead", "tbody", "tfoot",
-         "blockquote", "dl", "dt", "dd", "kbd", "q", "samp", "var", "hr", "ruby", "rt", "rp", "li", "tr", "td", "th",
-         "s", "strike", "summary", "details", "caption", "figure", "figcaption", "abbr", "bdo", "cite",
-         "dfn", "mark", "small", "source", "span", "time", "wbr",],
+       elements: [
+         "h1",
+         "h2",
+         "h3",
+         "h4",
+         "h5",
+         "h6",
+         "br",
+         "b",
+         "i",
+         "strong",
+         "em",
+         "a",
+         "pre",
+         "code",
+         "img",
+         "tt",
+         "div",
+         "ins",
+         "del",
+         "sup",
+         "sub",
+         "p",
+         "picture",
+         "ol",
+         "ul",
+         "table",
+         "thead",
+         "tbody",
+         "tfoot",
+         "blockquote",
+         "dl",
+         "dt",
+         "dd",
+         "kbd",
+         "q",
+         "samp",
+         "var",
+         "hr",
+         "ruby",
+         "rt",
+         "rp",
+         "li",
+         "tr",
+         "td",
+         "th",
+         "s",
+         "strike",
+         "summary",
+         "details",
+         "caption",
+         "figure",
+         "figcaption",
+         "abbr",
+         "bdo",
+         "cite",
+         "dfn",
+         "mark",
+         "small",
+         "source",
+         "span",
+         "time",
+         "wbr",
+       ],

        attributes: {
          "a" => ["href"],
@@ -31,13 +90,77 @@ class HTMLPipeline
          "ins" => ["cite"],
          "q" => ["cite"],
          "source" => ["srcset"],
-         all: ["abbr", "accept", "accept-charset", "accesskey", "action", "align", "alt", "aria-describedby",
-           "aria-hidden", "aria-label", "aria-labelledby", "axis", "border", "char",
-           "charoff", "charset", "checked", "clear", "cols", "colspan", "compact", "coords", "datetime", "dir",
-           "disabled", "enctype", "for", "frame", "headers", "height", "hreflang", "hspace", "id", "ismap", "label", "lang",
-           "maxlength", "media", "method", "multiple", "name", "nohref", "noshade", "nowrap", "open", "progress",
-           "prompt", "readonly", "rel", "rev", "role", "rows", "rowspan", "rules", "scope", "selected", "shape",
-           "size", "span", "start", "summary", "tabindex", "title", "type", "usemap", "valign", "value", "width", "itemprop",],
+         all: [
+           "abbr",
+           "accept",
+           "accept-charset",
+           "accesskey",
+           "action",
+           "align",
+           "alt",
+           "aria-describedby",
+           "aria-hidden",
+           "aria-label",
+           "aria-labelledby",
+           "axis",
+           "border",
+           "char",
+           "charoff",
+           "charset",
+           "checked",
+           "clear",
+           "cols",
+           "colspan",
+           "compact",
+           "coords",
+           "datetime",
+           "dir",
+           "disabled",
+           "enctype",
+           "for",
+           "frame",
+           "headers",
+           "height",
+           "hreflang",
+           "hspace",
+           "id",
+           "ismap",
+           "label",
+           "lang",
+           "maxlength",
+           "media",
+           "method",
+           "multiple",
+           "name",
+           "nohref",
+           "noshade",
+           "nowrap",
+           "open",
+           "progress",
+           "prompt",
+           "readonly",
+           "rel",
+           "rev",
+           "role",
+           "rows",
+           "rowspan",
+           "rules",
+           "scope",
+           "selected",
+           "shape",
+           "size",
+           "span",
+           "start",
+           "summary",
+           "tabindex",
+           "title",
+           "type",
+           "usemap",
+           "valign",
+           "value",
+           "width",
+           "itemprop",
+         ],
        },
        protocols: {
          "a" => { "href" => Selma::Sanitizer::Config::VALID_PROTOCOLS }.freeze,
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  class HTMLPipeline
-   VERSION = "3.0.0.pre1"
+   VERSION = "3.0.0.pre3"
  end
data/lib/html_pipeline.rb CHANGED
@@ -116,8 +116,9 @@ end
      validate_filters(@node_filters, HTMLPipeline::NodeFilter)

      @convert_filter = convert_filter
-     if @convert_filter.nil? && !@node_filters.empty?
-       raise InvalidFilterError, "Must provide `convert_filter` if `node_filter`s is also provided"
+
+     if @convert_filter.nil? && (!@text_filters.empty? && !@node_filters.empty?)
+       raise InvalidFilterError, "Must provide `convert_filter` if `text_filters` and `node_filters` are also provided"
      elsif !@convert_filter.nil?
        validate_filter(@convert_filter, HTMLPipeline::ConvertFilter)
      end
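To make the changed guard concrete, the old and new conditions can be compared side by side. The method names below are hypothetical, and plain symbols stand in for real filter objects; only the boolean logic is taken from the hunk above.

```ruby
# Hypothetical predicates; only the boolean conditions come from the diff.

# Removed line: a convert filter was required whenever node filters were given.
def convert_filter_required_before?(node_filters:, text_filters: [])
  !node_filters.empty?
end

# Added line: required only when text filters AND node filters are both given.
def convert_filter_required_after?(node_filters:, text_filters: [])
  !text_filters.empty? && !node_filters.empty?
end

# Symbols are placeholders for filter classes or instances.
combinations = [
  { text_filters: [],      node_filters: [] },
  { text_filters: [:text], node_filters: [] },
  { text_filters: [],      node_filters: [:node] },
  { text_filters: [:text], node_filters: [:node] },
]

combinations.each do |combo|
  before = convert_filter_required_before?(**combo)
  after = convert_filter_required_after?(**combo)
  puts "#{combo.inspect}: before=#{before} after=#{after}"
end
```

The only combination whose outcome changes is node filters without text filters: the removed condition demanded a `convert_filter` there, the added one does not.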
@@ -145,8 +146,11 @@ end
      context = context.freeze
      result ||= {}

-     payload = default_payload({ text_filters: @text_filters.map(&:name),
-       context: context, result: result, })
+     payload = default_payload({
+       text_filters: @text_filters.map(&:name),
+       context: context,
+       result: result,
+     })
      instrument("call_text_filters.html_pipeline", payload) do
        result[:output] =
          @text_filters.inject(text) do |doc, filter|
@@ -159,8 +163,11 @@ end
      html = @convert_filter.call(text) unless @convert_filter.nil?

      unless @node_filters.empty?
-       payload = default_payload({ node_filters: @node_filters.map { |f| f.class.name },
-         context: context, result: result, })
+       payload = default_payload({
+         node_filters: @node_filters.map { |f| f.class.name },
+         context: context,
+         result: result,
+       })
        instrument("call_node_filters.html_pipeline", payload) do
          result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)
        end
@@ -178,8 +185,11 @@ end
    #
    # Returns the result of the filter.
    def perform_filter(filter, doc, context: {}, result: {})
-     payload = default_payload({ filter: filter.name,
-       context: context, result: result, })
+     payload = default_payload({
+       filter: filter.name,
+       context: context,
+       result: result,
+     })
      instrument("call_filter.html_pipeline", payload) do
        filter.call(doc, context: context, result: result)
      end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: html-pipeline
  version: !ruby/object:Gem::Version
-   version: 3.0.0.pre1
+   version: 3.0.0.pre3
  platform: ruby
  authors:
  - Garen J. Torikian
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-12-30 00:00:00.000000000 Z
+ date: 2023-02-15 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: selma
@@ -71,6 +71,7 @@ files:
  - lib/html_pipeline/node_filter/https_filter.rb
  - lib/html_pipeline/node_filter/image_max_width_filter.rb
  - lib/html_pipeline/node_filter/mention_filter.rb
+ - lib/html_pipeline/node_filter/syntax_highlight_filter.rb
  - lib/html_pipeline/node_filter/table_of_contents_filter.rb
  - lib/html_pipeline/node_filter/team_mention_filter.rb
  - lib/html_pipeline/sanitization_filter.rb