html-pipeline 0.0.4 → 0.0.5
- data/CHANGELOG.md +12 -0
- data/README.md +100 -7
- data/html-pipeline.gemspec +10 -9
- data/lib/html/pipeline.rb +3 -3
- data/lib/html/pipeline/camo_filter.rb +10 -4
- data/lib/html/pipeline/emoji_filter.rb +8 -2
- data/lib/html/pipeline/filter.rb +21 -1
- data/lib/html/pipeline/image_max_width_filter.rb +2 -2
- data/lib/html/pipeline/sanitization_filter.rb +6 -8
- data/lib/html/pipeline/version.rb +1 -1
- data/test/html/pipeline/camo_filter_test.rb +8 -0
- data/test/html/pipeline/emoji_filter_test.rb +9 -7
- data/test/html/pipeline/sanitization_filter_test.rb +1 -1
- metadata +7 -5
data/CHANGELOG.md
ADDED
@@ -0,0 +1,12 @@
+# CHANGELOG
+
+## 0.0.5 (unreleased)
+
+* fix li xss vulnerability in sanitization filter: vmg #31
+* gemspec cleanup: nbibler #23, jbarnette #24
+* doc updates: jch #16, pborreli #17, wickedshimmy #18, benubois #19, blackerby #21
+* loosen gemoji dependency: josh #15
+
+## 0.0.4
+
+* initial public release
data/README.md
CHANGED
@@ -53,7 +53,9 @@ pipeline = HTML::Pipeline.new [
 result = pipeline.call <<CODE
 This is *great*:
 
-
+``` ruby
+some_code(:first)
+```
 
 CODE
 result[:output].to_s
@@ -83,19 +85,108 @@ filter.call
 ## Filters
 
 * `MentionFilter` - replace `@user` mentions with links
-* `
-* `CamoFilter` - replace http image urls with [camo-fied](https://github.com/
+* `AutolinkFilter` - auto_linking urls in HTML
+* `CamoFilter` - replace http image urls with [camo-fied](https://github.com/atmos/camo) https versions
 * `EmailReplyFilter` - util filter for working with emails
 * `EmojiFilter` - everyone loves [emoji](http://www.emoji-cheat-sheet.com/)!
+* `HttpsFilter` - HTML Filter for replacing http github urls with https versions.
 * `ImageMaxWidthFilter` - link to full size image for large images
 * `MarkdownFilter` - convert markdown to html
 * `PlainTextInputFilter` - html escape text and wrap the result in a div
-* `SanitizationFilter` - whitelist
+* `SanitizationFilter` - whitelist sanitize user markup
 * `SyntaxHighlightFilter` - code syntax highlighter with [linguist](https://github.com/github/linguist)
 * `TextileFilter` - convert textile to html
 * `TableOfContentsFilter` - anchor headings with name attributes
 
-##
+## Examples
+
+We define different pipelines for different parts of our app. Here are a few
+paraphrased snippets to get you started:
+
+```ruby
+# The context hash is how you pass options between different filters.
+# See individual filter source for explanation of options.
+context = {
+  :asset_root => "http://your-domain.com/where/your/images/live/icons",
+  :base_url => "http://your-domain.com"
+}
+
+# Pipeline providing sanitization and image hijacking but no mention
+# related features.
+SimplePipeline = Pipeline.new [
+  SanitizationFilter,
+  TableOfContentsFilter, # add 'name' anchors to all headers
+  CamoFilter,
+  ImageMaxWidthFilter,
+  SyntaxHighlightFilter,
+  EmojiFilter,
+  AutolinkFilter
+], context, {}
+
+# Pipeline used for user provided content on the web
+MarkdownPipeline = Pipeline.new [
+  MarkdownFilter,
+  SanitizationFilter,
+  CamoFilter,
+  ImageMaxWidthFilter,
+  HttpsFilter,
+  MentionFilter,
+  EmojiFilter,
+  SyntaxHighlightFilter
+], context.merge(:gfm => true), {} # enable github formatted markdown
+
+
+# Define a pipeline based on another pipeline's filters
+NonGFMMarkdownPipeline = Pipeline.new(MarkdownPipeline.filters,
+  context.merge(:gfm => false), {})
+
+# Pipelines aren't limited to the web. You can use them for email
+# processing also.
+HtmlEmailPipeline = Pipeline.new [
+  ImageMaxWidthFilter
+], {}, {}
+
+# Just emoji.
+EmojiPipeline = Pipeline.new [
+  HTMLInputFilter,
+  EmojiFilter
+], context, {}
+```
+
+## Extending
+To write a custom filter, you need a class with a `call` method that inherits
+from `HTML::Pipeline::Filter`.
+
+For example this filter adds a base url to images that are root relative:
+
+```ruby
+require 'uri'
+
+class RootRelativeFilter < HTML::Pipeline::Filter
+
+  def call
+    doc.search("img").each do |img|
+      next if img['src'].nil?
+      src = img['src'].strip
+      if src.start_with? '/'
+        img["src"] = URI.join(context[:base_url], src).to_s
+      end
+    end
+    doc
+  end
+
+end
+```
+
+Now this filter can be used in a pipeline:
+
+```ruby
+Pipeline.new [ RootRelativeFilter ], { :base_url => 'http://somehost.com' }
+```
+
+## Development
+
+To see what has changed in recent versions, see the [CHANGELOG](https://github.com/jch/html-pipeline/blob/master/CHANGELOG.md).
 
 ```sh
 bundle
@@ -104,11 +195,11 @@ rake test
 
 ## Contributing
 
-1. Fork it
+1. [Fork it](https://help.github.com/articles/fork-a-repo)
 2. Create your feature branch (`git checkout -b my-new-feature`)
 3. Commit your changes (`git commit -am 'Added some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
-5. Create new Pull Request
+5. Create new [Pull Request](https://help.github.com/articles/using-pull-requests)
 
 
 ## TODO
@@ -126,3 +217,5 @@ rake test
 * [Simon Rozet](mailto:simon@rozet.name)
 * [Vicent Martí](mailto:tanoku@gmail.com)
 * [Risk :danger: Olson](mailto:technoweenie@gmail.com)
+
+Project is a member of the [OSS Manifesto](http://ossmanifesto.org/).
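Note on the README's `RootRelativeFilter` example: it relies on `URI.join` to resolve a root-relative `src` against `context[:base_url]`. The following is a minimal sketch of that resolution step alone, using only Ruby's standard library. The host name and image path are illustrative assumptions, not values taken from the gem.

```ruby
require 'uri'

# Hypothetical base URL, standing in for context[:base_url].
base_url = 'http://somehost.com'

# A root-relative path is resolved against the host of the base URL.
absolute = URI.join(base_url, '/images/logo.png').to_s
puts absolute

# If the base URL carries its own path, a root-relative src replaces that path.
replaced = URI.join('http://somehost.com/blog/', '/images/logo.png').to_s
puts replaced
```

Both calls yield the same absolute URL, which is why the filter only needs to rewrite `src` values that start with `/`.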
data/html-pipeline.gemspec
CHANGED
@@ -1,9 +1,10 @@
 # -*- encoding: utf-8 -*-
-require File.expand_path(
+require File.expand_path("../lib/html/pipeline/version", __FILE__)
 
 Gem::Specification.new do |gem|
   gem.name = "html-pipeline"
   gem.version = HTML::Pipeline::VERSION
+  gem.license = "MIT"
   gem.authors = ["Ryan Tomayko", "Jerry Cheung"]
   gem.email = ["ryan@github.com", "jerry@github.com"]
   gem.description = %q{GitHub HTML processing filters and utilities}
@@ -14,12 +15,12 @@ Gem::Specification.new do |gem|
   gem.test_files = gem.files.grep(%r{^test})
   gem.require_paths = ["lib"]
 
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
-  gem.add_dependency
+  gem.add_dependency "gemoji", "~> 1.0"
+  gem.add_dependency "nokogiri", "~> 1.4"
+  gem.add_dependency "github-markdown", "~> 0.5"
+  gem.add_dependency "sanitize", "~> 2.0"
+  gem.add_dependency "github-linguist", "~> 2.1"
+  gem.add_dependency "rinku", "~> 1.7"
+  gem.add_dependency "escape_utils", "~> 0.2"
+  gem.add_dependency "activesupport", ">= 2"
 end
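Note on the version constraints in the gemspec: most dependencies use the pessimistic operator `~>`, while `activesupport` only sets a lower bound. The sketch below checks what two of these constraints accept, using RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The version numbers being tested are arbitrary examples and not releases referenced by this diff.

```ruby
require 'rubygems'

# "~> 1.0" (pessimistic constraint) accepts any 1.x release but not 2.0.
gemoji = Gem::Requirement.new('~> 1.0')
gemoji_accepts_1x = gemoji.satisfied_by?(Gem::Version.new('1.2.1'))
gemoji_accepts_2x = gemoji.satisfied_by?(Gem::Version.new('2.0.0'))
puts gemoji_accepts_1x  # true
puts gemoji_accepts_2x  # false

# ">= 2" only sets a lower bound, so later major versions also qualify.
activesupport = Gem::Requirement.new('>= 2')
as_accepts_3x = activesupport.satisfied_by?(Gem::Version.new('3.2.9'))
as_accepts_1x = activesupport.satisfied_by?(Gem::Version.new('1.4.4'))
puts as_accepts_3x  # true
puts as_accepts_1x  # false
```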
data/lib/html/pipeline.rb
CHANGED
@@ -9,8 +9,8 @@ module HTML
   #
   # See HTML::Pipeline::Filter for information on building filters.
   #
-  #
-  # with one to many filters, and
+  # Construct a Pipeline for running multiple HTML filters. A pipeline is created once
+  # with one to many filters, and it then can be `call`ed many times over the course
   # of its lifetime with input.
   #
   # filters - Array of Filter objects. Each must respond to call(doc,
@@ -22,7 +22,7 @@ module HTML
   # nil. Default: empty Hash.
   # result_class - The default Class of the result object for individual
   # calls. Default: Hash. Protip: Pass in a Struct to get
-  # some
+  # some semblance of type safety.
   class Pipeline
     autoload :VERSION, 'html/pipeline/version'
     autoload :Pipeline, 'html/pipeline/pipeline'
data/lib/html/pipeline/camo_filter.rb
CHANGED
@@ -11,8 +11,8 @@ module HTML
     # in browser clients.
     #
     # Context options:
-    # :asset_proxy - Base URL for constructed asset proxy URLs.
-    # :asset_proxy_secret_key - The shared secret used to encode URLs.
+    # :asset_proxy (required) - Base URL for constructed asset proxy URLs.
+    # :asset_proxy_secret_key (required) - The shared secret used to encode URLs.
     #
     # This filter does not write additional information to the context.
     class CamoFilter < Filter
@@ -33,6 +33,12 @@ module HTML
         end
         doc
       end
+
+      # Implementation of validate hook.
+      # Errors should raise exceptions or use an existing validator.
+      def validate
+        needs :asset_proxy, :asset_proxy_secret_key
+      end
 
       # The camouflaged URL for a given image URL.
       def asset_proxy_url(url)
@@ -47,11 +53,11 @@ module HTML
 
       # Private: the hostname to use for generated asset proxied URLs.
       def asset_proxy_host
-        context[:asset_proxy]
+        context[:asset_proxy]
       end
 
       def asset_proxy_secret_key
-        context[:asset_proxy_secret_key]
+        context[:asset_proxy_secret_key]
       end
 
       # Private: helper to hexencode a string. Each byte ends up encoded into
data/lib/html/pipeline/emoji_filter.rb
CHANGED
@@ -5,7 +5,7 @@ module HTML
     # HTML filter that replaces :emoji: with images.
     #
     # Context:
-    # :asset_root - base url to link to emoji sprite
+    # :asset_root (required) - base url to link to emoji sprite
     class EmojiFilter < Filter
       # Build a regexp that matches all valid :emoji: names.
       EmojiPattern = /:(#{Emoji.names.map { |name| Regexp.escape(name) }.join('|')}):/
@@ -21,6 +21,12 @@ module HTML
         end
         doc
       end
+
+      # Implementation of validate hook.
+      # Errors should raise exceptions or use an existing validator.
+      def validate
+        needs :asset_root
+      end
 
       # Replace :emoji: with corresponding images.
       #
@@ -41,7 +47,7 @@ module HTML
       # Raises ArgumentError if context option has not been provided.
       # Returns the context's asset_root.
       def asset_root
-        context[:asset_root]
+        context[:asset_root]
       end
     end
   end
data/lib/html/pipeline/filter.rb
CHANGED
@@ -39,8 +39,9 @@ module HTML
         end
         @context = context || {}
         @result = result || {}
+        validate
       end
-
+
       # Public: Returns a simple Hash used to pass extra information into filters
       # and also to allow filters to make extracted information available to the
       # caller.
@@ -73,6 +74,10 @@ module HTML
       def call
         raise NotImplementedError
       end
+
+      # Make sure the context has everything we need. Noop: Subclasses can override.
+      def validate
+      end
 
       # The Repository object provided in the context hash, or nil when no
       # :repository was specified.
@@ -153,6 +158,21 @@ module HTML
           output.to_s
         end
       end
+
+      # Validator for required context. This will check that anything passed in
+      # contexts exists in @contexts
+      #
+      # If any errors are found an ArgumentError will be raised with a
+      # message listing all the missing contexts and the filters that
+      # require them.
+      def needs(*keys)
+        missing = keys.reject { |key| context.include? key }
+
+        if missing.any?
+          raise ArgumentError,
+            "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join ', '}"
+        end
+      end
     end
   end
 end
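Note on the `validate` / `needs` hook added in `filter.rb`: the flow can be tried in isolation with the sketch below. `SketchFilter` and `BaseUrlFilter` are hypothetical stand-ins written for illustration. They model only the constructor, the `validate` hook and the `needs` helper shown in the diff, not the rest of `HTML::Pipeline::Filter`.

```ruby
# Simplified stand-in for HTML::Pipeline::Filter; only the validation flow is modelled.
class SketchFilter
  attr_reader :context

  def initialize(context = nil)
    @context = context || {}
    validate
  end

  # No-op by default; subclasses override it to declare required keys.
  def validate
  end

  # Raise ArgumentError naming every required key missing from the context.
  def needs(*keys)
    missing = keys.reject { |key| context.include?(key) }
    return if missing.empty?

    raise ArgumentError,
      "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join(', ')}"
  end
end

# Hypothetical filter that requires a :base_url entry in its context.
class BaseUrlFilter < SketchFilter
  def validate
    needs :base_url
  end
end

begin
  BaseUrlFilter.new({})
rescue ArgumentError => e
  puts e.message
end
```

Because the check uses `include?`, a key that is present with a `nil` value still passes validation; only absent keys are reported.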
data/lib/html/pipeline/image_max_width_filter.rb
CHANGED
@@ -9,11 +9,11 @@ module HTML
     class ImageMaxWidthFilter < Filter
       def call
         doc.search('img').each do |element|
-          # Skip if
+          # Skip if there's already a style attribute. Not sure how this
           # would happen but we can reconsider it in the future.
           next if element['style']
 
-          # Bail out if src doesn't look like a valid http url.
+          # Bail out if src doesn't look like a valid http url. trying to avoid weird
           # js injection via javascript: urls.
           next if element['src'].to_s.strip =~ /\Ajavascript/i
 
data/lib/html/pipeline/sanitization_filter.rb
CHANGED
@@ -33,7 +33,7 @@ module HTML
         :elements => %w(
           h1 h2 h3 h4 h5 h6 h7 h8 br b i strong em a pre code img tt
           div ins del sup sub p ol ul table blockquote dl dt dd
-          kbd q samp var hr ruby rt rp
+          kbd q samp var hr ruby rt rp li tr td th
         ),
         :attributes => {
           'a' => ['href'],
@@ -62,22 +62,20 @@ module HTML
           'img' => {'src' => ['http', 'https', :relative]}
         },
         :transformers => [
-          #
-          # top-level <li> elements are removed because they can break out of
+          # Top-level <li> elements are removed because they can break out of
           # containing markup.
           lambda { |env|
             name, node = env[:node_name], env[:node]
-            if name == LIST_ITEM && node.ancestors.any?{ |n| LISTS.include?(n.name) }
-
+            if name == LIST_ITEM && !node.ancestors.any?{ |n| LISTS.include?(n.name) }
+              node.replace(node.children)
             end
           },
 
-          # Whitelist only table child elements that are descended from a <table>.
           # Table child elements that are not contained by a <table> are removed.
           lambda { |env|
             name, node = env[:node_name], env[:node]
-            if TABLE_ITEMS.include?(name) && !node.ancestors.any? { |n| n.name == TABLE }
-
+            if TABLE_ITEMS.include?(name) && !node.ancestors.any? { |n| n.name == TABLE }
+              node.replace(node.children)
             end
           }
         ]
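Note on the new `<li>` transformer: the added lines unwrap a list item only when none of its ancestors is a list element. The sketch below isolates that ancestor check. It uses a hypothetical `Node` struct in place of the Nokogiri nodes the real filter works on, and the values of `LISTS` and `LIST_ITEM` are assumptions inferred from the test name `test_sanitizes_li_elements_not_contained_in_ul_or_ol`.

```ruby
# Hypothetical stand-in for a parsed HTML node; the real filter receives Nokogiri nodes.
Node = Struct.new(:name, :parent) do
  # Walk up the parent chain, nearest ancestor first.
  def ancestors
    chain = []
    current = parent
    while current
      chain << current
      current = current.parent
    end
    chain
  end
end

# Assumed values; the diff references these constants without showing their definitions.
LISTS = %w(ul ol)
LIST_ITEM = 'li'

# True when the node is an <li> with no <ul> or <ol> anywhere above it.
def orphaned_list_item?(node)
  node.name == LIST_ITEM && !node.ancestors.any? { |n| LISTS.include?(n.name) }
end

body   = Node.new('body', nil)
list   = Node.new('ul', body)
nested = Node.new('li', list)
orphan = Node.new('li', body)

puts orphaned_list_item?(nested)  # false
puts orphaned_list_item?(orphan)  # true
```

In the filter itself, a node for which this condition holds is replaced by its children, so its text survives while the stray `<li>` tag is dropped.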
data/test/html/pipeline/camo_filter_test.rb
CHANGED
@@ -36,4 +36,12 @@ class HTML::Pipeline::CamoFilterTest < Test::Unit::TestCase
       CamoFilter.call(orig, @options).to_s
     end
   end
+
+  def test_required_context_validation
+    exception = assert_raise(ArgumentError) {
+      CamoFilter.call("", {})
+    }
+    assert_match /:asset_proxy[^_]/, exception.message
+    assert_match /:asset_proxy_secret_key/, exception.message
+  end
 end
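Note on the `[^_]` in the first assertion: `:asset_proxy` is a prefix of `:asset_proxy_secret_key`, so a plain `/:asset_proxy/` would pass even if only the longer key were reported. The sketch below runs the pattern against three messages. They are written out by hand in the shape the `needs` helper produces and are not captured from a real test run.

```ruby
pattern = /:asset_proxy[^_]/

# Both keys missing: ":asset_proxy" is followed by a comma, so the pattern matches.
both_missing = 'Missing context keys for HTML::Pipeline::CamoFilter: :asset_proxy, :asset_proxy_secret_key'

# Only the longer key missing: every ":asset_proxy" is followed by "_", so no match.
only_secret = 'Missing context keys for HTML::Pipeline::CamoFilter: :asset_proxy_secret_key'

# Caveat: [^_] consumes one character, so the pattern also fails when
# ":asset_proxy" is the very last thing in the message.
only_proxy = 'Missing context keys for HTML::Pipeline::CamoFilter: :asset_proxy'

results = [both_missing, only_secret, only_proxy].map { |message| !(message =~ pattern).nil? }
p results
```

The test passes because `needs` lists `:asset_proxy` first, so a comma follows it in the message.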
data/test/html/pipeline/emoji_filter_test.rb
CHANGED
@@ -1,16 +1,18 @@
 require 'test_helper'
 
 class HTML::Pipeline::EmojiFilterTest < Test::Unit::TestCase
+  EmojiFilter = HTML::Pipeline::EmojiFilter
+
   def test_emojify
-    filter =
+    filter = EmojiFilter.new("<p>:shipit:</p>", {:asset_root => 'https://foo.com'})
     doc = filter.call
     assert_match "https://foo.com/emoji/shipit.png", doc.search('img').attr('src').value
   end
-
-  def
-
-
-
-
+
+  def test_required_context_validation
+    exception = assert_raise(ArgumentError) {
+      EmojiFilter.call("", {})
+    }
+    assert_match /:asset_root/, exception.message
   end
 end
data/test/html/pipeline/sanitization_filter_test.rb
CHANGED
@@ -32,7 +32,7 @@ class HTML::Pipeline::SanitizationFilterTest < Test::Unit::TestCase
   def test_sanitizes_li_elements_not_contained_in_ul_or_ol
     stuff = "a\n<li>b</li>\nc"
     html = SanitizationFilter.call(stuff).to_s
-    assert_equal "a\
+    assert_equal "a\nb\nc", html
   end
 
   def test_does_not_sanitize_li_elements_contained_in_ul_or_ol
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: html-pipeline
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.5
 prerelease:
 platform: ruby
 authors:
@@ -10,7 +10,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-
+date: 2012-12-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: gemoji
@@ -19,7 +19,7 @@ dependencies:
     requirements:
     - - ~>
       - !ruby/object:Gem::Version
-        version: 1.
+        version: '1.0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
@@ -27,7 +27,7 @@ dependencies:
     requirements:
     - - ~>
       - !ruby/object:Gem::Version
-        version: 1.
+        version: '1.0'
 - !ruby/object:Gem::Dependency
   name: nokogiri
   requirement: !ruby/object:Gem::Requirement
@@ -150,6 +150,7 @@ extra_rdoc_files: []
 files:
 - .gitignore
 - .travis.yml
+- CHANGELOG.md
 - Gemfile
 - LICENSE
 - README.md
@@ -184,7 +185,8 @@ files:
 - test/html/pipeline/toc_filter_test.rb
 - test/test_helper.rb
 homepage: https://github.com/jch/html-pipeline
-licenses:
+licenses:
+- MIT
 post_install_message:
 rdoc_options: []
 require_paths: