html-pipeline 0.0.4 → 0.0.5

@@ -0,0 +1,12 @@
+ # CHANGELOG
+
+ ## 0.0.5 (unreleased)
+
+ * fix li xss vulnerability in sanitization filter: vmg #31
+ * gemspec cleanup: nbibler #23, jbarnette #24
+ * doc updates: jch #16, pborreli #17, wickedshimmy #18, benubois #19, blackerby #21
+ * loosen gemoji dependency: josh #15
+
+ ## 0.0.4
+
+ * initial public release
data/README.md CHANGED
@@ -53,7 +53,9 @@ pipeline = HTML::Pipeline.new [
  result = pipeline.call <<CODE
  This is *great*:
 
- some_code(:first)
+ ``` ruby
+ some_code(:first)
+ ```
 
  CODE
  result[:output].to_s
@@ -83,19 +85,108 @@ filter.call
  ## Filters
 
  * `MentionFilter` - replace `@user` mentions with links
- * `AutoLinkFilter` - auto_linking urls in HTML
- * `CamoFilter` - replace http image urls with [camo-fied](https://github.com/github/camo) https versions
+ * `AutolinkFilter` - auto_linking urls in HTML
+ * `CamoFilter` - replace http image urls with [camo-fied](https://github.com/atmos/camo) https versions
  * `EmailReplyFilter` - util filter for working with emails
  * `EmojiFilter` - everyone loves [emoji](http://www.emoji-cheat-sheet.com/)!
+ * `HttpsFilter` - HTML Filter for replacing http github urls with https versions.
  * `ImageMaxWidthFilter` - link to full size image for large images
  * `MarkdownFilter` - convert markdown to html
  * `PlainTextInputFilter` - html escape text and wrap the result in a div
- * `SanitizationFilter` - whitelist santize user markup
+ * `SanitizationFilter` - whitelist sanitize user markup
  * `SyntaxHighlightFilter` - code syntax highlighter with [linguist](https://github.com/github/linguist)
  * `TextileFilter` - convert textile to html
  * `TableOfContentsFilter` - anchor headings with name attributes
 
- ## Development Setup
+ ## Examples
+
+ We define different pipelines for different parts of our app. Here are a few
+ paraphrased snippets to get you started:
+
+ ```ruby
+ # The context hash is how you pass options between different filters.
+ # See individual filter source for explanation of options.
+ context = {
+ :asset_root => "http://your-domain.com/where/your/images/live/icons",
+ :base_url => "http://your-domain.com"
+ }
+
+ # Pipeline providing sanitization and image hijacking but no mention
+ # related features.
+ SimplePipeline = Pipeline.new [
+ SanitizationFilter,
+ TableOfContentsFilter, # add 'name' anchors to all headers
+ CamoFilter,
+ ImageMaxWidthFilter,
+ SyntaxHighlightFilter,
+ EmojiFilter,
+ AutolinkFilter
+ ], context, {}
+
+ # Pipeline used for user provided content on the web
+ MarkdownPipeline = Pipeline.new [
+ MarkdownFilter,
+ SanitizationFilter,
+ CamoFilter,
+ ImageMaxWidthFilter,
+ HttpsFilter,
+ MentionFilter,
+ EmojiFilter,
+ SyntaxHighlightFilter
+ ], context.merge(:gfm => true), {} # enable github formatted markdown
+
+
+ # Define a pipeline based on another pipeline's filters
+ NonGFMMarkdownPipeline = Pipeline.new(MarkdownPipeline.filters,
+ context.merge(:gfm => false), {})
+
+ # Pipelines aren't limited to the web. You can use them for email
+ # processing also.
+ HtmlEmailPipeline = Pipeline.new [
+ ImageMaxWidthFilter
+ ], {}, {}
+
+ # Just emoji.
+ EmojiPipeline = Pipeline.new [
+ HTMLInputFilter,
+ EmojiFilter
+ ], context, {}
+ ```
+
+ ## Extending
+ To write a custom filter, you need a class with a `call` method that inherits
+ from `HTML::Pipeline::Filter`.
+
+ For example this filter adds a base url to images that are root relative:
+
+ ```ruby
+ require 'uri'
+
+ class RootRelativeFilter < HTML::Pipeline::Filter
+
+ def call
+ doc.search("img").each do |img|
+ next if img['src'].nil?
+ src = img['src'].strip
+ if src.start_with? '/'
+ img["src"] = URI.join(context[:base_url], src).to_s
+ end
+ end
+ doc
+ end
+
+ end
+ ```
+
+ Now this filter can be used in a pipeline:
+
+ ```ruby
+ Pipeline.new [ RootRelativeFilter ], { :base_url => 'http://somehost.com' }
+ ```
+
+ ## Development
+
+ To see what has changed in recent versions, see the [CHANGELOG](https://github.com/jch/html-pipeline/blob/master/CHANGELOG.md).
 
  ```sh
  bundle
@@ -104,11 +195,11 @@ rake test
 
  ## Contributing
 
- 1. Fork it
+ 1. [Fork it](https://help.github.com/articles/fork-a-repo)
  2. Create your feature branch (`git checkout -b my-new-feature`)
  3. Commit your changes (`git commit -am 'Added some feature'`)
  4. Push to the branch (`git push origin my-new-feature`)
- 5. Create new Pull Request
+ 5. Create new [Pull Request](https://help.github.com/articles/using-pull-requests)
 
 
  ## TODO
@@ -126,3 +217,5 @@ rake test
  * [Simon Rozet](mailto:simon@rozet.name)
  * [Vicent Martí](mailto:tanoku@gmail.com)
  * [Risk :danger: Olson](mailto:technoweenie@gmail.com)
+
+ Project is a member of the [OSS Manifesto](http://ossmanifesto.org/).
@@ -1,9 +1,10 @@
  # -*- encoding: utf-8 -*-
- require File.expand_path('../lib/html/pipeline/version', __FILE__)
+ require File.expand_path("../lib/html/pipeline/version", __FILE__)
 
  Gem::Specification.new do |gem|
  gem.name = "html-pipeline"
  gem.version = HTML::Pipeline::VERSION
+ gem.license = "MIT"
  gem.authors = ["Ryan Tomayko", "Jerry Cheung"]
  gem.email = ["ryan@github.com", "jerry@github.com"]
  gem.description = %q{GitHub HTML processing filters and utilities}
@@ -14,12 +15,12 @@ Gem::Specification.new do |gem|
  gem.test_files = gem.files.grep(%r{^test})
  gem.require_paths = ["lib"]
 
- gem.add_dependency 'gemoji', '~> 1.1.1'
- gem.add_dependency 'nokogiri', '~> 1.4'
- gem.add_dependency 'github-markdown', '~> 0.5'
- gem.add_dependency 'sanitize', '~> 2.0'
- gem.add_dependency 'github-linguist', '~> 2.1'
- gem.add_dependency 'rinku', '~> 1.7'
- gem.add_dependency 'escape_utils', '~> 0.2'
- gem.add_dependency 'activesupport', '>= 2'
+ gem.add_dependency "gemoji", "~> 1.0"
+ gem.add_dependency "nokogiri", "~> 1.4"
+ gem.add_dependency "github-markdown", "~> 0.5"
+ gem.add_dependency "sanitize", "~> 2.0"
+ gem.add_dependency "github-linguist", "~> 2.1"
+ gem.add_dependency "rinku", "~> 1.7"
+ gem.add_dependency "escape_utils", "~> 0.2"
+ gem.add_dependency "activesupport", ">= 2"
  end
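
Note on the gemoji change above: the constraint moves from `~> 1.1.1` to `~> 1.0`. As a rough illustration of what that loosening permits, the sketch below checks a few version numbers against both constraints using RubyGems' own `Gem::Requirement`. The sample versions are arbitrary picks for illustration, not a list of actual gemoji releases.

```ruby
require "rubygems"

old_req = Gem::Requirement.new("~> 1.1.1") # allows >= 1.1.1 and < 1.2
new_req = Gem::Requirement.new("~> 1.0")   # allows >= 1.0 and < 2.0

# Arbitrary sample versions, chosen only to show where the boundaries fall.
samples = %w[1.0.0 1.1.1 1.1.9 1.2.0 2.0.0]

# Each entry is [version string, satisfies old constraint, satisfies new constraint].
RESULTS = samples.map do |v|
  version = Gem::Version.new(v)
  [v, old_req.satisfied_by?(version), new_req.satisfied_by?(version)]
end

RESULTS.each do |v, old_ok, new_ok|
  puts format("%-6s old: %-5s new: %s", v, old_ok, new_ok)
end
```

Under the old constraint only the 1.1.x patch series qualifies; under the new one any 1.x release does, while 2.0.0 is still excluded.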
@@ -9,8 +9,8 @@ module HTML
  #
  # See HTML::Pipeline::Filter for information on building filters.
  #
- # Contruct a Pipeline for running multiple HTML filters. A pipeline is created once
- # with one to many filters, and is then can be `call`ed many times over the course
+ # Construct a Pipeline for running multiple HTML filters. A pipeline is created once
+ # with one to many filters, and it then can be `call`ed many times over the course
  # of its lifetime with input.
  #
  # filters - Array of Filter objects. Each must respond to call(doc,
@@ -22,7 +22,7 @@ module HTML
  # nil. Default: empty Hash.
  # result_class - The default Class of the result object for individual
  # calls. Default: Hash. Protip: Pass in a Struct to get
- # some semblence of type safety.
+ # some semblance of type safety.
  class Pipeline
  autoload :VERSION, 'html/pipeline/version'
  autoload :Pipeline, 'html/pipeline/pipeline'
@@ -11,8 +11,8 @@ module HTML
  # in browser clients.
  #
  # Context options:
- # :asset_proxy - Base URL for constructed asset proxy URLs.
- # :asset_proxy_secret_key - The shared secret used to encode URLs.
+ # :asset_proxy (required) - Base URL for constructed asset proxy URLs.
+ # :asset_proxy_secret_key (required) - The shared secret used to encode URLs.
  #
  # This filter does not write additional information to the context.
  class CamoFilter < Filter
@@ -33,6 +33,12 @@ module HTML
  end
  doc
  end
+
+ # Implementation of validate hook.
+ # Errors should raise exceptions or use an existing validator.
+ def validate
+ needs :asset_proxy, :asset_proxy_secret_key
+ end
 
  # The camouflaged URL for a given image URL.
  def asset_proxy_url(url)
@@ -47,11 +53,11 @@ module HTML
 
  # Private: the hostname to use for generated asset proxied URLs.
  def asset_proxy_host
- context[:asset_proxy] or raise "Missing context :asset_proxy"
+ context[:asset_proxy]
  end
 
  def asset_proxy_secret_key
- context[:asset_proxy_secret_key] or raise "Missing context :asset_proxy_secret_key"
+ context[:asset_proxy_secret_key]
  end
 
  # Private: helper to hexencode a string. Each byte ends up encoded into
@@ -5,7 +5,7 @@ module HTML
  # HTML filter that replaces :emoji: with images.
  #
  # Context:
- # :asset_root - base url to link to emoji sprite
+ # :asset_root (required) - base url to link to emoji sprite
  class EmojiFilter < Filter
  # Build a regexp that matches all valid :emoji: names.
  EmojiPattern = /:(#{Emoji.names.map { |name| Regexp.escape(name) }.join('|')}):/
@@ -21,6 +21,12 @@ module HTML
  end
  doc
  end
+
+ # Implementation of validate hook.
+ # Errors should raise exceptions or use an existing validator.
+ def validate
+ needs :asset_root
+ end
 
  # Replace :emoji: with corresponding images.
  #
@@ -41,7 +47,7 @@ module HTML
  # Raises ArgumentError if context option has not been provided.
  # Returns the context's asset_root.
  def asset_root
- context[:asset_root] or raise ArgumentError, "Missing context :asset_root"
+ context[:asset_root]
  end
  end
  end
@@ -39,8 +39,9 @@ module HTML
  end
  @context = context || {}
  @result = result || {}
+ validate
  end
-
+
  # Public: Returns a simple Hash used to pass extra information into filters
  # and also to allow filters to make extracted information available to the
  # caller.
@@ -73,6 +74,10 @@ module HTML
  def call
  raise NotImplementedError
  end
+
+ # Make sure the context has everything we need. Noop: Subclasses can override.
+ def validate
+ end
 
  # The Repository object provided in the context hash, or nil when no
  # :repository was specified.
@@ -153,6 +158,21 @@ module HTML
  output.to_s
  end
  end
+
+ # Validator for required context. This will check that anything passed in
+ # contexts exists in @contexts
+ #
+ # If any errors are found an ArgumentError will be raised with a
+ # message listing all the missing contexts and the filters that
+ # require them.
+ def needs(*keys)
+ missing = keys.reject { |key| context.include? key }
+
+ if missing.any?
+ raise ArgumentError,
+ "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join ', '}"
+ end
+ end
  end
  end
  end
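
Note on the `validate` / `needs` pattern introduced above: the sketch below reproduces it in isolation so the control flow is easier to follow. `SketchFilter`, `SketchProxyFilter`, and `missing_keys_message` are hypothetical stand-ins written for this note; they are not the html-pipeline classes, and the real `Filter#initialize` takes more arguments than shown here.

```ruby
# Stand-in base class (hypothetical): runs the validate hook at construction time.
class SketchFilter
  attr_reader :context

  def initialize(context = nil)
    @context = context || {}
    validate
  end

  # No-op by default; subclasses override it to declare required keys.
  def validate
  end

  # Raises ArgumentError naming every required key absent from the context.
  def needs(*keys)
    missing = keys.reject { |key| context.include?(key) }
    return if missing.empty?

    raise ArgumentError,
          "Missing context keys for #{self.class.name}: #{missing.map(&:inspect).join(', ')}"
  end
end

# Stand-in subclass (hypothetical) that requires two keys, as CamoFilter does in the diff.
class SketchProxyFilter < SketchFilter
  def validate
    needs :asset_proxy, :asset_proxy_secret_key
  end
end

# Helper for the demo: returns the error message, or nil when validation passes.
def missing_keys_message(context)
  SketchProxyFilter.new(context)
  nil
rescue ArgumentError => e
  e.message
end

puts missing_keys_message({}).inspect
puts missing_keys_message({ :asset_proxy => nil, :asset_proxy_secret_key => "s" }).inspect
```

One consequence worth checking in review: `Hash#include?` tests key presence only, so a key that is present with a `nil` value passes this check, whereas the removed `context[:key] or raise` accessors would have raised in that case.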
@@ -9,11 +9,11 @@ module HTML
  class ImageMaxWidthFilter < Filter
  def call
  doc.search('img').each do |element|
- # Skip if theres already a style attribute. Not sure how this
+ # Skip if there's already a style attribute. Not sure how this
  # would happen but we can reconsider it in the future.
  next if element['style']
 
- # Bail out if src doesn't look like a valid http url. tryna avoid weird
+ # Bail out if src doesn't look like a valid http url. trying to avoid weird
  # js injection via javascript: urls.
  next if element['src'].to_s.strip =~ /\Ajavascript/i
 
@@ -33,7 +33,7 @@ module HTML
  :elements => %w(
  h1 h2 h3 h4 h5 h6 h7 h8 br b i strong em a pre code img tt
  div ins del sup sub p ol ul table blockquote dl dt dd
- kbd q samp var hr ruby rt rp
+ kbd q samp var hr ruby rt rp li tr td th
  ),
  :attributes => {
  'a' => ['href'],
@@ -62,22 +62,20 @@ module HTML
  'img' => {'src' => ['http', 'https', :relative]}
  },
  :transformers => [
- # whitelist only <li> elements that are descended from a <ul> or <ol>.
- # top-level <li> elements are removed because they can break out of
+ # Top-level <li> elements are removed because they can break out of
  # containing markup.
  lambda { |env|
  name, node = env[:node_name], env[:node]
- if name == LIST_ITEM && node.ancestors.any?{ |n| LISTS.include?(n.name) }
- {:node_whitelist => [node]}
+ if name == LIST_ITEM && !node.ancestors.any?{ |n| LISTS.include?(n.name) }
+ node.replace(node.children)
  end
  },
 
- # Whitelist only table child elements that are descended from a <table>.
  # Table child elements that are not contained by a <table> are removed.
  lambda { |env|
  name, node = env[:node_name], env[:node]
- if TABLE_ITEMS.include?(name) && node.ancestors.any? { |n| n.name == TABLE }
- { :node_whitelist => [node] }
+ if TABLE_ITEMS.include?(name) && !node.ancestors.any? { |n| n.name == TABLE }
+ node.replace(node.children)
  end
  }
  ]
@@ -1,5 +1,5 @@
  module HTML
  class Pipeline
- VERSION = "0.0.4"
+ VERSION = "0.0.5"
  end
  end
@@ -36,4 +36,12 @@ class HTML::Pipeline::CamoFilterTest < Test::Unit::TestCase
  CamoFilter.call(orig, @options).to_s
  end
  end
+
+ def test_required_context_validation
+ exception = assert_raise(ArgumentError) {
+ CamoFilter.call("", {})
+ }
+ assert_match /:asset_proxy[^_]/, exception.message
+ assert_match /:asset_proxy_secret_key/, exception.message
+ end
  end
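
Note on the `/:asset_proxy[^_]/` assertion above: the trailing `[^_]` keeps the pattern from matching the prefix of `:asset_proxy_secret_key`, but it also requires some character to follow `:asset_proxy`. The sketch below compares it with a negative lookahead on three sample messages. The message strings are assumptions modeled on the format string in `needs`; only the first corresponds to what the test actually triggers.

```ruby
# Sample messages (assumed, following the "Missing context keys for ..." format).
BOTH        = "Missing context keys for HTML::Pipeline::CamoFilter: :asset_proxy, :asset_proxy_secret_key"
SECRET_ONLY = "Missing context keys for HTML::Pipeline::CamoFilter: :asset_proxy_secret_key"
PROXY_LAST  = "Missing context keys for HTML::Pipeline::CamoFilter: :asset_proxy"

CHAR_CLASS = /:asset_proxy[^_]/   # pattern used in the test
LOOKAHEAD  = /:asset_proxy(?!_)/  # alternative: assert "not followed by _" without consuming a character

[BOTH, SECRET_ONLY, PROXY_LAST].each do |message|
  puts "char class: #{!!(message =~ CHAR_CLASS)}  lookahead: #{!!(message =~ LOOKAHEAD)}"
end
```

For the message the test produces, both patterns behave the same. They differ only when `:asset_proxy` is the last thing in the string, which the character class cannot match.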
@@ -1,16 +1,18 @@
  require 'test_helper'
 
  class HTML::Pipeline::EmojiFilterTest < Test::Unit::TestCase
+ EmojiFilter = HTML::Pipeline::EmojiFilter
+
  def test_emojify
- filter = HTML::Pipeline::EmojiFilter.new("<p>:shipit:</p>", {:asset_root => 'https://foo.com'})
+ filter = EmojiFilter.new("<p>:shipit:</p>", {:asset_root => 'https://foo.com'})
  doc = filter.call
  assert_match "https://foo.com/emoji/shipit.png", doc.search('img').attr('src').value
  end
-
- def test_missing_context
- filter = HTML::Pipeline::EmojiFilter.new("<p>:shipit:</p>", {})
- assert_raises ArgumentError do
- filter.call
- end
+
+ def test_required_context_validation
+ exception = assert_raise(ArgumentError) {
+ EmojiFilter.call("", {})
+ }
+ assert_match /:asset_root/, exception.message
  end
  end
@@ -32,7 +32,7 @@ class HTML::Pipeline::SanitizationFilterTest < Test::Unit::TestCase
  def test_sanitizes_li_elements_not_contained_in_ul_or_ol
  stuff = "a\n<li>b</li>\nc"
  html = SanitizationFilter.call(stuff).to_s
- assert_equal "a\n b \nc", html
+ assert_equal "a\nb\nc", html
  end
 
  def test_does_not_sanitize_li_elements_contained_in_ul_or_ol
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: html-pipeline
  version: !ruby/object:Gem::Version
- version: 0.0.4
+ version: 0.0.5
  prerelease:
  platform: ruby
  authors:
@@ -10,7 +10,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-11-07 00:00:00.000000000 Z
+ date: 2012-12-10 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: gemoji
@@ -19,7 +19,7 @@ dependencies:
  requirements:
  - - ~>
  - !ruby/object:Gem::Version
- version: 1.1.1
+ version: '1.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
@@ -27,7 +27,7 @@ dependencies:
  requirements:
  - - ~>
  - !ruby/object:Gem::Version
- version: 1.1.1
+ version: '1.0'
  - !ruby/object:Gem::Dependency
  name: nokogiri
  requirement: !ruby/object:Gem::Requirement
@@ -150,6 +150,7 @@ extra_rdoc_files: []
  files:
  - .gitignore
  - .travis.yml
+ - CHANGELOG.md
  - Gemfile
  - LICENSE
  - README.md
@@ -184,7 +185,8 @@ files:
  - test/html/pipeline/toc_filter_test.rb
  - test/test_helper.rb
  homepage: https://github.com/jch/html-pipeline
- licenses: []
+ licenses:
+ - MIT
  post_install_message:
  rdoc_options: []
  require_paths: