trenni-sanitize 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 6fcc82f737fa654994e5e196347156bb38bfa74c
4
+ data.tar.gz: 7b8bafec2a1731ecc20c9f850e9620e9941a6f9d
5
+ SHA512:
6
+ metadata.gz: 106aab6ca77fed6efc2f7f77ab5f83d5606f20fafd0b80dd05d1aa2421b5626129d02a2513b8636eb8745d3b49bf2b6c4657a60156be90193bc20faebeaa5997
7
+ data.tar.gz: 48735a28e214041eba2e7f1087b2c553efe7d3816ece68ae9b00dfbc970abbd06fcd34e5ee4478d8a9133f0c851b40f7f66aaa3734935daaf91237e3aaf2029e
data/.gitignore ADDED
@@ -0,0 +1,19 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
18
+
19
+ lib/trenni/trenni.bundle
data/.rspec ADDED
@@ -0,0 +1,5 @@
1
+ --color
2
+ --format documentation
3
+ --backtrace
4
+ --warnings
5
+ --require spec_helper
data/.simplecov ADDED
@@ -0,0 +1,9 @@
1
+
2
+ SimpleCov.start do
3
+ add_filter "/spec/"
4
+ end
5
+
6
+ if ENV['TRAVIS']
7
+ require 'coveralls'
8
+ Coveralls.wear!
9
+ end
data/.travis.yml ADDED
@@ -0,0 +1,17 @@
1
+ language: ruby
2
+ sudo: false
3
+ rvm:
4
+ - 2.1
5
+ - 2.2
6
+ - 2.3
7
+ - 2.4
8
+ - ruby-head
9
+ - jruby-head
10
+ - rbx-2
11
+ env:
12
+ - COVERAGE=true
13
+ matrix:
14
+ allow_failures:
15
+ - rvm: "rbx-2"
16
+ - rvm: "ruby-head"
17
+ - rvm: "jruby-head"
data/Gemfile ADDED
@@ -0,0 +1,16 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in trenni-sanitize.gemspec
4
+ gemspec
5
+
6
+ group :development do
7
+ gem 'pry'
8
+ end
9
+
10
+ group :test do
11
+ gem 'ruby-prof', platforms: [:mri]
12
+ gem "benchmark-ips"
13
+
14
+ # For comparisons:
15
+ gem "sanitize"
16
+ end
data/README.md ADDED
@@ -0,0 +1,139 @@
1
+ # Trenni::Sanitize
2
+
3
+ Sanitize markup by adding, changing or removing tags.
4
+
5
+ [![Build Status](https://secure.travis-ci.org/ioquatix/trenni-sanitize.svg)](http://travis-ci.org/ioquatix/trenni-sanitize)
6
+ [![Code Climate](https://codeclimate.com/github/ioquatix/trenni-sanitize.svg)](https://codeclimate.com/github/ioquatix/trenni-sanitize)
7
+ [![Coverage Status](https://coveralls.io/repos/ioquatix/trenni-sanitize/badge.svg)](https://coveralls.io/r/ioquatix/trenni-sanitize)
8
+
9
+ ## Motivation
10
+
11
+ I use the [sanitize] gem and generally it's great. However, its performance can be an issue, and it doesn't preserve tag namespaces when parsing fragments due to how Nokogiri works internally.
12
+
13
+ [sanitize]: https://github.com/rgrove/sanitize/
14
+
15
+ ## Is it fast?
16
+
17
+ In my informal testing, this gem is roughly 47x faster than the [sanitize] gem when generating plain text.
18
+
19
+ Warming up --------------------------------------
20
+ Sanitize 96.000 i/100ms
21
+ Trenni::Sanitize 4.447k i/100ms
22
+ Calculating -------------------------------------
23
+ Sanitize 958.020 (± 4.5%) i/s - 4.800k in 5.020564s
24
+ Trenni::Sanitize 44.718k (± 4.2%) i/s - 226.797k in 5.080756s
25
+
26
+ Comparison:
27
+ Trenni::Sanitize: 44718.1 i/s
28
+ Sanitize: 958.0 i/s - 46.68x slower
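The "x slower" figure in the comparison is the ratio of the two iterations-per-second measurements. As a quick check of the arithmetic, using only the numbers printed above (the variable names are illustrative):

```ruby
# Iterations per second, copied from the benchmark output above.
trenni_ips = 44_718.1
sanitize_ips = 958.0

# The comparison line reports how many times slower the second entry is,
# which is the ratio of the two throughputs.
ratio = trenni_ips / sanitize_ips

puts format("%.2fx", ratio)
```

This is a single informal run on one sample document, so the ratio should be read as an order-of-magnitude indication rather than a guarantee.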
29
+
30
+ ## Installation
31
+
32
+ Add this line to your application's Gemfile:
33
+
34
+ gem 'trenni-sanitize'
35
+
36
+ And then execute:
37
+
38
+ $ bundle
39
+
40
+ Or install it yourself as:
41
+
42
+ $ gem install trenni-sanitize
43
+
44
+ ## Usage
45
+
46
+ `Trenni::Sanitize::Filter` is a stream-based processor. That means it parses the incoming markup and decides what to keep and what to discard while parsing, without building a document tree first.
47
+
48
+ ### Extracting Text
49
+
50
+ You can extract text using something similar to the following parser delegate:
51
+
52
+ ```ruby
53
+ class Text < Trenni::Sanitize::Filter
54
+ def filter(node)
55
+ # Skip the output of all tags, but keep their content:
56
+ node.skip!(TAG)
57
+ end
58
+
59
+ def doctype(string)
60
+ end
61
+
62
+ def instruction(string)
63
+ end
64
+
65
+ def text(string)
66
+ # Output all text
67
+ @output << string
68
+ end
69
+ end
70
+
71
+ text = Text.parse("<p>Hello World</p>").output
72
+ # => "Hello World"
73
+ ```
74
+
75
+ ### Extracting Safe Markup
76
+
77
+ Here is a simple filter that only allows a limited set of tags:
78
+
79
+ ```ruby
80
+ class Fragment < Trenni::Sanitize::Filter
81
+ STANDARD_ATTRIBUTES = ['class'].freeze
82
+
83
+ ALLOWED_TAGS = {
84
+ 'em' => [],
85
+ 'strong' => [],
86
+ 'p' => [],
87
+ 'img' => ['src', 'alt', 'width', 'height'],
88
+ 'a' => ['href', 'target']
89
+ }.freeze
90
+
91
+ def filter(node)
92
+ if attributes = ALLOWED_TAGS[node.name]
93
+ node.tag.attributes.slice!(*attributes)
94
+ else
95
+ node.skip!
96
+ end
97
+ end
98
+
99
+ def doctype(string)
100
+ end
101
+
102
+ def instruction(string)
103
+ end
104
+ end
105
+ ```
106
+
107
+ As you can see, while [sanitize] is driven by configuration, `Trenni::Sanitize::Filter` is driven by code.
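To illustrate what "stream-based" and "driven by code" mean in practice, here is a minimal, self-contained sketch. It does not use Trenni's parser: the `TinyFilter` class and its regular-expression tokenizer are hypothetical stand-ins that only handle simple, well-formed input, and all attributes are dropped for brevity.

```ruby
# Hypothetical stand-in for a stream-based filter; NOT Trenni's parser.
class TinyFilter
	# Matches either a tag (capturing the optional "/" and the name) or a run of text.
	TOKEN = %r{<(/)?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+}

	def initialize(allowed)
		@allowed = allowed
		@output = String.new
	end

	attr_reader :output

	# Walk the input token by token and decide immediately what to emit.
	def parse(input)
		input.scan(TOKEN) do
			match = Regexp.last_match

			if (name = match[2])
				tag(name, !match[1].nil?)
			else
				text(match[0])
			end
		end

		self
	end

	# Emit the tag (without attributes) only if its name is allowed.
	def tag(name, closing)
		return unless @allowed.include?(name)

		@output << (closing ? "</#{name}>" : "<#{name}>")
	end

	# Text is passed through unchanged in this sketch.
	def text(string)
		@output << string
	end
end

result = TinyFilter.new(%w[p em]).parse("<p onclick='x'>Hello <blink>World</blink></p>").output
puts result
```

Each decision is made as the corresponding token is seen, so the policy lives in ordinary methods (`tag`, `text`) rather than in a configuration hash.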
108
+
109
+ ## Contributing
110
+
111
+ 1. Fork it
112
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
113
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
114
+ 4. Push to the branch (`git push origin my-new-feature`)
115
+ 5. Create new Pull Request
116
+
117
+ ## License
118
+
119
+ Released under the MIT license.
120
+
121
+ Copyright, 2018, by [Samuel G. D. Williams](http://www.codeotaku.com/samuel-williams).
122
+
123
+ Permission is hereby granted, free of charge, to any person obtaining a copy
124
+ of this software and associated documentation files (the "Software"), to deal
125
+ in the Software without restriction, including without limitation the rights
126
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
127
+ copies of the Software, and to permit persons to whom the Software is
128
+ furnished to do so, subject to the following conditions:
129
+
130
+ The above copyright notice and this permission notice shall be included in
131
+ all copies or substantial portions of the Software.
132
+
133
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
134
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
135
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
136
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
137
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
138
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
139
+ THE SOFTWARE.
data/Rakefile ADDED
@@ -0,0 +1,19 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ # Load all rake tasks:
5
+ import(*Dir.glob('tasks/**/*.rake'))
6
+
7
+ RSpec::Core::RakeTask.new(:test)
8
+
9
+ task :environment do
10
+ $LOAD_PATH.unshift File.expand_path('lib', __dir__)
11
+ end
12
+
13
+ task :console => :environment do
14
+ require 'pry'
15
+
16
+ Pry.start
17
+ end
18
+
19
+ task :default => :test
data/lib/trenni/sanitize.rb ADDED
@@ -0,0 +1,25 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ require_relative 'sanitize/extensions'
22
+
23
+ require_relative 'sanitize/text'
24
+ require_relative 'sanitize/fragment'
25
+
data/lib/trenni/sanitize/extensions.rb ADDED
@@ -0,0 +1,14 @@
1
+
2
+ class Hash
3
+ unless method_defined?(:slice)
4
+ def slice(*keys)
5
+ self.select{|key, value| keys.include? key}
6
+ end
7
+ end
8
+
9
+ unless method_defined?(:slice!)
10
+ def slice!(*keys)
11
+ self.select!{|key, value| keys.include? key}
12
+ end
13
+ end
14
+ end
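Both helpers take their keys as a splat, so an array of attribute names has to be expanded with `*` at the call site; passing the array itself makes it a single key that never matches a string. A small sketch of that behaviour, using a standalone function `keep_keys!` (a hypothetical name that mirrors the `slice!` body above without patching `Hash`):

```ruby
# Hypothetical standalone equivalent of the slice! helper above.
def keep_keys!(hash, *keys)
	# select! returns nil when nothing was removed, so return the hash explicitly.
	hash.select! {|key, _value| keys.include?(key)}
	hash
end

allowed = ['class', 'href']

# Splatted: keys is ['class', 'href'], so 'class' is kept and 'onclick' is removed.
splatted = keep_keys!({'class' => 'x', 'onclick' => 'y'}, *allowed)

# Not splatted: keys is [['class', 'href']], so no string key matches
# and every attribute is removed.
unsplatted = keep_keys!({'class' => 'x', 'onclick' => 'y'}, allowed)

p splatted
p unsplatted
```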
data/lib/trenni/sanitize/filter.rb ADDED
@@ -0,0 +1,159 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ require 'trenni/parsers'
22
+ require 'trenni/builder'
23
+ require 'trenni/entities'
24
+
25
+ module Trenni
26
+ module Sanitize
27
+ # Provides a high-level interface for filtering markup while it is being parsed.
28
+ class Filter
29
+ TAG = 1
30
+
31
+ DOCTYPE = 2
32
+ COMMENT = 4
33
+ INSTRUCTION = 8
34
+ CDATA = 16
35
+ TEXT = 32
36
+
37
+ CONTENT = DOCTYPE | COMMENT | INSTRUCTION | CDATA | TEXT
38
+ ALL = TAG | CONTENT
39
+
40
+ def self.parse(input, output = nil, entities = Trenni::Entities::HTML5)
41
+ # This allows us to handle passing in a string:
42
+ input = Trenni::Buffer(input)
43
+
44
+ output ||= MarkupString.new.force_encoding(input.encoding)
45
+
46
+ delegate = self.new(output, entities)
47
+
48
+ delegate.parse!(input)
49
+
50
+ return delegate
51
+ end
52
+
53
+ Node = Struct.new(:name, :tag, :skip) do
54
+ def skip!(mode = ALL)
55
+ self.skip |= mode
56
+ end
57
+
58
+ def skip?(mode = ALL)
59
+ (self.skip & mode) == mode
60
+ end
61
+
62
+ def [] key
63
+ self.tag.attributes[key]
64
+ end
65
+ end
66
+
67
+ def initialize(output, entities)
68
+ @output = output
69
+
70
+ @entities = entities
71
+
72
+ @current = nil
73
+ @stack = []
74
+
75
+ @current = @top = Node.new(nil, nil, 0)
76
+
77
+ @skip = nil
78
+ end
79
+
80
+ attr :output
81
+
82
+ # The current node being parsed.
83
+ attr :current
84
+
85
+ def top
86
+ @stack.last || @top
87
+ end
88
+
89
+ def parse!(input)
90
+ Trenni::Parsers.parse_markup(input, self, @entities)
91
+
92
+ while @stack.size > 1
93
+ close_tag(@stack.last.name)
94
+ end
95
+
96
+ return self
97
+ end
98
+
99
+ def open_tag_begin(name, offset)
100
+ tag = Tag.new(name, false, {})
101
+
102
+ @current = Node.new(name, tag, current.skip)
103
+ end
104
+
105
+ def attribute(key, value)
106
+ @current.tag.attributes[key] = value
107
+ end
108
+
109
+ def open_tag_end(self_closing)
110
+ if self_closing
111
+ @current.tag.closed = true
112
+ else
113
+ @stack << @current
114
+ end
115
+
116
+ filter(@current)
117
+
118
+ @current.tag.write_opening_tag(@output) unless @current.skip? TAG
119
+
120
+ # If the tag was self-closing, it is no longer current at this point; we are back in the context of the parent tag.
121
+ @current = self.top if self_closing
122
+ end
123
+
124
+ def close_tag(name, offset = nil)
125
+ while node = @stack.pop
126
+ node.tag.write_closing_tag(@output) unless node.skip? TAG
127
+
128
+ break if node.name == name
129
+ end
130
+
131
+ @current = self.top
132
+ end
133
+
134
+ def filter(tag)
135
+ return tag
136
+ end
137
+
138
+ def doctype(string)
139
+ @output << string unless current.skip? DOCTYPE
140
+ end
141
+
142
+ def comment(string)
143
+ @output << string unless current.skip? COMMENT
144
+ end
145
+
146
+ def instruction(string)
147
+ @output << string unless current.skip? INSTRUCTION
148
+ end
149
+
150
+ def cdata(string)
151
+ @output << string unless current.skip? CDATA
152
+ end
153
+
154
+ def text(string)
155
+ Markup.append(@output, string) unless current.skip? TEXT
156
+ end
157
+ end
158
+ end
159
+ end
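The skip flags in the filter are plain bit masks: `skip!` ORs a mode into the node's state, and `skip?` is true only when every bit of the requested mode is set. A self-contained sketch of those semantics, with the constants copied from the file above and a simplified `Node` that only carries the `skip` field (an assumption made for brevity):

```ruby
TAG = 1
DOCTYPE = 2
COMMENT = 4
INSTRUCTION = 8
CDATA = 16
TEXT = 32

CONTENT = DOCTYPE | COMMENT | INSTRUCTION | CDATA | TEXT
ALL = TAG | CONTENT

# Simplified node: only tracks the skip mask.
Node = Struct.new(:skip) do
	def skip!(mode = ALL)
		self.skip |= mode
	end

	def skip?(mode = ALL)
		(skip & mode) == mode
	end
end

node = Node.new(0)
node.skip!(TAG)

puts node.skip?(TAG)  # true: the tag itself is not written
puts node.skip?(TEXT) # false: text inside it is still written
puts node.skip?(ALL)  # false: not every bit is set

# A new node starts with the mask of the node that was current when it was
# opened, so skipping everything also hides nested content.
parent = Node.new(0)
parent.skip!(ALL)
child = Node.new(parent.skip)

puts child.skip?(TEXT) # true
```

This is why `skip!(TAG)` removes only the markup while `skip!` with the default `ALL` removes an element together with everything nested inside it.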
data/lib/trenni/sanitize/fragment.rb ADDED
@@ -0,0 +1,65 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ require_relative 'filter'
22
+
23
+ module Trenni
24
+ module Sanitize
25
+ class Fragment < Filter
26
+ STANDARD_ATTRIBUTES = ['class', 'style'].freeze
27
+
28
+ ALLOWED_TAGS = {
29
+ 'div' => STANDARD_ATTRIBUTES,
30
+ 'span' => STANDARD_ATTRIBUTES,
31
+ 'br' => STANDARD_ATTRIBUTES,
32
+ 'b' => STANDARD_ATTRIBUTES,
33
+ 'i' => STANDARD_ATTRIBUTES,
34
+ 'em' => STANDARD_ATTRIBUTES,
35
+ 'strong' => STANDARD_ATTRIBUTES,
36
+ 'ul' => STANDARD_ATTRIBUTES,
37
+ 'strike' => STANDARD_ATTRIBUTES,
38
+ 'h1' => STANDARD_ATTRIBUTES,
39
+ 'h2' => STANDARD_ATTRIBUTES,
40
+ 'h3' => STANDARD_ATTRIBUTES,
41
+ 'h4' => STANDARD_ATTRIBUTES,
42
+ 'h5' => STANDARD_ATTRIBUTES,
43
+ 'h6' => STANDARD_ATTRIBUTES,
44
+ 'p' => STANDARD_ATTRIBUTES,
45
+ 'img' => STANDARD_ATTRIBUTES + ['src', 'alt', 'width', 'height'],
46
+ 'image' => STANDARD_ATTRIBUTES,
47
+ 'a' => STANDARD_ATTRIBUTES + ['href', 'target']
48
+ }.freeze
49
+
50
+ def filter(node)
51
+ if attributes = ALLOWED_TAGS[node.name]
52
+ node.tag.attributes.slice!(*attributes)
53
+ else
54
+ node.skip!
55
+ end
56
+ end
57
+
58
+ def doctype(string)
59
+ end
60
+
61
+ def instruction(string)
62
+ end
63
+ end
64
+ end
65
+ end
data/lib/trenni/sanitize/text.rb ADDED
@@ -0,0 +1,47 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ require_relative 'filter'
22
+
23
+ module Trenni
24
+ module Sanitize
25
+ class Text < Filter
26
+ def filter(node)
27
+ if node.name == 'script'
28
+ node.skip!(ALL) # Skip everything including content.
29
+ else
30
+ node.skip!(TAG) # Only skip the tag output, but not the content.
31
+ end
32
+ end
33
+
34
+ def doctype(string)
35
+ end
36
+
37
+ def comment(string)
38
+ end
39
+
40
+ def instruction(string)
41
+ end
42
+
43
+ def cdata(string)
44
+ end
45
+ end
46
+ end
47
+ end
data/lib/trenni/sanitize/version.rb ADDED
@@ -0,0 +1,25 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ module Trenni
22
+ module Sanitize
23
+ VERSION = "0.1.0"
24
+ end
25
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,53 @@
1
+
2
+ if ENV['COVERAGE']
3
+ begin
4
+ require 'simplecov'
5
+
6
+ SimpleCov.start do
7
+ add_filter "/spec/"
8
+ end
9
+
10
+ if ENV['TRAVIS']
11
+ require 'coveralls'
12
+ Coveralls.wear!
13
+ end
14
+ rescue LoadError
15
+ warn "Could not load simplecov: #{$!}"
16
+ end
17
+ end
18
+
19
+ require "bundler/setup"
20
+ require "trenni/sanitize"
21
+
22
+ begin
23
+ require 'ruby-prof'
24
+
25
+ RSpec.shared_context "profile" do
26
+ before(:all) do
27
+ RubyProf.start
28
+ end
29
+
30
+ after(:all) do
31
+ result = RubyProf.stop
32
+
33
+ # Print a flat profile to text
34
+ printer = RubyProf::FlatPrinter.new(result)
35
+ printer.print(STDOUT)
36
+ end
37
+ end
38
+ rescue LoadError
39
+ RSpec.shared_context "profile" do
40
+ before(:all) do
41
+ puts "Profiling not supported on this platform."
42
+ end
43
+ end
44
+ end
45
+
46
+ RSpec.configure do |config|
47
+ # Enable flags like --only-failures and --next-failure
48
+ config.example_status_persistence_file_path = ".rspec_status"
49
+
50
+ config.expect_with :rspec do |c|
51
+ c.syntax = :expect
52
+ end
53
+ end
data/spec/trenni/sanitize/benchmark_spec.rb ADDED
@@ -0,0 +1,36 @@
1
+
2
+ require 'sanitize'
3
+ require 'benchmark/ips'
4
+
5
+ require 'trenni/sanitize/text'
6
+
7
+ RSpec.describe Trenni::Sanitize do
8
+ let(:buffer) {Trenni::Buffer.load_file(File.join(__dir__, "sample.html"))}
9
+
10
+ it "should be faster than alternatives" do
11
+ config = Sanitize::Config.freeze_config(
12
+ :elements => %w[b i em strong ul li strike h1 h2 h3 h4 h5 h6 p img image a],
13
+ :attributes => {
14
+ 'img' => %w[src alt width],
15
+ 'a' => %w[href]
16
+ },
17
+ )
18
+
19
+ text = buffer.read
20
+
21
+ puts Sanitize.fragment(text).inspect
22
+ puts Trenni::Sanitize::Text.parse(buffer).output.inspect
23
+
24
+ Benchmark.ips do |x|
25
+ x.report("Sanitize") do
26
+ Sanitize.fragment text
27
+ end
28
+
29
+ x.report("Trenni::Sanitize") do
30
+ Trenni::Sanitize::Text.parse(buffer)
31
+ end
32
+
33
+ x.compare!
34
+ end
35
+ end
36
+ end
data/spec/trenni/sanitize/fragment_spec.rb ADDED
@@ -0,0 +1,60 @@
1
+ # Copyright, 2018, by Samuel G. D. Williams. <http://www.codeotaku.com>
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+ require 'trenni/sanitize/fragment'
22
+
23
+ RSpec.describe Trenni::Sanitize::Fragment do
24
+ it "should filter out script tags" do
25
+ fragment = described_class.parse("<p onclick='malicious()'>Hello World</p><script>doot()</script>")
26
+
27
+ expect(fragment.output).to be == "<p>Hello World</p>"
28
+ end
29
+
30
+ it "should filter out nested script tags" do
31
+ fragment = described_class.parse("<div><p>Hello World</p><script>doot()</script></div>")
32
+
33
+ expect(fragment.output).to be == "<div><p>Hello World</p></div>"
34
+ end
35
+
36
+ it "should filter out tags" do
37
+ fragment = described_class.parse("<p onclick='malicious()'>Hello World</p><script>script</script>")
38
+
39
+ expect(fragment.output).to be == "<p>Hello World</p>"
40
+ end
41
+
42
+ it "should ignore unbalanced closing tags" do
43
+ fragment = described_class.parse("<p>Hello World</a></p>")
44
+
45
+ expect(fragment.output).to be == "<p>Hello World</p>"
46
+ end
47
+
48
+ it "should include trailing text" do
49
+ fragment = described_class.parse("Hello<script/>World")
50
+
51
+ expect(fragment.output).to be == "HelloWorld"
52
+ end
53
+
54
+ it "should escape text" do
55
+ fragment = described_class.parse("x&amp;y")
56
+
57
+ expect(fragment.output).to be == "x&amp;y"
58
+ end
59
+ end
60
+
data/spec/trenni/sanitize/sample.html ADDED
@@ -0,0 +1,12 @@
1
+ <hr>
2
+ <a href="http://somegreatsite.com">Link Name</a>
3
+ is a link to another nifty site
4
+ <h1>This is a Header</h1>
5
+ <h1>This is a Medium Header</h2>
6
+ Send me mail at <a href="mailto:support@yourcompany.com">
7
+ support@yourcompany.com</a>.
8
+ <hr>
9
+ <p>This is a new paragraph!</p>
10
+ <p><b>This is a new paragraph!</b></p>
11
+ <br/><b><i>This is a new sentence without a paragraph break, in bold italics.</i></b>
12
+ <hr>
data/spec/trenni/sanitize/text_spec.rb ADDED
File without changes
data/trenni-sanitize.gemspec ADDED
@@ -0,0 +1,25 @@
1
+
2
+ require_relative 'lib/trenni/sanitize/version'
3
+
4
+ Gem::Specification.new do |spec|
5
+ spec.name = "trenni-sanitize"
6
+ spec.platform = Gem::Platform::RUBY
7
+ spec.version = Trenni::Sanitize::VERSION
8
+ spec.authors = ["Samuel Williams"]
9
+ spec.email = ["samuel.williams@oriontransfer.co.nz"]
10
+ spec.summary = %q{Sanitize markup according to a set of rules.}
11
+ spec.homepage = "https://github.com/ioquatix/trenni-sanitize"
12
+
13
+ spec.files = `git ls-files`.split($/)
14
+ spec.executables = spec.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
15
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
16
+ spec.require_paths = ["lib"]
17
+
18
+ spec.required_ruby_version = '~> 2.1'
19
+
20
+ spec.add_dependency "trenni", '~> 3.5.0'
21
+
22
+ spec.add_development_dependency "bundler", "~> 1.3"
23
+ spec.add_development_dependency "rspec", "~> 3.4"
24
+ spec.add_development_dependency "rake"
25
+ end
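The `~>` (pessimistic) constraints in the gemspec allow the last listed version segment to increase but nothing above it. A short sketch using RubyGems' own `Gem::Requirement` to show which versions the two constraints accept (the specific version numbers tested are arbitrary examples):

```ruby
require 'rubygems'

# "~> 3.5.0" accepts 3.5.x releases only.
trenni = Gem::Requirement.new('~> 3.5.0')
puts trenni.satisfied_by?(Gem::Version.new('3.5.4')) # true
puts trenni.satisfied_by?(Gem::Version.new('3.6.0')) # false

# "~> 2.1" accepts any 2.x release from 2.1 onwards, but not 3.0.
ruby = Gem::Requirement.new('~> 2.1')
puts ruby.satisfied_by?(Gem::Version.new('2.5.0')) # true
puts ruby.satisfied_by?(Gem::Version.new('3.0.0')) # false
```

In particular, `required_ruby_version = '~> 2.1'` means this release cannot be installed on Ruby 3.0 or later.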
metadata ADDED
@@ -0,0 +1,123 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: trenni-sanitize
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Samuel Williams
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2018-02-14 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: trenni
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: 3.5.0
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: 3.5.0
27
+ - !ruby/object:Gem::Dependency
28
+ name: bundler
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '1.3'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '1.3'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '3.4'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '3.4'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rake
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ description:
70
+ email:
71
+ - samuel.williams@oriontransfer.co.nz
72
+ executables: []
73
+ extensions: []
74
+ extra_rdoc_files: []
75
+ files:
76
+ - ".gitignore"
77
+ - ".rspec"
78
+ - ".simplecov"
79
+ - ".travis.yml"
80
+ - Gemfile
81
+ - README.md
82
+ - Rakefile
83
+ - lib/trenni/sanitize.rb
84
+ - lib/trenni/sanitize/extensions.rb
85
+ - lib/trenni/sanitize/filter.rb
86
+ - lib/trenni/sanitize/fragment.rb
87
+ - lib/trenni/sanitize/text.rb
88
+ - lib/trenni/sanitize/version.rb
89
+ - spec/spec_helper.rb
90
+ - spec/trenni/sanitize/benchmark_spec.rb
91
+ - spec/trenni/sanitize/fragment_spec.rb
92
+ - spec/trenni/sanitize/sample.html
93
+ - spec/trenni/sanitize/text_spec.rb
94
+ - trenni-sanitize.gemspec
95
+ homepage: https://github.com/ioquatix/trenni-sanitize
96
+ licenses: []
97
+ metadata: {}
98
+ post_install_message:
99
+ rdoc_options: []
100
+ require_paths:
101
+ - lib
102
+ required_ruby_version: !ruby/object:Gem::Requirement
103
+ requirements:
104
+ - - "~>"
105
+ - !ruby/object:Gem::Version
106
+ version: '2.1'
107
+ required_rubygems_version: !ruby/object:Gem::Requirement
108
+ requirements:
109
+ - - ">="
110
+ - !ruby/object:Gem::Version
111
+ version: '0'
112
+ requirements: []
113
+ rubyforge_project:
114
+ rubygems_version: 2.6.12
115
+ signing_key:
116
+ specification_version: 4
117
+ summary: Sanitize markup according to a set of rules.
118
+ test_files:
119
+ - spec/spec_helper.rb
120
+ - spec/trenni/sanitize/benchmark_spec.rb
121
+ - spec/trenni/sanitize/fragment_spec.rb
122
+ - spec/trenni/sanitize/sample.html
123
+ - spec/trenni/sanitize/text_spec.rb