httpbl 0.1.3

Files changed (8)
  1. data/CHANGELOG +1 -0
  2. data/LICENSE +21 -0
  3. data/Manifest +7 -0
  4. data/README +147 -0
  5. data/Rakefile +7 -0
  6. data/httpbl.gemspec +34 -0
  7. data/lib/httpbl.rb +59 -0
  8. metadata +76 -0
data/CHANGELOG ADDED
@@ -0,0 +1 @@
1
+ v0.1.3. First public test release, not ready for production
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License
2
+
3
+ Copyright (c) 2009 Brandon Palmen
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/Manifest ADDED
@@ -0,0 +1,7 @@
1
+ README
2
+ lib/httpbl.rb
3
+ Rakefile
4
+ httpbl.gemspec
5
+ CHANGELOG
6
+ LICENSE
7
+ Manifest
data/README ADDED
@@ -0,0 +1,147 @@
1
+ HttpBL
2
+ ===========
3
+
4
+ HttpBL is drop-in IP-filtering middleware for Rails 2.3+ and other Rack-based
5
+ applications. It resolves information about each request's source IP address
6
+ from the Http:BL service at http://projecthoneypot.org, and denies access to
7
+ clients whose IP addresses are associated with suspicious behavior like impolite
8
+ crawling, comment-spamming, dictionary attacks, and email-harvesting.
9
+
10
+ * Deny access to IP addresses that are associated with suspicious
11
+ behavior which exceeds a customizable threshold.
12
+ * Expire blocked IPs that have not been associated with suspicious
13
+ behavior after a customizable period of days.
14
+ * Identify common search engines by IP address (not User-Agent), and
15
+ disallow access to a specific subset.
16
+
17
+ Installation
18
+ ------------
19
+
20
+ gem install bpalmen-httpbl
21
+
22
+ Basic Usage
23
+ ------------
24
+
25
+ HttpBL is Rack middleware, and can be used with any Rack-based application. First,
26
+ you must obtain an API key for the Http:BL service at http://projecthoneypot.org
27
+
28
+ To add HttpBL to your middleware stack, simply add the following to config.ru:
29
+
30
+ require 'httpbl'
31
+
32
+ use HttpBL, :api_key => "YOUR API KEY"
33
+
34
+ For Rails 2.3+ add the following to environment.rb:
35
+
36
+ require 'httpbl'
37
+
38
+ config.middleware.use HttpBL, :api_key => "YOUR API KEY"
39
+
40
+ Advanced Usage
41
+ -------------
42
+
43
+ To insert HttpBL at the top of the Rails Rack stack:
44
+ (use 'rake middleware' to confirm that Rack::Lock is at the top of the stack)
45
+
46
+ config.middleware.insert_before(Rack::Lock, HttpBL, :api_key => "YOUR API KEY")
47
+
48
+ To customize HttpBL's filtering behavior, use the available options:
49
+
50
+ use HttpBL, :api_key => "YOUR API KEY",
51
+ :deny_types => [1, 2, 4],
52
+ :threat_level_threshold => 0,
53
+ :age_threshold => 5,
54
+ :blocked_search_engines => [0]
55
+
56
+ Available Options:
57
+
58
+ The following options (shown with default values) are available to
59
+ customize the particular types of suspicious activity you wish to thwart:
60
+
61
+ :deny_types => [1, 2, 4, 8, 16, 32, 64, 128]
62
+
63
+ Project Honeypot classifies suspicious behavior as belonging to
64
+ certain types, which are identified in the API's response to
65
+ each IP lookup. You can tell HttpBL to only deny certain kinds
66
+ of behavior by changing this to a subset of those possible.
67
+
68
+ As of March 2009, only types 1, 2, and 4 have been specified,
69
+ but additional types are reserved for the future and HttpBL checks
70
+ against all of the anticipated type codes by default. Thus,
71
+ there may be a very small performance advantage to setting
72
+ :deny_types => [1, 2, 4] simply to exclude checks for codes
73
+ that aren't (yet) being used; however, this will have to be
74
+ updated if more codes come into use, whereas the default
75
+ requires no further attention.
76
+
77
+ The current types are:
78
+ 1: Suspicious
79
+ 2: Harvester
80
+ 4: Comment Spammer
81
+
82
+ :threat_level_threshold => 2
83
+
84
+ The threat level reported by Project Honeypot is based on a
85
+ logarithmic scale, approximated by:
86
+ 1: 1 spam
87
+ 25: 100 spam
88
+ 50: 10,000 spam
89
+ 100: 1,000,000 spam.
90
+ in which spam is pronounced spam even in the plural.
91
+
92
+ Choosing a threat level threshold can be tricky business if
93
+ one isn't sure how accurate the measure of threat is, since it
94
+ would be improper to block legitimate traffic by mistake. Because
95
+ the email addresses that Project Honeypot uses as spam-bait are unique,
96
+ artificial, and well-hidden, NO email should be sent to those addresses
97
+ at all, and it is fair to assume that even the low threat level
98
+ associated with just a few spam is still significant.
99
+
100
+ With that in mind, the default threshold is 2; if you want to
101
+ filter more aggressively, set :threat_level_threshold => 0
102
+
103
+ :age_threshold => 10
104
+
105
+ This sets the number of days that IP addresses that have been
106
+ associated with suspicious activity must wait to regain access after
107
+ the suspicious activity has ceased. Keeping this at a sane value will
108
+ allow IPs that are reassigned or cleaned up to expire from the blacklist.
109
+
110
+ If you want to be more aggressive (require a longer cool-off period),
111
+ set :age_threshold => 30; if you want to let IPs back in after just a
112
+ few days, set :age_threshold => 5
113
+
114
+ :blocked_search_engines => []
115
+
116
+ Because Project Honeypot identifies search engine traffic by IP
117
+ address, this filter may be used to exclude certain robots from your
118
+ site. If one presumes that request-IPs are at least marginally more
119
+ difficult to spoof than User-Agent strings, this filter may be marginally
120
+ more effective than some other robot detection systems.
121
+
122
+ If there are particular search engines that you would like to exclude
123
+ from your site, set :blocked_search_engines => [0, ... ] where the codes
124
+ defined by http://projecthoneypot.org/httpbl_api are:
125
+
126
+ 0: Misc
127
+ 1: AltaVista
128
+ 2: Ask
129
+ 3: Baidu
130
+ 4: Excite
131
+ 5: Google
132
+ 6: Looksmart
133
+ 7: Lycos
134
+ 8: MSN
135
+ 9: Yahoo
136
+ 10: Cuil
137
+ 11: InfoSeek
138
+
139
+ :dns_timeout => 0.5
140
+
141
+ DNS requests to the Http:BL service should NEVER take this long, but if
142
+ they do, you can modify this setting to prevent the application from
143
+ hanging until a system default timeout. Setting this timeout too low (0 is
144
+ a bad idea) will essentially disable the filter, because a lookup that
145
+ cannot finish in time is abandoned and the request is permitted by default.
146
+ Best not to mess with it unless you know what you're doing - it's a safety
147
+ mechanism.
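Reviewer note on :deny_types: the README describes the type codes 1, 2 and 4 as flags that can be combined in a single value. The sketch below illustrates how such a bitmask can be decoded, mirroring the `response[3] & key == key` check in lib/httpbl.rb. The constant VISITOR_TYPES and both helper methods are hypothetical names introduced for illustration; they are not part of the gem.

```ruby
# Type codes as listed in the README (the only ones specified as of March 2009).
VISITOR_TYPES = {
  1 => "Suspicious",
  2 => "Harvester",
  4 => "Comment Spammer"
}.freeze

# Hypothetical helper: names of every documented flag set in a type octet.
def visitor_type_names(type_octet)
  VISITOR_TYPES.select { |bit, _name| type_octet & bit == bit }.values
end

# Hypothetical helper: true when any configured deny type is set in the octet,
# using the same bitwise test that HttpBL#blocked? applies to response[3].
def denied_type?(type_octet, deny_types)
  deny_types.any? { |bit| type_octet & bit == bit }
end

# A type octet of 5 has bits 1 and 4 set.
puts visitor_type_names(5).join(", ")
puts denied_type?(2, [1, 4])  # a harvester-only octet is not denied by [1, 4]
```

With :deny_types => [1, 4], an octet of 2 (Harvester only) would pass this particular check, while 6 (Harvester plus Comment Spammer) would not.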
data/Rakefile ADDED
@@ -0,0 +1,7 @@
1
+ require 'echoe'
2
+ Echoe.new('httpbl') do |p|
3
+ p.author = "Brandon Palmen"
4
+ p.summary = "A Rack middleware IP filter that uses Http:BL to exclude suspicious robots."
5
+ p.url = "http://github.com/bpalmen/httpbl"
6
+ p.runtime_dependencies = ["rack"]
7
+ end
data/httpbl.gemspec ADDED
@@ -0,0 +1,34 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ Gem::Specification.new do |s|
4
+ s.name = %q{httpbl}
5
+ s.version = "0.1.3"
6
+
7
+ s.required_rubygems_version = Gem::Requirement.new(">= 1.2") if s.respond_to? :required_rubygems_version=
8
+ s.authors = ["Brandon Palmen"]
9
+ s.date = %q{2009-03-22}
10
+ s.description = %q{A Rack middleware IP filter that uses Http:BL to exclude suspicious robots.}
11
+ s.email = %q{}
12
+ s.extra_rdoc_files = ["README", "lib/httpbl.rb", "CHANGELOG", "LICENSE"]
13
+ s.files = ["README", "lib/httpbl.rb", "Rakefile", "httpbl.gemspec", "CHANGELOG", "LICENSE", "Manifest"]
14
+ s.has_rdoc = true
15
+ s.homepage = %q{http://github.com/bpalmen/httpbl}
16
+ s.rdoc_options = ["--line-numbers", "--inline-source", "--title", "Httpbl", "--main", "README"]
17
+ s.require_paths = ["lib"]
18
+ s.rubyforge_project = %q{httpbl}
19
+ s.rubygems_version = %q{1.3.1}
20
+ s.summary = %q{A Rack middleware IP filter that uses Http:BL to exclude suspicious robots.}
21
+
22
+ if s.respond_to? :specification_version then
23
+ current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
24
+ s.specification_version = 2
25
+
26
+ if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
27
+ s.add_runtime_dependency(%q<rack>, [">= 0"])
28
+ else
29
+ s.add_dependency(%q<rack>, [">= 0"])
30
+ end
31
+ else
32
+ s.add_dependency(%q<rack>, [">= 0"])
33
+ end
34
+ end
data/lib/httpbl.rb ADDED
@@ -0,0 +1,59 @@
1
+ # The Httpbl middleware
2
+
3
+ class HttpBL
4
+ autoload :Resolv, 'resolv'
5
+
6
+ def initialize(app, options = {})
7
+ @app = app
8
+ @options = {:blocked_search_engines => [],
9
+ :age_threshold => 10,
10
+ :threat_level_threshold => 2,
11
+ # 8..128 aren't used as of 3/2009, but might be used in the future
12
+ :deny_types => [1, 2, 4, 8, 16, 32, 64, 128],
13
+ # DON'T set this to 0
14
+ :dns_timeout => 0.5
15
+ }.merge(options)
16
+ raise "Missing :api_key for Http:BL middleware" unless @options[:api_key]
17
+ end
18
+
19
+ def call(env)
20
+ dup._call(env)
21
+ end
22
+
23
+ def _call(env)
24
+ request = Rack::Request.new(env)
25
+ bl_status = resolve(request.ip)
26
+ if bl_status and blocked?(bl_status)
27
+ [403, {"Content-Type" => "text/html"}, ["<h1>403 Forbidden</h1> Request IP is listed as suspicious by <a href='http://projecthoneypot.org/ip_#{request.ip}'>Project Honeypot</a>"]]
28
+ else
29
+ @app.call(env)
30
+ end
31
+
32
+ end
33
+
34
+ def resolve(ip)
35
+ query = @options[:api_key] + '.' + ip.split('.').reverse.join('.') + '.dnsbl.httpbl.org'
36
+ Timeout::timeout(@options[:dns_timeout]) do
37
+ Resolv::DNS.new.getaddress(query).to_s rescue nil
38
+ end
39
+ rescue Timeout::Error, Errno::ECONNREFUSED
40
+ end
41
+
42
+ def blocked?(response)
43
+ response = response.split('.').collect!(&:to_i)
44
+ if response[0] == 127
45
+ if response[3] == 0
46
+ @blocked = true if @options[:blocked_search_engines].include? response[2]
47
+ else
48
+ @age = true if response[1] < @options[:age_threshold]
49
+ @threat = true if response[2] > @options[:threat_level_threshold]
50
+ @options[:deny_types].each do |key|
51
+ @deny = true if response[3] & key == key
52
+ end
53
+ @blocked = true if @deny and @threat and @age
54
+ end
55
+ end
56
+ return @blocked
57
+ end
58
+
59
+ end
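Reviewer note on resolve and blocked?: the middleware builds a DNS name from the API key and the reversed request IP, then interprets the four octets of the reply. The following is a minimal sketch of those two steps without any network access, assuming the octet meanings implied by the code above (first octet 127, second octet compared against :age_threshold, third against :threat_level_threshold or used as the search engine code, fourth as the type bitmask). The method names httpbl_query and decode_httpbl, the sample key and the sample IP are hypothetical.

```ruby
# Hypothetical helper: build the DNS name that HttpBL#resolve looks up.
def httpbl_query(api_key, ip)
  [api_key, *ip.split('.').reverse, 'dnsbl.httpbl.org'].join('.')
end

# Hypothetical helper: split a reply such as "127.3.5.1" into named fields,
# following the way HttpBL#blocked? reads each octet.
def decode_httpbl(reply)
  marker, days, third, type = reply.split('.').map(&:to_i)
  return nil unless marker == 127 # blocked? ignores any other first octet

  if type == 0
    # A type of 0 marks a search engine; the third octet is its code.
    { :search_engine => third }
  else
    { :days_since_activity => days, :threat_level => third, :type => type }
  end
end

puts httpbl_query("abcdefghijkl", "192.0.2.10")
# prints "abcdefghijkl.10.2.0.192.dnsbl.httpbl.org"

status = decode_httpbl("127.3.5.1")
# With the default options this reply would be blocked:
# 3 < :age_threshold (10), 5 > :threat_level_threshold (2), and type 1 is denied.
puts status[:threat_level]
```

Keeping the decoding separate from the lookup makes the threshold logic testable without DNS; the gem itself performs both steps inside the middleware instance.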
metadata ADDED
@@ -0,0 +1,76 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: httpbl
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.3
5
+ platform: ruby
6
+ authors:
7
+ - Brandon Palmen
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2009-03-22 00:00:00 -04:00
13
+ default_executable:
14
+ dependencies:
15
+ - !ruby/object:Gem::Dependency
16
+ name: rack
17
+ type: :runtime
18
+ version_requirement:
19
+ version_requirements: !ruby/object:Gem::Requirement
20
+ requirements:
21
+ - - ">="
22
+ - !ruby/object:Gem::Version
23
+ version: "0"
24
+ version:
25
+ description: A Rack middleware IP filter that uses Http:BL to exclude suspicious robots.
26
+ email: ""
27
+ executables: []
28
+
29
+ extensions: []
30
+
31
+ extra_rdoc_files:
32
+ - README
33
+ - lib/httpbl.rb
34
+ - CHANGELOG
35
+ - LICENSE
36
+ files:
37
+ - README
38
+ - lib/httpbl.rb
39
+ - Rakefile
40
+ - httpbl.gemspec
41
+ - CHANGELOG
42
+ - LICENSE
43
+ - Manifest
44
+ has_rdoc: true
45
+ homepage: http://github.com/bpalmen/httpbl
46
+ post_install_message:
47
+ rdoc_options:
48
+ - --line-numbers
49
+ - --inline-source
50
+ - --title
51
+ - Httpbl
52
+ - --main
53
+ - README
54
+ require_paths:
55
+ - lib
56
+ required_ruby_version: !ruby/object:Gem::Requirement
57
+ requirements:
58
+ - - ">="
59
+ - !ruby/object:Gem::Version
60
+ version: "0"
61
+ version:
62
+ required_rubygems_version: !ruby/object:Gem::Requirement
63
+ requirements:
64
+ - - ">="
65
+ - !ruby/object:Gem::Version
66
+ version: "1.2"
67
+ version:
68
+ requirements: []
69
+
70
+ rubyforge_project: httpbl
71
+ rubygems_version: 1.3.1
72
+ signing_key:
73
+ specification_version: 2
74
+ summary: A Rack middleware IP filter that uses Http:BL to exclude suspicious robots.
75
+ test_files: []
76
+