httpbl 0.1.3
- data/CHANGELOG +1 -0
- data/LICENSE +21 -0
- data/Manifest +7 -0
- data/README +147 -0
- data/Rakefile +7 -0
- data/httpbl.gemspec +34 -0
- data/lib/httpbl.rb +59 -0
- metadata +76 -0
data/CHANGELOG
ADDED
@@ -0,0 +1 @@
+v0.1.3. First public test release, not ready for production
data/LICENSE
ADDED
@@ -0,0 +1,21 @@
+The MIT License
+
+Copyright (c) 2009 Brandon Palmen
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/Manifest
ADDED
data/README
ADDED
@@ -0,0 +1,147 @@
+HttpBL
+===========
+
+HttpBL is drop-in IP-filtering middleware for Rails 2.3+ and other Rack-based
+applications. It resolves information about each request's source IP address
+from the Http:BL service at http://projecthoneypot.org, and denies access to
+clients whose IP addresses are associated with suspicious behavior like impolite
+crawling, comment-spamming, dictionary attacks, and email-harvesting.
+
+* Deny access to IP addresses that are associated with suspicious
+  behavior which exceeds a customizable threshold.
+* Expire blocked IPs that have not been associated with suspicious
+  behavior after a customizable period of days.
+* Identify common search engines by IP address (not User-Agent), and
+  disallow access to a specific subset.
+
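The lookup described above is a DNS query. As a minimal sketch, based on the `resolve` method in lib/httpbl.rb from this release, the query name is the API key, the request IP with its octets reversed, and the `dnsbl.httpbl.org` suffix. The helper name `httpbl_query` and the key `abcdefghijkl` are placeholders for illustration, not part of the gem.

```ruby
# Build the DNS name that is looked up for an IPv4 address.
# Hypothetical helper; the key below is a placeholder, not a real API key.
def httpbl_query(api_key, ip)
  reversed = ip.split('.').reverse.join('.')   # "192.0.2.10" -> "10.2.0.192"
  "#{api_key}.#{reversed}.dnsbl.httpbl.org"
end

puts httpbl_query("abcdefghijkl", "192.0.2.10")
```

The sketch only handles dotted IPv4 strings, which is also all that the released code appears to handle.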
+Installation
+------------
+
+    gem install bpalmen-httpbl
+
+Basic Usage
+------------
+
+HttpBL is Rack middleware, and can be used with any Rack-based application. First,
+you must obtain an API key for the Http:BL service at http://projecthoneypot.org
+
+To add HttpBL to your middleware stack, simply add the following to config.ru:
+
+    require 'httpbl'
+
+    use HttpBL, :api_key => "YOUR API KEY"
+
+For Rails 2.3+ add the following to environment.rb:
+
+    require 'httpbl'
+
+    config.middleware.use HttpBL, :api_key => "YOUR API KEY"
+
+Advanced Usage
+-------------
+
+To insert HttpBL at the top of the Rails Rack stack:
+(use 'rake middleware' to confirm that Rack::Lock is at the top of the stack)
+
+    config.middleware.insert_before(Rack::Lock, HttpBL, :api_key => "YOUR API KEY")
+
+To customize HttpBL's filtering behavior, use the available options:
+
+    use HttpBL, :api_key => "YOUR API KEY",
+                :deny_types => [1, 2, 4],
+                :threat_level_threshold => 0,
+                :age_threshold => 5,
+                :blocked_search_engines => [0]
+
+Available Options:
+
+The following options (shown with default values) are available to
+customize the particular types of suspicious activity you wish to thwart:
+
+    :deny_types => [1, 2, 4, 8, 16, 32, 64, 128]
+
+Project Honeypot classifies suspicious behavior as belonging to
+certain types, which are identified in the API's response to
+each IP lookup. You can tell HttpBL to only deny certain kinds
+of behavior by changing this to a subset of those possible.
+
+As of March 2009, only types 1, 2, and 4 have been specified,
+but additional types are reserved for the future and HttpBL checks
+against all of the anticipated type codes by default. Thus,
+there may be a very small performance advantage to setting
+:deny_types => [1, 2, 4] simply to exclude checks for codes
+that aren't (yet) being used; however, this will have to be
+updated if more codes come into use, whereas the default
+requires no further attention.
+
+The current types are:
+    1: Suspicious
+    2: Harvester
+    4: Comment Spammer
+
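The type codes are powers of two, and lib/httpbl.rb in this release tests each configured code with a bitwise AND, so a single response can carry several types at once. A minimal sketch of that check follows; the names `VISITOR_TYPES`, `visitor_types`, and `denied_type?` are illustrative and not part of the gem.

```ruby
# Labels for the type bits listed in the README (as of March 2009).
VISITOR_TYPES = { 1 => "Suspicious", 2 => "Harvester", 4 => "Comment Spammer" }

# Names of every known type bit that is set in the type octet.
def visitor_types(type_octet)
  VISITOR_TYPES.select { |bit, _label| type_octet & bit == bit }.values
end

# True when any configured deny type is set in the type octet.
def denied_type?(type_octet, deny_types)
  deny_types.any? { |bit| type_octet & bit == bit }
end

p visitor_types(5)            # bits 1 and 4 are set
p denied_type?(4, [1, 2])     # comment spammer only, not in the deny list
```

With `:deny_types => [1, 2]`, a visitor typed only as a comment spammer (4) would pass this check, while one typed 5 (suspicious plus comment spammer) would not.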
+    :threat_level_threshold => 2
+
+The threat level reported by Project Honeypot is based on a
+logarithmic scale, approximated by:
+    1: 1 spam
+    25: 100 spam
+    50: 10,000 spam
+    100: 1,000,000 spam.
+in which spam is pronounced spam even in the plural.
+
+Choosing a threat level threshold can be tricky business if
+one isn't sure how accurate the measure of threat is, since it
+would be improper to block legitimate traffic by mistake. Because
+the email addresses that Project Honeypot uses as spam-bait are unique,
+artificial, and well-hidden, NO email should be sent to those addresses
+at all, and it is fair to assume that even the low threat level
+associated with just a few spam is still significant.
+
+With that in mind, the default threshold is 2; if you want to
+filter more aggressively, set :threat_level_threshold => 0
+
+    :age_threshold => 10
+
+This sets the number of days that IP addresses that have been
+associated with suspicious activity must wait to regain access after
+the suspicious activity has ceased. Keeping this at a sane value will
+allow IPs that are reassigned or cleaned up to expire from the blacklist.
+
+If you want to be more aggressive (require a longer cool-off-period),
+set :age_threshold => 30; if you want to let IPs back in after just a
+few days, set :age_threshold => 5
+
+    :blocked_search_engines => []
+
+Because Project Honeypot identifies search engine traffic by IP
+address, this filter may be used to exclude certain robots from your
+site. If one presumes that request-IPs are at least marginally more
+difficult to spoof than User-Agent strings, this filter may be marginally
+more effective than some other robot detection systems.
+
+If there are particular search engines that you would like to exclude
+from your site, set :blocked_search_engines => [0, ... ] where the codes
+defined by http://projecthoneypot.org/httpbl_api are:
+
+    0: Misc
+    1: AltaVista
+    2: Ask
+    3: Baidu
+    4: Excite
+    5: Google
+    6: Looksmart
+    7: Lycos
+    8: MSN
+    9: Yahoo
+    10: Cuil
+    11: InfoSeek
+
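In lib/httpbl.rb from this release, a response whose last octet is 0 is treated as a search engine, and its third octet is compared against `:blocked_search_engines`. The sketch below mirrors that check with the code table above; `SEARCH_ENGINES` and `blocked_search_engine?` are illustrative names, not part of the gem.

```ruby
# Engine names indexed by the codes listed in the README.
SEARCH_ENGINES = %w[Misc AltaVista Ask Baidu Excite Google
                    Looksmart Lycos MSN Yahoo Cuil InfoSeek]

# True when the response describes a search engine whose code is blocked.
def blocked_search_engine?(response, blocked)
  octets = response.split('.').map(&:to_i)
  # Only 127.x.x.0 responses describe search engines.
  return false unless octets[0] == 127 && octets[3] == 0
  blocked.include?(octets[2])
end

puts SEARCH_ENGINES[5]
p blocked_search_engine?("127.0.5.0", [5])
```

A response such as `127.3.20.1` has a non-zero type octet, so it is never matched here even if 20 appears in the blocked list.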
+    :dns_timeout => 0.5
+
+DNS requests to the Http:BL service should NEVER take this long, but if
+they do, this setting prevents the application from hanging until a
+system default timeout. Lookups that do not return before the timeout
+are treated as unlisted and the request is permitted by default, so
+setting the timeout too low will essentially disable the filter (and 0
+is a bad idea). Best not to mess with it unless you know what you're
+doing - it's a safety mechanism.
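To illustrate the fail-open behavior of the timeout, here is a minimal sketch that wraps a lookup in Ruby's standard `Timeout.timeout`. The resolver is passed in as a block so the example runs without network access; `lookup_with_timeout` is a hypothetical helper, and the response strings are made up.

```ruby
require 'timeout'

# Run the given lookup block, returning nil if it exceeds the time limit.
# A nil result means "no listing known", so the request would be let through.
def lookup_with_timeout(seconds, &resolver)
  Timeout.timeout(seconds) { resolver.call }
rescue Timeout::Error
  nil
end

fast = lookup_with_timeout(0.5)  { "127.1.10.1" }            # answers in time
slow = lookup_with_timeout(0.05) { sleep 1; "127.1.10.1" }   # simulated slow DNS
p fast
p slow
```

In Ruby's `Timeout` module a limit of 0 (or nil) runs the block with no timeout at all, which is a likely reason the source warns against setting this option to 0.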
data/Rakefile
ADDED
data/httpbl.gemspec
ADDED
@@ -0,0 +1,34 @@
+# -*- encoding: utf-8 -*-
+
+Gem::Specification.new do |s|
+  s.name = %q{httpbl}
+  s.version = "0.1.3"
+
+  s.required_rubygems_version = Gem::Requirement.new(">= 1.2") if s.respond_to? :required_rubygems_version=
+  s.authors = ["Brandon Palmen"]
+  s.date = %q{2009-03-22}
+  s.description = %q{A Rack middleware IP filter that uses Http:BL to exclude suspicious robots.}
+  s.email = %q{}
+  s.extra_rdoc_files = ["README", "lib/httpbl.rb", "CHANGELOG", "LICENSE"]
+  s.files = ["README", "lib/httpbl.rb", "Rakefile", "httpbl.gemspec", "CHANGELOG", "LICENSE", "Manifest"]
+  s.has_rdoc = true
+  s.homepage = %q{http://github.com/bpalmen/httpbl}
+  s.rdoc_options = ["--line-numbers", "--inline-source", "--title", "Httpbl", "--main", "README"]
+  s.require_paths = ["lib"]
+  s.rubyforge_project = %q{httpbl}
+  s.rubygems_version = %q{1.3.1}
+  s.summary = %q{A Rack middleware IP filter that uses Http:BL to exclude suspicious robots.}
+
+  if s.respond_to? :specification_version then
+    current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
+    s.specification_version = 2
+
+    if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
+      s.add_runtime_dependency(%q<rack>, [">= 0"])
+    else
+      s.add_dependency(%q<rack>, [">= 0"])
+    end
+  else
+    s.add_dependency(%q<rack>, [">= 0"])
+  end
+end
data/lib/httpbl.rb
ADDED
@@ -0,0 +1,59 @@
+# The Httpbl middleware
+
+class HttpBL
+  autoload :Resolv, 'resolv'
+
+  def initialize(app, options = {})
+    @app = app
+    @options = {:blocked_search_engines => [],
+                :age_threshold => 10,
+                :threat_level_threshold => 2,
+                # 8..128 aren't used as of 3/2009, but might be used in the future
+                :deny_types => [1, 2, 4, 8, 16, 32, 64, 128],
+                # DONT set this to 0
+                :dns_timeout => 0.5
+               }.merge(options)
+    raise "Missing :api_key for Http:BL middleware" unless @options[:api_key]
+  end
+
+  def call(env)
+    dup._call(env)
+  end
+
+  def _call(env)
+    request = Rack::Request.new(env)
+    bl_status = resolve(request.ip)
+    if bl_status and blocked?(bl_status)
+      [403, {"Content-Type" => "text/html"}, "<h1>403 Forbidden</h1> Request IP is listed as suspicious by <a href='http://projecthoneypot.org/ip_#{request.ip}'>Project Honeypot</a>"]
+    else
+      @app.call(env)
+    end
+
+  end
+
+  def resolve(ip)
+    query = @options[:api_key] + '.' + ip.split('.').reverse.join('.') + '.dnsbl.httpbl.org'
+    Timeout::timeout(@options[:dns_timeout]) do
+      Resolv::DNS.new.getaddress(query).to_s rescue nil
+    end
+  rescue Timeout::Error, Errno::ECONNREFUSED
+  end
+
+  def blocked?(response)
+    response = response.split('.').collect!(&:to_i)
+    if response[0] == 127
+      if response[3] == 0
+        @blocked = true if @options[:blocked_search_engines].include? response[2]
+      else
+        @age = true if response[1] < @options[:age_threshold]
+        @threat = true if response[2] > @options[:threat_level_threshold]
+        @options[:deny_types].each do |key|
+          @deny = true if response[3] & key == key
+        end
+        @blocked = true if @deny and @threat and @age
+      end
+    end
+    return @blocked
+  end
+
+end
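The decision made by `blocked?` above can be restated as a pure function, which may be easier to reason about and to test without DNS or Rack. This is a sketch under assumptions, not the gem's code: `DEFAULTS` copies the default options from `initialize`, `http_bl_blocked?` is a hypothetical name, the response strings are invented, and it returns `false` where the original returns `nil`.

```ruby
# Default thresholds copied from HttpBL#initialize in this release.
DEFAULTS = {
  :blocked_search_engines => [],
  :age_threshold          => 10,
  :threat_level_threshold => 2,
  :deny_types             => [1, 2, 4, 8, 16, 32, 64, 128]
}

# Decide whether a response of the form "127.<days>.<threat>.<type>" is blocked.
def http_bl_blocked?(response, options = {})
  opts = DEFAULTS.merge(options)
  first, age_days, threat, type = response.split('.').map(&:to_i)
  return false unless first == 127            # not an Http:BL answer
  # Type 0 marks a search engine; the third octet is then the engine code.
  return opts[:blocked_search_engines].include?(threat) if type == 0

  recent      = age_days < opts[:age_threshold]
  threatening = threat > opts[:threat_level_threshold]
  denied      = opts[:deny_types].any? { |bit| type & bit == bit }
  recent && threatening && denied
end

p http_bl_blocked?("127.3.20.4")    # recent, above threshold, comment spammer
p http_bl_blocked?("127.30.20.4")   # last seen 30 days ago, outside the window
```

All three conditions must hold for a non-search-engine visitor: recent activity, a threat level above the threshold, and at least one denied type bit.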
metadata
ADDED
@@ -0,0 +1,76 @@
+--- !ruby/object:Gem::Specification
+name: httpbl
+version: !ruby/object:Gem::Version
+  version: 0.1.3
+platform: ruby
+authors:
+- Brandon Palmen
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2009-03-22 00:00:00 -04:00
+default_executable:
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rack
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: "0"
+    version:
+description: A Rack middleware IP filter that uses Http:BL to exclude suspicious robots.
+email: ""
+executables: []
+
+extensions: []
+
+extra_rdoc_files:
+- README
+- lib/httpbl.rb
+- CHANGELOG
+- LICENSE
+files:
+- README
+- lib/httpbl.rb
+- Rakefile
+- httpbl.gemspec
+- CHANGELOG
+- LICENSE
+- Manifest
+has_rdoc: true
+homepage: http://github.com/bpalmen/httpbl
+post_install_message:
+rdoc_options:
+- --line-numbers
+- --inline-source
+- --title
+- Httpbl
+- --main
+- README
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "1.2"
+  version:
+requirements: []
+
+rubyforge_project: httpbl
+rubygems_version: 1.3.1
+signing_key:
+specification_version: 2
+summary: A Rack middleware IP filter that uses Http:BL to exclude suspicious robots.
+test_files: []
+