defender 0.2.0 → 1.0.0beta1

data/README.md ADDED
@@ -0,0 +1,132 @@
1
+ Defender
2
+ ========
3
+
4
+ Defender is a wrapper for the [Defensio][0] spam filtering API. From
5
+ their own site:
6
+
7
+ > More than just another spam filter, Defensio also eliminates malware
8
+ > and other unwanted or risky content to fully protect your blog or Web
9
+ > 2.0 application.
10
+
11
+ Defensio can not only detect spam, but also filter profanity and
12
+ similar content. It can also tell malicious material apart from
13
+ spammy material.
14
+
15
+
16
+ Overview
17
+ --------
18
+
19
+ With Defender you can submit documents to Defensio, which will look for
20
+ spam and malicious content in the documents.
21
+
22
+ A document contains content to be analyzed by Defensio, or that has been
23
+ analyzed.
24
+
25
+ Submitting documents to Defensio is really easy. Here's a barebones
26
+ example:
27
+
28
+ require 'defender'
29
+ document = Defender::Document.new
30
+ document.data[:content] = 'Hello World!'
31
+ document.data[:type] = 'comment'
32
+ document.data[:platform] = 'defender'
33
+ document.save
34
+
35
+ The `document.data` hash can contain a lot of data. The ones you see
36
+ here are the only required ones, but you should submit as much data as
37
+ you can. Look at the [Defensio API docs][3] for information on the
38
+ different data you can submit. Oh, and the keys can be symbols, and you
39
+ can use underscores instead of dashes.
40
+
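The key handling described above (symbols allowed, underscores turned into dashes) can be sketched roughly as follows. This is only an illustration: `dasherize_keys` is a hypothetical helper name, not part of Defender's public API, and it assumes every value is sent to Defensio as a string.

```ruby
# Hypothetical helper (not part of Defender): turns symbol or underscored
# keys into the dashed string keys that the Defensio API expects.
def dasherize_keys(data)
  data.each_with_object({}) do |(key, value), result|
    # :author_email becomes "author-email"; values are sent as strings.
    result[key.to_s.tr('_', '-')] = value.to_s
  end
end

params = dasherize_keys({ :author_email => 'jane@example.com', :type => 'comment' })
puts params.inspect
```

Under that assumption, `document.data[:author_email]` and `document.data['author-email']` would end up as the same request parameter.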
41
+ After saving the document, Defender will set the `document.allow?`,
42
+ `document.spaminess` and
43
+ `document.signature` attributes. The first one tells you if you should
44
+ display the document or not on your website. The second is a float which
45
+ tells you just how spammy the document is. This could be useful for
46
+ sorting the documents in an admin panel. The lower the spaminess is, the
47
+ less likely the document is to be spam. The last attribute is a unique
48
+ identifier you should save with your document in the database. This can
49
+ be used for retrieving the status of your document again, and for
50
+ retraining purposes.
51
+
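As a rough illustration of the admin-panel idea above, the sketch below sorts stored comments so the spammiest ones come first. `StoredComment` and `sort_by_spaminess` are hypothetical stand-ins for whatever your application persists; the only assumption is that each record keeps the spaminess float it got from Defensio.

```ruby
# Hypothetical stand-in for a record in your own database.
StoredComment = Struct.new(:body, :spaminess)

# Returns the comments ordered from most to least spammy.
def sort_by_spaminess(comments)
  comments.sort_by { |comment| -comment.spaminess }
end

queue = sort_by_spaminess([
  StoredComment.new('Nice post!', 0.02),
  StoredComment.new('Cheap pills', 0.97),
  StoredComment.new('Check my site', 0.55)
])
queue.each { |comment| puts format('%.2f  %s', comment.spaminess, comment.body) }
```

Spaminess is best treated as a sorting aid only; whether a document is shown should still come from `document.allow?`.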
52
+ Did I say retraining? Oh yes, you can retrain Defensio! If some spam
53
+ went through the filters, or some legit documents were marked as spam,
54
+ tell Defensio by setting the `document.allow` attribute and saving the
55
+ document again:
56
+
57
+ document.allow = true
58
+ document.save
59
+
60
+ This tells Defensio that the document should've been allowed. Don't have
61
+ access to the `document` instance any more you say? No problem, just
62
+ retrieve it again using the signature. You did save the signature,
63
+ didn't you?
64
+
65
+ document = Defender::Document.find(signature)
66
+
67
+
68
+ Development
69
+ -----------
70
+
71
+ Want to help out on Defender?
72
+
73
+ First, you should clone the repo and run the features and specs:
74
+
75
+ git clone git://github.com/dvyjones/defender.git
76
+ cd defender
77
+ rake features
78
+ rake spec
79
+
80
+ Feel free to ping the mailing list if you have any problems and we'll
81
+ try to sort it out.
82
+
83
+
84
+ Contributing
85
+ ------------
86
+
87
+ Once you've made your great commits:
88
+
89
+ 1. [Fork][1] defender
90
+ 2. Create a topic branch - `git checkout -b my_branch`
91
+ 3. Push to your branch - `git push origin my_branch`
92
+ 4. Create an [Issue][2] with a link to your branch
93
+ 5. That's it!
94
+
95
+ You might want to check out our [Contributing][cb] wiki page for
96
+ information on coding standards, new features, etc.
97
+
98
+
99
+ Mailing List
100
+ ------------
101
+
102
+ To join the list simply send an email to <defender@librelist.com>. This
103
+ will subscribe you and send you information about your subscription,
104
+ including unsubscribe information.
105
+
106
+ The archive can be found at <http://librelist.com/browser/>.
107
+
108
+
109
+ Meta
110
+ ----
111
+
112
+ * Code: `git clone git://github.com/dvyjones/defender.git`
113
+ * Home: <http://github.com/dvyjones/defender/>
114
+ * Docs: <http://yardoc.org/docs/dvyjones-defender/>
115
+ * Bugs: <http://github.com/dvyjones/defender/issues>
116
+ * List: <defender@librelist.com>
117
+ * Gems: <http://gemcutter.org/gems/defender>
118
+
119
+ This project uses [Semantic Versioning][sv].
120
+
121
+
122
+ Author
123
+ ------
124
+
125
+ Henrik Hodne :: dvyjones@binaryhex.com :: @dvyjones
126
+
127
+ [0]: http://defensio.com
128
+ [1]: http://help.github.com/forking/
129
+ [2]: http://github.com/dvyjones/defender/issues
130
+ [3]: http://defensio.com/api
131
+ [sv]: http://semver.org
132
+ [cb]: http://wiki.github.com/dvyjones/defender/contributing
data/lib/defender.rb CHANGED
@@ -1,67 +1,24 @@
1
- require 'httparty'
2
-
1
+ require 'defender/version'
3
2
  require 'defender/document'
4
- require 'defender/statistics'
5
3
 
6
4
  module Defender
7
- VERSION = "0.2.0"
8
-
9
- include HTTParty
10
-
11
- # The Defensio API version currently supported by Defender
12
- API_VERSION = "2.0"
13
-
14
- # HTTParty config
15
- format :json
16
- base_uri "api.defensio.com/#{API_VERSION}/users"
17
-
18
- class << self
19
- ##
20
- # Your Defensio API key. You need to register at defensio.com to get a key.
21
- attr_accessor :api_key
22
-
23
- ##
24
- # The URL that will be called when Defensio is done analyzing a comment with
25
- # asynchronous callbacks. You should be able to pass the request parameters
26
- # straight into {Document#set_attributes}. The signature will be in the
27
- # `signature` parameter.
28
- #
29
- # *IMPORTANT*: Defensio will NOT retry unsuccessful callbacks to your
30
- # server. If you do not see a POST originating from Defensio after 5
31
- # minutes, call {Document#refresh!} on the document to obtain the analysis
32
- # result.
33
- #
34
- # Occasionally, Defensio may perform more than one POST request to your
35
- # server for the same document. For example, if new evidence indicates that
36
- # a document is unwanted, even though it was originally identified as
37
- # legitimate, Defensio might notify you that the classification has changed.
38
- #
39
- # If you do not provide this and use asynchronous calling, you need to call
40
- # {Document#refresh!} to get the analysis result.
41
- #
42
- # You can debug callbacks using http://postbin.org. See the Defensio API
43
- # documents for the format of the requests.
44
- #
45
- # @return [String]
46
- attr_accessor :async_callback
5
+ ##
6
+ # You most probably don't need to set this. It is used to replace the backend
7
+ # when running the tests. If you for any reason need to use another backend
8
+ # than the defensio gem, set this. The object needs to respond to the same
9
+ # methods as the {Defensio} object does.
10
+ #
11
+ # @param [Defensio] defensio The Defensio backend
12
+ def self.defensio=(defensio)
13
+ @defensio = defensio
47
14
  end
48
15
 
49
16
  ##
50
- # Determines if the given API key is valid or not. This should only be used
51
- # when configuring the client and prior to every content analysis (Document
52
- # POST).
53
- #
54
- # Set the API key using {Defender.api_key}.
55
- #
56
- # @return [Boolean] Whether the API key was valid or not.
57
- def self.check_api_key
58
- key = Defender.api_key
59
- return false unless key
60
- resp = get("/#{key}.json")['defensio-result']
61
- if resp['status'] == 'success'
62
- return true
63
- else
64
- return false
65
- end
17
+ # The Defensio backend. If no backend has been set yet, this will create one
18
+ # with the api key set with {Defender.api_key}.
19
+ def self.defensio
20
+ return @defensio if defined?(@defensio)
21
+ require 'defensio'
22
+ @defensio ||= Defensio.new(Defender.api_key, "Defender | #{VERSION} | Henrik Hodne | dvyjones@binaryhex.com")
66
23
  end
67
24
  end
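The `Defender.defensio=` setter above exists so the backend can be swapped out in tests. A minimal sketch of such a replacement might look like the following. `FakeDefensio` is a hypothetical class; the three methods and their `[code, data]` return shape are inferred from the calls `Defender::Document` makes in this diff, and the canned values are made up for illustration.

```ruby
# Hypothetical test double; not part of Defender or the defensio gem.
# Each method returns [http_code, data_hash], mirroring how Document
# destructures the backend's responses.
class FakeDefensio
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Pretends to submit a new document for analysis.
  def post_document(data)
    @calls << [:post_document, data]
    [200, { 'allow' => true, 'signature' => 'fake-signature' }]
  end

  # Pretends to look up a previously submitted document.
  def get_document(signature)
    @calls << [:get_document, signature]
    [200, { 'allow' => true, 'signature' => signature }]
  end

  # Pretends to retrain Defensio with a corrected `allow` value.
  def put_document(signature, data)
    @calls << [:put_document, signature, data]
    [200, { 'allow' => data[:allow], 'signature' => signature }]
  end
end

backend = FakeDefensio.new
_code, result = backend.post_document({ 'content' => 'Hello World!', 'type' => 'comment' })
puts result['signature']
```

In a test suite this would presumably be wired in with `Defender.defensio = FakeDefensio.new` before any documents are saved, and `calls` can then be inspected to see what Defender sent.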
@@ -1,422 +1,88 @@
1
1
  module Defender
2
- ##
3
- # A document contains content to be analyzed by Defensio, or that has been
4
- # analyzed.
5
- #
6
- # Most of the Defensio API revolves around documents, including the detection
7
- # of unwanted content.
8
2
  class Document
9
3
  ##
10
- # Whether the document should be published by your Web site or not. For
11
- # example, spam and malicious content are not allowed.
4
+ # Whether the document should be published on your Web site or not.
12
5
  #
13
- # This is the only attribute that can be updated after the initial saving.
14
- # Use this for retraining purposes.
6
+ # For example, spam and malicious content are not allowed.
15
7
  #
16
8
  # @return [Boolean]
17
9
  attr_accessor :allow
18
- alias :allow? :allow
10
+ alias_method :allow?, :allow
19
11
 
20
12
  ##
21
- # The type of content in the document.
13
+ # The information about the document. This hash accepts so many parameters
14
+ # I won't list them here. Go look at the [Defensio API docs]
15
+ # (http://defensio.com/api) instead.
22
16
  #
23
- # @return [String] The possible values are innocent, spam and malicious.
24
- attr_reader :classification
25
-
26
- ##
27
- # Whether the document matches profanity or other words defined by the
28
- # user. For example, this is useful to detect obscene comments posted
29
- # to your Web site. When true, you can obtain a filtered version of the
30
- # document by calling {#filter!}.
31
- #
32
- # @return [Boolean]
33
- attr_reader :profane
34
- alias :profane? :profane
35
-
36
- ##
37
- # A unique identifier for the document. You need this value to perform new
38
- # requests on the same document. Signatures should be kept private and never
39
- # be shared with your users.
40
- #
41
- # @return [String]
42
- attr_reader :signature
43
-
44
- ##
45
- # A numeric value indicating how strongly the document resembles spam. For
46
- # example, a document containing many links to pharmaceutical sites is
47
- # likely to have a very high spaminess value. This value should only be used
48
- # for sorting, and should never be used to determine if a document should be
49
- # allowed or not. Spaminess should be kept private and never be shared with
50
- # your users.
51
- #
52
- # @return [Float<0..1>] A float value between 0 and 1, whith 1 being
53
- # extremely spammy. For example, 0.89 (89%).
54
- attr_reader :spaminess
55
-
56
- ##
57
- # The string containing the body of the document. This field is required.
58
- #
59
- # @return [String]
60
- attr_accessor :content
61
-
62
- ##
63
- # The platform which the document is submitted on.
64
- #
65
- # One word, lower case. Spaces should be converted to underscores.
66
- #
67
- # *Examples:*
68
- # wordpress, pixelpost, drupal, phpbb, movable_type
69
- #
70
- # The default is 'ruby'.
71
- #
72
- # @return [String]
73
- attr_accessor :platform
74
-
75
- ##
76
- # Identified the type of content to be analyzed.
77
- #
78
- # Use *test* only for testing purposes.
79
- #
80
- # When *type* is set to *test*, Defensio (not Defender) parses content for
81
- # classification and spaminess. For example, if you want the API to return
82
- # *malicious* as the classification and a spaminess of *0.99*, insert the
83
- # following in content:
84
- # [malicious,0.99]
85
- #
86
- # There are three possible classifications:
87
- #
88
- # * innocent
89
- # * spam
90
- # * malicious
91
- #
92
- # Spaminess should be a decimal value between 0 and 1 (see
93
- # {#spaminess})
94
- #
95
- # *IMPORTANT*
96
- #
97
- # Do *NOT* leave type set to *test* in production. This could represent a
98
- # significant security breach.
99
- attr_accessor :type
100
-
101
- ##
102
- # The email address of the author of the document.
17
+ # Defender will replace all underscores in keys with dashes, so you can use
18
+ # `:author_email` instead of `'author-email'`.
103
19
  #
104
- # @return [String]
105
- attr_accessor :author_email
106
-
107
- ##
108
- # The IP address of the author of the document.
109
- #
110
- # For example, this could be the IP address of the person posting a comment
111
- # on a blog.
112
- #
113
- # @return [String]
114
- attr_accessor :author_ip
115
-
116
- ##
117
- # Whether or not the user posting the document is logged in onto your Web
118
- # site, either through your own authentication mechanism or through OpenID.
119
- #
120
- # @see Document#author_openid
121
- # @see Document#author_trusted
122
- # @return [Boolean]
123
- attr_accessor :author_logged_in
124
-
125
- ##
126
- # The name of the author of the document.
127
- #
128
- # @return [Boolean]
129
- attr_accessor :author_name
130
-
131
- ##
132
- # The OpenID URL of the logged-on user. Must be used in conjunction with
133
- # {Document#author_logged_in} = true.
134
- #
135
- # OpenID authentication must be taken care of by your application. Only send
136
- # this parameter if you have successfully authenticated the user with
137
- # OpenID.
138
- #
139
- # @return [String]
140
- attr_accessor :author_openid
20
+ # @return [Hash{#to_s => #to_s}]
21
+ attr_accessor :data
141
22
 
142
23
  ##
143
- # Whether or not the user is an administrator, moderator or editor of your
144
- # Web site. Pass `true` only if you can guarantee that the user has been
145
- # authenticated, has a role of responsibility, and can be trusted as a good
146
- # Web citizen.
24
+ # A unique identifier for the document.
147
25
  #
148
- # @return [Boolean]
149
- attr_accessor :author_trusted
150
-
151
- ##
152
- # The URL of the person posting the document.
26
+ # This is needed to retrieve the status back from Defensio and to submit
27
+ # false negatives/positives to Defensio. Signatures should be kept private
28
+ # and never shared with your users.
153
29
  #
154
30
  # @return [String]
155
- attr_accessor :author_url
156
-
157
- ##
158
- # Whether or not the Web browser used to post the document (i.e., the
159
- # comment) has cookies enabled. If no such detection has been made, leave
160
- # this value empty.
161
- #
162
- # @return [Boolean]
163
- attr_accessor :browser_cookies
164
-
165
- ##
166
- # Whether or not the Web browser used to post the document (i.e., the
167
- # comment) has JavaScript enabled. If no such detection has been made, leave
168
- # this value empty.
169
- #
170
- # @return [Boolean]
171
- attr_accessor :browser_javascript
172
-
173
- ##
174
- # The URL of the document being posted.
175
- #
176
- # *Examples*
177
- #
178
- # For a comment on a blog, the permalink URL might be:
179
- #
180
- # 'http://yourdomain.com/article#comment-51'
181
- #
182
- # For an article, it might be:
183
- #
184
- # 'http://yourdomain.com/article'
185
- #
186
- # @return [String]
187
- attr_accessor :document_permalink
188
-
189
- ##
190
- # Contains the HTTP headers sent with the request. You can send a few values
191
- # or all values. Because this information helps Defensio determine if a
192
- # document is innocent or not, the more headers you send, the better.
193
- #
194
- # @see #referrer
195
- # @return [Hash{String => String}, Array<String>] You can pass a hash with
196
- # key => values, or an array where each entry has the format `"HEADER:
197
- # value"`
198
- attr_accessor :http_headers
199
-
200
- ##
201
- # The date the parent document was posted. For example, on a blog, this
202
- # would be the date the article related to the comment (document) was
203
- # posted.
204
- #
205
- # If you are using threaded comments, send the date the article was posted,
206
- # *not* the date the parent comment was posted.
207
- #
208
- # @return [Time, Date, DateTime, "yyyy-mm-dd"] If a Time or DateTime is passed, only the
209
- # date part will be saved.
210
- attr_accessor :parent_document_date
211
-
212
- ##
213
- # The URL of the parent document. For example, on a blog, this would be the
214
- # URL of the article on which the comment (document) was posted.
215
- #
216
- # @see #document_permalink
217
- # @return [String]
218
- attr_accessor :parent_document_permalink
219
-
220
- ##
221
- # Provide the value of the HTTP_REFERER (note the spelling) in this field.
222
- #
223
- # @see #http_headers
224
- # @return [String]
225
- attr_accessor :referrer
226
-
227
- ##
228
- # Provide the title of the document being sent. For example, this might be
229
- # the title of a blog article.
230
- #
231
- # Do not send this information if no title has been provided.
232
- attr_accessor :title
233
-
234
- ##
235
- # Is the document still pending?
236
- #
237
- # @return [Boolean]
238
- attr_reader :pending
239
- alias :pending? :pending
240
-
241
- ##
242
- # Set the pending attribute to true. Only to be used by {find} and similar
243
- # methods.
244
- #
245
- # @private
246
- def pending!; @pending = true; end
31
+ attr_reader :signature
247
32
 
248
33
  ##
249
- # Retrieves a document from the Defensio server.
34
+ # Retrieves the status of a document back from Defensio.
250
35
  #
251
- # This can be called up to 30 days after the initial posting of a document
252
- # to Defensio.
36
+ # Please note that this only retrieves the status of the document (like
37
+ # its spaminess, whether it should be allowed or not, etc.) and not the
38
+ # content of the request (all of the data in the {#data} hash).
253
39
  #
254
- # @return [Document]
40
+ # @param [String] signature The signature of the document to retrieve
41
+ # @return [Document] The retrieved document
255
42
  def self.find(signature)
256
- document = new()
257
- response = Defender.get("/#{Defender.api_key}/documents/#{signature}.json")['defensio-result']
258
- if response['status'] == 'success' || response['status'] == 'pending'
259
- document.set_attributes(response)
260
- document.pending! if response['status'] == 'pending'
261
- else
262
- raise StandardError, response['message']
263
- end
264
- document
265
- end
43
+ document = new
44
+ _code, data = Defender.defensio.get_document(signature)
45
+ document.instance_variable_set(:@saved, true)
46
+ document.instance_variable_set(:@allow, data['allow'])
47
+ document.instance_variable_set(:@signature, signature)
266
48
 
267
- ##
268
- # Create a new document.
269
- def initialize()
49
+ document
270
50
  end
271
51
 
272
52
  ##
273
- # Re-retrieves the document from the Defensio server
274
- #
275
- # This can be called up to 30 days after the initial posting of the document
276
- # to Defensio
277
- #
278
- # @return [true] The document was updated.
279
- # @return [false] The document was not updated (still pending).
280
- def refresh!
281
- response = Defender.get("/#{Defender.api_key}/documents/#{signature}.json")['defensio-result']
282
- if response['status'] == 'success'
283
- document.set_attributes(response)
284
- return true
285
- elsif response['status'] == 'pending'
286
- pending!
287
- return false
288
- else
289
- raise StandardError, response['message']
290
- end
53
+ # Initializes a new document
54
+ def initialize
55
+ @data = {}
56
+ @saved = false
291
57
  end
292
58
 
293
59
  ##
294
- # Creates an attributes hash to be sent to Defensio. This method will make
295
- # sure that the required attributess are in, and the names of the attributes
296
- # are correct.
297
- #
298
- # @return [Hash{String => String}]
299
- def attributes_hash
300
- options = {
301
- 'client' => "Defender | #{Defender::VERSION} | Henrik Hodne | henrik.hodne@binaryhex.com",
302
- 'platform' => platform || "ruby",
303
- 'content' => content,
304
- 'type' => type
305
- }
306
- [
307
- :author_email, :author_ip, :author_logged_in, :author_name, :author_openid,
308
- :author_trusted, :author_url, :browser_cookies, :browser_javascript,
309
- :document_permalink, :referrer, :title, :parent_document_permalink
310
- ].each do |symbol|
311
- options[symbol.to_s.gsub("_", "-")] = self.send(symbol)
312
- end
313
-
314
- headers = http_headers
315
- unless headers.nil?
316
- options['http-headers'] = headers.to_a.map do |kv|
317
- kv.respond_to?(:join) ? kv.join(": ") : kv
318
- end.join("\n")
319
- end
320
-
321
- pddate = parent_document_date
322
- options['parent-document-date'] = pddate.respond_to?(:strftime) ?
323
- pddate.strftime("%Y-%m-%d") : pddate
324
-
325
- formatted_options = {}
326
-
327
- options.each do |key, value|
328
- formatted_options[key] = value.to_s unless value.nil?
329
- end
330
-
331
- formatted_options
60
+ # @return [Boolean] Has the document been submitted to Defensio?
61
+ def saved?
62
+ @saved
332
63
  end
333
64
 
334
65
  ##
335
- # Post the document to Defensio to be analyzed for spam and malicious
336
- # content.
337
- #
338
- # @param [Boolean] async Whether or not the document analysis should be done
339
- # asynchronously. With asynchronous document analysis you will obtain
340
- # better accuracy. Do not poll the servers more than once every 30 seconds
341
- # for each document. To avoid polling, set the callback URL with
342
- # {Defender.async_callback}. You can get the information from the server
343
- # using the {#refresh!} method or calling {Document.find} with the
344
- # signature.
66
+ # Submit the document to Defensio.
345
67
  #
346
- # @see #pending?
68
+ # This will send all of the {#data} if the document hasn't been saved
69
+ # before. If it has been saved, it will submit whether the document was a
70
+ # false positive/negative (set the {#allow} param before saving to do
71
+ # this).
347
72
  #
348
- # @raise ArgumentError if a required field is not set.
349
- # @return [Boolean] Whether the record was saved or not.
350
- def save(async=false)
351
- if sig = signature # The document is submitted to Defensio
352
- response = Defender.put("/#{Defender.api_key}/documents/#{sig}.json",
353
- :allow => allow?)['defensio-result']
73
+ # @see #saved?
74
+ def save
75
+ if saved?
76
+ _code, data = Defender.defensio.put_document(@signature, {:allow => @allow})
354
77
  else
355
- hsh = attributes_hash
356
- if attributes_hash['content'].nil?
357
- raise ArgumentError, 'The content field is required'
358
- end
359
- if attributes_hash['type'].nil?
360
- raise ArgumentError, 'The type field is required'
361
- end
362
-
363
- if async
364
- hsh['async'] = 'true'
365
- hsh['async-callback'] = Defender.async_callback if Defender.async_callback
366
- end
367
- response = Defender.post("/#{Defender.api_key}/documents.json", hsh)['defensio-result']
368
- end
369
- if response['status'] == 'success'
370
- set_attributes(response)
371
- return true
372
- elsif response['status'] == 'pending'
373
- set_attributes(response) # Some fields are blank
374
- @pending = true
375
- return true
376
- else
377
- return false
378
- end
379
- end
380
-
381
- def set_attributes(attributes)
382
- [:classification, :signature, :spaminess, :allow].each do |symbol|
383
- self.instance_variable_set(:"@#{symbol}", attributes[symbol.to_s])
384
- end
385
- @profane = attributes['profanity-match']
386
- undefine_setters
387
- end
388
-
389
- ##
390
- # Filters the provided fields. The filtering is based on a default
391
- # dictionary and one previously configured by the user.
392
- #
393
- # @param [Array<Symbol>] *args The fields to filter (like `:content`,
394
- # `:author_name`, etc.)
395
- def filter!(*args)
396
- filter = {}
397
- args.each {|arg| filter[arg] = __send__(arg) }
398
- response = Defender.post("/#{Defender.api_key}/profanity-filter.json", filter)['defensio-result']
399
- if response['status'] == 'success'
400
- response['filtered'].each do |key, value|
401
- self.instance_variable_set(:"@#{key}", value)
402
- end
403
- else
404
- raise StandardError, response['message']
405
- end
406
- end
407
-
408
- private
409
-
410
- def undefine_setters
411
- [
412
- :content=, :platform=, :type=, :author_email=, :author_ip=,
413
- :author_logged_in=, :author_name=, :author_openid=,
414
- :author_trusted=, :author_url=, :browser_cookies=,
415
- :browser_javascript=, :document_permalink=, :http_headers=,
416
- :parent_document_date=, :referrer=, :title=
417
- ].each do |method|
418
- # TODO: Fix hack.
419
- instance_eval "def self.#{method}(*args)\nmethod_missing(#{method.inspect}, *args)\nend"
78
+ data = {}
79
+ @data.each { |k,v|
80
+ data[k.to_s.gsub('_','-')] = v.to_s
81
+ }
82
+ _code, data = Defender.defensio.post_document(data)
83
+ @allow = data['allow']
84
+ @signature = data['signature']
85
+ @saved = true
420
86
  end
421
87
  end
422
88
  end