defender 0.2.0 → 1.0.0beta1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md ADDED
@@ -0,0 +1,132 @@
+ Defender
+ ========
+
+ Defender is a wrapper for the [Defensio][0] spam filtering API. From
+ their own site:
+
+ > More than just another spam filter, Defensio also eliminates malware
+ > and other unwanted or risky content to fully protect your blog or Web
+ > 2.0 application.
+
+ Defensio can not only find spam, but also filter profanity and other
+ unwanted content. It can also tell the difference between malicious
+ material and spammy material.
+
+
+ Overview
+ --------
+
+ With Defender you can submit documents to Defensio, which will look for
+ spam and malicious content in the documents.
+
+ A document contains content to be analyzed by Defensio, or that has been
+ analyzed.
+
+ Submitting documents to Defensio is really easy. Here's a barebones
+ example:
+
+     require 'defender'
+     document = Defender::Document.new
+     document.data[:content] = 'Hello World!'
+     document.data[:type] = 'comment'
+     document.data[:platform] = 'defender'
+     document.save
+
+ The `document.data` hash can hold many fields. The ones you see here
+ are the only required ones, but you should submit as much data as you
+ can. See the [Defensio API docs][3] for the different fields you can
+ submit. The keys can be symbols, and you can use underscores instead
+ of dashes (`:author_email` instead of `'author-email'`).
+
+ After saving the document, Defender will set the `document.allow?`,
+ `document.spaminess` and `document.signature` attributes. The first
+ tells you whether you should display the document on your website.
+ The second is a float that tells you how spammy the document is. This
+ can be useful for sorting the documents in an admin panel. The lower
+ the spaminess, the less likely the document is to be spam. The last
+ attribute is a unique identifier you should save with your document
+ in the database. You need it to retrieve the status of your document
+ later and to retrain Defensio if it classified the document
+ incorrectly.
+
+ Did I say retraining? Oh yes, you can retrain Defensio! If some spam
+ went through the filters, or some legit documents were marked as spam,
+ tell Defensio by setting the `document.allow` attribute and saving the
+ document again:
+
+     document.allow = true
+     document.save
+
+ This tells Defensio that the document should've been allowed. Don't have
+ access to the `document` instance anymore, you say? No problem, just
+ retrieve it again using the signature. You did save the signature,
+ didn't you?
+
+     document = Defender::Document.find(signature)
+
+
+ Development
+ -----------
+
+ Want to help out on Defender?
+
+ First, you should clone the repo and run the features and specs:
+
+     git clone git://github.com/dvyjones/defender.git
+     cd defender
+     rake features
+     rake spec
+
+ Feel free to ping the mailing list if you have any problems and we'll
+ try to sort it out.
+
+
+ Contributing
+ ------------
+
+ Once you've made your great commits:
+
+ 1. [Fork][1] defender
+ 2. Create a topic branch - `git checkout -b my_branch`
+ 3. Push to your branch - `git push origin my_branch`
+ 4. Create an [Issue][2] with a link to your branch
+ 5. That's it!
+
+ You might want to check out our [Contributing][cb] wiki page for
+ information on coding standards, new features, etc.
+
+
+ Mailing List
+ ------------
+
+ To join the list, simply send an email to <defender@librelist.com>. This
+ will subscribe you and send you information about your subscription,
+ including unsubscribe information.
+
+ The archive can be found at <http://librelist.com/browser/>.
+
+
+ Meta
+ ----
+
+ * Code: `git clone git://github.com/dvyjones/defender.git`
+ * Home: <http://github.com/dvyjones/defender/>
+ * Docs: <http://yardoc.org/docs/dvyjones-defender/>
+ * Bugs: <http://github.com/dvyjones/defender/issues>
+ * List: <defender@librelist.com>
+ * Gems: <http://gemcutter.org/gems/defender>
+
+ This project uses [Semantic Versioning][sv].
+
+
+ Author
+ ------
+
+ Henrik Hodne :: dvyjones@binaryhex.com :: @dvyjones
+
+ [0]: http://defensio.com
+ [1]: http://help.github.com/forking/
+ [2]: http://github.com/dvyjones/defender/issues
+ [3]: http://defensio.com/api
+ [sv]: http://semver.org
+ [cb]: http://wiki.github.com/dvyjones/defender/contributing
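The README says that `document.data` keys can be symbols and can use underscores instead of dashes. The standalone sketch below shows that normalization. `to_defensio_params` is a hypothetical helper, not part of Defender. It mirrors the conversion loop in the new `Document#save`: keys are turned into strings with underscores replaced by dashes, and every value goes through `to_s`.

```ruby
# Hypothetical helper (not part of Defender): converts a data hash into the
# string-keyed, dash-separated form used by Defensio API field names.
def to_defensio_params(data)
  data.each_with_object({}) do |(key, value), params|
    # :author_email and 'author_email' both become 'author-email'.
    params[key.to_s.gsub('_', '-')] = value.to_s
  end
end

document_data = {
  :content      => 'Hello World!',
  :type         => 'comment',
  :platform     => 'defender',
  :author_email => 'someone@example.com'
}

params = to_defensio_params(document_data)
puts params.keys.inspect
# prints ["content", "type", "platform", "author-email"]
```

Because every value goes through `to_s`, a `nil` value becomes an empty string instead of being dropped. The 0.2.0 code skipped `nil` values instead.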
data/lib/defender.rb CHANGED
@@ -1,67 +1,24 @@
- require 'httparty'
-
+ require 'defender/version'
  require 'defender/document'
- require 'defender/statistics'
 
  module Defender
-   VERSION = "0.2.0"
-
-   include HTTParty
-
-   # The Defensio API version currently supported by Defender
-   API_VERSION = "2.0"
-
-   # HTTParty config
-   format :json
-   base_uri "api.defensio.com/#{API_VERSION}/users"
-
-   class << self
-     ##
-     # Your Defensio API key. You need to register at defensio.com to get a key.
-     attr_accessor :api_key
-
-     ##
-     # The URL that will be called when Defensio is done analyzing a comment with
-     # asynchronous callbacks. You should be able to pass the request parameters
-     # straight into {Document#set_attributes}. The signature will be in the
-     # `signature` parameter.
-     #
-     # *IMPORTANT*: Defensio will NOT retry unsuccessful callbacks to your
-     # server. If you do not see a POST originating from Defensio after 5
-     # minutes, call {Document#refresh!} on the document to obtain the analysis
-     # result.
-     #
-     # Occasionally, Defensio may perform more than one POST request to your
-     # server for the same document. For example, if new evidence indicates that
-     # a document is unwanted, even though it was originally identified as
-     # legitimate, Defensio might notify you that the classification has changed.
-     #
-     # If you do not provide this and use asynchronous calling, you need to call
-     # {Document#refresh!} to get the analysis result.
-     #
-     # You can debug callbacks using http://postbin.org. See the Defensio API
-     # documents for the format of the requests.
-     #
-     # @return [String]
-     attr_accessor :async_callback
+   ##
+   # You most probably don't need to set this. It is used to replace the backend
+   # when running the tests. If for any reason you need a backend other than
+   # the defensio gem, set this. The object needs to respond to the same
+   # methods as the {Defensio} object does.
+   #
+   # @param [Defensio] defensio The Defensio backend
+   def self.defensio=(defensio)
+     @defensio = defensio
    end
 
    ##
-   # Determines if the given API key is valid or not. This should only be used
-   # when configuring the client and prior to every content analysis (Document
-   # POST).
-   #
-   # Set the API key using {Defender.api_key}.
-   #
-   # @return [Boolean] Whether the API key was valid or not.
-   def self.check_api_key
-     key = Defender.api_key
-     return false unless key
-     resp = get("/#{key}.json")['defensio-result']
-     if resp['status'] == 'success'
-       return true
-     else
-       return false
-     end
+   # The Defensio backend. If no backend has been set yet, this will create one
+   # using the API key set with {Defender.api_key}.
+   def self.defensio
+     return @defensio if defined?(@defensio)
+     require 'defensio'
+     @defensio ||= Defensio.new(Defender.api_key, "Defender | #{VERSION} | Henrik Hodne | dvyjones@binaryhex.com")
    end
  end
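The new `Defender.defensio=` setter lets tests swap in another backend, as long as it responds to the methods Defender calls. In this diff those are `post_document(params)`, `put_document(signature, params)` and `get_document(signature)`, each destructured as `_code, data = ...`. Below is a sketch of a hypothetical in-memory stand-in (`FakeDefensio` is an illustrative name). The `[status_code, hash]` return shape is inferred from that destructuring. The classification rule, which rejects content mentioning "spam", is made up for illustration.

```ruby
# Hypothetical in-memory stand-in for the Defensio backend, for tests only.
# It implements just the three methods this version of Defender calls and
# returns [status_code, hash] pairs to match `_code, data = ...`.
class FakeDefensio
  attr_reader :submissions, :retrainings

  def initialize
    @submissions = {}  # signature => params passed to post_document
    @verdicts = {}     # signature => current allow verdict
    @retrainings = []  # [signature, params] pairs passed to put_document
    @counter = 0
  end

  # Stand-in for submitting a new document for analysis.
  def post_document(params)
    @counter += 1
    signature = "fake-signature-#{@counter}"
    @submissions[signature] = params
    # Made-up rule for illustration: reject content that mentions "spam".
    @verdicts[signature] = !params['content'].to_s.downcase.include?('spam')
    [200, { 'allow' => @verdicts[signature], 'signature' => signature }]
  end

  # Stand-in for retraining: record the corrected verdict.
  def put_document(signature, params)
    @retrainings << [signature, params]
    @verdicts[signature] = params[:allow]
    [200, { 'allow' => @verdicts[signature], 'signature' => signature }]
  end

  # Stand-in for fetching the status of an earlier submission.
  def get_document(signature)
    return [404, {}] unless @verdicts.key?(signature)
    [200, { 'allow' => @verdicts[signature], 'signature' => signature }]
  end
end

backend = FakeDefensio.new
_code, result = backend.post_document({ 'content' => 'Buy spam now', 'type' => 'comment' })
signature = result['signature']
backend.put_document(signature, { :allow => true }) # report a false positive
code, status = backend.get_document(signature)
puts "#{code} #{status['allow']}"
# prints 200 true
```

In a spec, you could assign `Defender.defensio = FakeDefensio.new` before calling `Document#save` or `Document.find`. Those calls would then go to the stub instead of the defensio gem.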
data/lib/defender/document.rb CHANGED
@@ -1,422 +1,88 @@
  module Defender
-   ##
-   # A document contains content to be analyzed by Defensio, or that has been
-   # analyzed.
-   #
-   # Most of the Defensio API revolves around documents, including the detection
-   # of unwanted content.
    class Document
      ##
-     # Whether the document should be published by your Web site or not. For
-     # example, spam and malicious content are not allowed.
+     # Whether the document should be published on your Web site or not.
      #
-     # This is the only attribute that can be updated after the initial saving.
-     # Use this for retraining purposes.
+     # For example, spam and malicious content are not allowed.
      #
      # @return [Boolean]
      attr_accessor :allow
-     alias :allow? :allow
+     alias_method :allow?, :allow
 
      ##
-     # The type of content in the document.
+     # The information about the document. This hash accepts so many parameters
+     # that they are not listed here. See the Defensio API docs
+     # (http://defensio.com/api) instead.
      #
-     # @return [String] The possible values are innocent, spam and malicious.
-     attr_reader :classification
-
-     ##
-     # Whether the document matches profanity or other words defined by the
-     # user. For example, this is useful to detect obscene comments posted
-     # to your Web site. When true, you can obtain a filtered version of the
-     # document by calling {#filter!}.
-     #
-     # @return [Boolean]
-     attr_reader :profane
-     alias :profane? :profane
-
-     ##
-     # A unique identifier for the document. You need this value to perform new
-     # requests on the same document. Signatures should be kept private and never
-     # be shared with your users.
-     #
-     # @return [String]
-     attr_reader :signature
-
-     ##
-     # A numeric value indicating how strongly the document resembles spam. For
-     # example, a document containing many links to pharmaceutical sites is
-     # likely to have a very high spaminess value. This value should only be used
-     # for sorting, and should never be used to determine if a document should be
-     # allowed or not. Spaminess should be kept private and never be shared with
-     # your users.
-     #
-     # @return [Float<0..1>] A float value between 0 and 1, whith 1 being
-     # extremely spammy. For example, 0.89 (89%).
-     attr_reader :spaminess
-
-     ##
-     # The string containing the body of the document. This field is required.
-     #
-     # @return [String]
-     attr_accessor :content
-
-     ##
-     # The platform which the document is submitted on.
-     #
-     # One word, lower case. Spaces should be converted to underscores.
-     #
-     # *Examples:*
-     # wordpress, pixelpost, drupal, phpbb, movable_type
-     #
-     # The default is 'ruby'.
-     #
-     # @return [String]
-     attr_accessor :platform
-
-     ##
-     # Identified the type of content to be analyzed.
-     #
-     # Use *test* only for testing purposes.
-     #
-     # When *type* is set to *test*, Defensio (not Defender) parses content for
-     # classification and spaminess. For example, if you want the API to return
-     # *malicious* as the classification and a spaminess of *0.99*, insert the
-     # following in content:
-     # [malicious,0.99]
-     #
-     # There are three possible classifications:
-     #
-     # * innocent
-     # * spam
-     # * malicious
-     #
-     # Spaminess should be a decimal value between 0 and 1 (see
-     # {#spaminess})
-     #
-     # *IMPORTANT*
-     #
-     # Do *NOT* leave type set to *test* in production. This could represent a
-     # significant security breach.
-     attr_accessor :type
-
-     ##
-     # The email address of the author of the document.
+     # Defender will replace all underscores in keys with dashes, so you can use
+     # `:author_email` instead of `'author-email'`.
      #
-     # @return [String]
-     attr_accessor :author_email
-
-     ##
-     # The IP address of the author of the document.
-     #
-     # For example, this could be the IP address of the person posting a comment
-     # on a blog.
-     #
-     # @return [String]
-     attr_accessor :author_ip
-
-     ##
-     # Whether or not the user posting the document is logged in onto your Web
-     # site, either through your own authentication mechanism or through OpenID.
-     #
-     # @see Document#author_openid
-     # @see Document#author_trusted
-     # @return [Boolean]
-     attr_accessor :author_logged_in
-
-     ##
-     # The name of the author of the document.
-     #
-     # @return [Boolean]
-     attr_accessor :author_name
-
-     ##
-     # The OpenID URL of the logged-on user. Must be used in conjunction with
-     # {Document#author_logged_in} = true.
-     #
-     # OpenID authentication must be taken care of by your application. Only send
-     # this parameter if you have successfully authenticated the user with
-     # OpenID.
-     #
-     # @return [String]
-     attr_accessor :author_openid
+     # @return [Hash{#to_s => #to_s}]
+     attr_accessor :data
 
      ##
-     # Whether or not the user is an administrator, moderator or editor of your
-     # Web site. Pass `true` only if you can guarantee that the user has been
-     # authenticated, has a role of responsibility, and can be trusted as a good
-     # Web citizen.
+     # A unique identifier for the document.
      #
-     # @return [Boolean]
-     attr_accessor :author_trusted
-
-     ##
-     # The URL of the person posting the document.
+     # This is needed to retrieve the status back from Defensio and to submit
+     # false negatives/positives to Defensio. Signatures should be kept private
+     # and never shared with your users.
      #
      # @return [String]
-     attr_accessor :author_url
-
-     ##
-     # Whether or not the Web browser used to post the document (i.e., the
-     # comment) has cookies enabled. If no such detection has been made, leave
-     # this value empty.
-     #
-     # @return [Boolean]
-     attr_accessor :browser_cookies
-
-     ##
-     # Whether or not the Web browser used to post the document (i.e., the
-     # comment) has JavaScript enabled. If no such detection has been made, leave
-     # this value empty.
-     #
-     # @return [Boolean]
-     attr_accessor :browser_javascript
-
-     ##
-     # The URL of the document being posted.
-     #
-     # *Examples*
-     #
-     # For a comment on a blog, the permalink URL might be:
-     #
-     # 'http://yourdomain.com/article#comment-51'
-     #
-     # For an article, it might be:
-     #
-     # 'http://yourdomain.com/article'
-     #
-     # @return [String]
-     attr_accessor :document_permalink
-
-     ##
-     # Contains the HTTP headers sent with the request. You can send a few values
-     # or all values. Because this information helps Defensio determine if a
-     # document is innocent or not, the more headers you send, the better.
-     #
-     # @see #referrer
-     # @return [Hash{String => String}, Array<String>] You can pass a hash with
-     # key => values, or an array where each entry has the format `"HEADER:
-     # value"`
-     attr_accessor :http_headers
-
-     ##
-     # The date the parent document was posted. For example, on a blog, this
-     # would be the date the article related to the comment (document) was
-     # posted.
-     #
-     # If you are using threaded comments, send the date the article was posted,
-     # *not* the date the parent comment was posted.
-     #
-     # @return [Time, Date, DateTime, "yyyy-mm-dd"] If a Time or DateTime is passed, only the
-     # date part will be saved.
-     attr_accessor :parent_document_date
-
-     ##
-     # The URL of the parent document. For example, on a blog, this would be the
-     # URL of the article on which the comment (document) was posted.
-     #
-     # @see #document_permalink
-     # @return [String]
-     attr_accessor :parent_document_permalink
-
-     ##
-     # Provide the value of the HTTP_REFERER (note the spelling) in this field.
-     #
-     # @see #http_headers
-     # @return [String]
-     attr_accessor :referrer
-
-     ##
-     # Provide the title of the document being sent. For example, this might be
-     # the title of a blog article.
-     #
-     # Do not send this information if no title has been provided.
-     attr_accessor :title
-
-     ##
-     # Is the document still pending?
-     #
-     # @return [Boolean]
-     attr_reader :pending
-     alias :pending? :pending
-
-     ##
-     # Set the pending attribute to true. Only to be used by {find} and similar
-     # methods.
-     #
-     # @private
-     def pending!; @pending = true; end
+     attr_reader :signature
 
      ##
-     # Retrieves a document from the Defensio server.
+     # Retrieves the status of a document back from Defensio.
      #
-     # This can be called up to 30 days after the initial posting of a document
-     # to Defensio.
+     # Please note that this only retrieves the status of the document (like
+     # its spaminess, whether it should be allowed or not, etc.) and not the
+     # content of the request (all of the data in the {#data} hash).
      #
-     # @return [Document]
+     # @param [String] signature The signature of the document to retrieve
+     # @return [Document] The retrieved document
      def self.find(signature)
-       document = new()
-       response = Defender.get("/#{Defender.api_key}/documents/#{signature}.json")['defensio-result']
-       if response['status'] == 'success' || response['status'] == 'pending'
-         document.set_attributes(response)
-         document.pending! if response['status'] == 'pending'
-       else
-         raise StandardError, response['message']
-       end
-       document
-     end
+       document = new
+       _code, data = Defender.defensio.get_document(signature)
+       document.instance_variable_set(:@saved, true)
+       document.instance_variable_set(:@allow, data['allow'])
+       document.instance_variable_set(:@signature, signature)
 
-     ##
-     # Create a new document.
-     def initialize()
+       document
      end
 
      ##
-     # Re-retrieves the document from the Defensio server
-     #
-     # This can be called up to 30 days after the initial posting of the document
-     # to Defensio
-     #
-     # @return [true] The document was updated.
-     # @return [false] The document was not updated (still pending).
-     def refresh!
-       response = Defender.get("/#{Defender.api_key}/documents/#{signature}.json")['defensio-result']
-       if response['status'] == 'success'
-         document.set_attributes(response)
-         return true
-       elsif response['status'] == 'pending'
-         pending!
-         return false
-       else
-         raise StandardError, response['message']
-       end
+     # Initializes a new document
+     def initialize
+       @data = {}
+       @saved = false
      end
 
      ##
-     # Creates an attributes hash to be sent to Defensio. This method will make
-     # sure that the required attributess are in, and the names of the attributes
-     # are correct.
-     #
-     # @return [Hash{String => String}]
-     def attributes_hash
-       options = {
-         'client' => "Defender | #{Defender::VERSION} | Henrik Hodne | henrik.hodne@binaryhex.com",
-         'platform' => platform || "ruby",
-         'content' => content,
-         'type' => type
-       }
-       [
-         :author_email, :author_ip, :author_logged_in, :author_name, :author_openid,
-         :author_trusted, :author_url, :browser_cookies, :browser_javascript,
-         :document_permalink, :referrer, :title, :parent_document_permalink
-       ].each do |symbol|
-         options[symbol.to_s.gsub("_", "-")] = self.send(symbol)
-       end
-
-       headers = http_headers
-       unless headers.nil?
-         options['http-headers'] = headers.to_a.map do |kv|
-           kv.respond_to?(:join) ? kv.join(": ") : kv
-         end.join("\n")
-       end
-
-       pddate = parent_document_date
-       options['parent-document-date'] = pddate.respond_to?(:strftime) ?
-         pddate.strftime("%Y-%m-%d") : pddate
-
-       formatted_options = {}
-
-       options.each do |key, value|
-         formatted_options[key] = value.to_s unless value.nil?
-       end
-
-       formatted_options
+     # @return [Boolean] Has the document been submitted to Defensio?
+     def saved?
+       @saved
      end
 
      ##
-     # Post the document to Defensio to be analyzed for spam and malicious
-     # content.
-     #
-     # @param [Boolean] async Whether or not the document analysis should be done
-     # asynchronously. With asynchronous document analysis you will obtain
-     # better accuracy. Do not poll the servers more than once every 30 seconds
-     # for each document. To avoid polling, set the callback URL with
-     # {Defender.async_callback}. You can get the information from the server
-     # using the {#refresh!} method or calling {Document.find} with the
-     # signature.
+     # Submit the document to Defensio.
      #
-     # @see #pending?
+     # This will send all of the {#data} if the document hasn't been saved
+     # before. If it has been saved, it will submit whether the document was a
+     # false positive/negative (set the {#allow} attribute before saving to do
+     # this).
      #
-     # @raise ArgumentError if a required field is not set.
-     # @return [Boolean] Whether the record was saved or not.
-     def save(async=false)
-       if sig = signature # The document is submitted to Defensio
-         response = Defender.put("/#{Defender.api_key}/documents/#{sig}.json",
-           :allow => allow?)['defensio-result']
+     # @see #saved?
+     def save
+       if saved?
+         _code, _response = Defender.defensio.put_document(@signature, {:allow => @allow})
        else
-         hsh = attributes_hash
-         if attributes_hash['content'].nil?
-           raise ArgumentError, 'The content field is required'
-         end
-         if attributes_hash['type'].nil?
-           raise ArgumentError, 'The type field is required'
-         end
-
-         if async
-           hsh['async'] = 'true'
-           hsh['async-callback'] = Defender.async_callback if Defender.async_callback
-         end
-         response = Defender.post("/#{Defender.api_key}/documents.json", hsh)['defensio-result']
-       end
-       if response['status'] == 'success'
-         set_attributes(response)
-         return true
-       elsif response['status'] == 'pending'
-         set_attributes(response) # Some fields are blank
-         @pending = true
-         return true
-       else
-         return false
-       end
-     end
-
-     def set_attributes(attributes)
-       [:classification, :signature, :spaminess, :allow].each do |symbol|
-         self.instance_variable_set(:"@#{symbol}", attributes[symbol.to_s])
-       end
-       @profane = attributes['profanity-match']
-       undefine_setters
-     end
-
-     ##
-     # Filters the provided fields. The filtering is based on a default
-     # dictionary and one previously configured by the user.
-     #
-     # @param [Array<Symbol>] *args The fields to filter (like `:content`,
-     # `:author_name`, etc.)
-     def filter!(*args)
-       filter = {}
-       args.each {|arg| filter[arg] = __send__(arg) }
-       response = Defender.post("/#{Defender.api_key}/profanity-filter.json", filter)['defensio-result']
-       if response['status'] == 'success'
-         response['filtered'].each do |key, value|
-           self.instance_variable_set(:"@#{key}", value)
-         end
-       else
-         raise StandardError, response['message']
-       end
-     end
-
-     private
-
-     def undefine_setters
-       [
-         :content=, :platform=, :type=, :author_email=, :author_ip=,
-         :author_logged_in=, :author_name=, :author_openid=,
-         :author_trusted=, :author_url=, :browser_cookies=,
-         :browser_javascript=, :document_permalink=, :http_headers=,
-         :parent_document_date=, :referrer=, :title=
-       ].each do |method|
-         # TODO: Fix hack.
-         instance_eval "def self.#{method}(*args)\nmethod_missing(#{method.inspect}, *args)\nend"
+         params = {}
+         @data.each do |key, value|
+           params[key.to_s.gsub('_', '-')] = value.to_s
+         end
+         _code, response = Defender.defensio.post_document(params)
+         @allow = response['allow']
+         @signature = response['signature']
+         @saved = true
        end
      end
    end
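The new `save` has two phases: the first call submits the `data` hash, and any later call only reports the `allow` verdict for retraining. The sketch below shows both phases outside the gem. `SketchDocument` and `RecordingBackend` are hypothetical stand-ins, not Defender's classes. The sketch posts `data` with dashed string keys and string values, as the doc comment on `data` describes.

```ruby
# Hypothetical backend that records calls instead of contacting Defensio.
class RecordingBackend
  attr_reader :calls

  def initialize
    @calls = []
  end

  def post_document(params)
    @calls << [:post, params]
    [200, { 'allow' => true, 'signature' => 'sig-1' }]
  end

  def put_document(signature, params)
    @calls << [:put, signature, params]
    [200, { 'allow' => params[:allow], 'signature' => signature }]
  end
end

# Simplified, hypothetical stand-in for Defender::Document's save logic.
class SketchDocument
  attr_accessor :data, :allow
  attr_reader :signature

  def initialize(backend)
    @backend = backend
    @data = {}
    @saved = false
  end

  def saved?
    @saved
  end

  def save
    if saved?
      # Later saves only report the (possibly corrected) verdict.
      @backend.put_document(@signature, { :allow => @allow })
    else
      # First save: dashed string keys, string values.
      params = {}
      @data.each { |key, value| params[key.to_s.gsub('_', '-')] = value.to_s }
      _code, response = @backend.post_document(params)
      @allow = response['allow']
      @signature = response['signature']
      @saved = true
    end
  end
end

backend = RecordingBackend.new
doc = SketchDocument.new(backend)
doc.data[:content] = 'Hello World!'
doc.data[:type] = 'comment'
doc.data[:author_email] = 'someone@example.com'
doc.save            # first save: post_document with the converted data
doc.allow = false   # the backend allowed it, but it was actually spam
doc.save            # second save: put_document for retraining
```

Recording calls instead of hitting the network is the kind of test setup that the `Defender.defensio=` setter is documented to support.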