language_filter 0.2.1 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 7e1c9631fe0b3dbfdf7b9fabedfb944e3e24f415
- data.tar.gz: 9e2de907ce18dca2e42ce502e02a9e049c152abb
+ metadata.gz: 26f77df57fc50ffb3f1898c5f1a586d54b7af10d
+ data.tar.gz: f966bdf06765fa035c2a0556e8222d0299197a0a
  SHA512:
- metadata.gz: c2f9f774e2bd8225f5777f2a9b69a8dada15b20db7dc488f878ece855cafa1a7d0dcdc78af3532fbbd929483d25887fa688821ef016a5827ef865d48ed2e60f2
- data.tar.gz: af3809386cf240297283ab9ad9da7a2fd7e94de8d4e98e05e08059b396eb1235beb484a036cd82e9354da2cf1ed00b0204fe655fe77a0fef8706996b8e8f35e3
+ metadata.gz: 2514597d8f670ba7eec79a5768871cd7801e4a41623b36418510e4e5e76a3702dca5ec72b9d3018e4ca8bb70fbfc580e59754c248bcd9efcd2c1a9a2a38f5483
+ data.tar.gz: 1009f14d9849a113472595848b576c5350629eb178035dccf7ab7c5ce8ae6dd819bdaeafa2bb9c35870a3add1a38b9ff3a8572da003419caaa621227ab63dd74
data/README.md CHANGED
@@ -1,5 +1,25 @@
+ - [LanguageFilter](#languagefilter)
+ - [About](#about)
+ - [Guiding Principles](#guiding-principles)
+ - [TO-DO](#to-do)
+ - [Installation](#installation)
+ - [Usage](#usage)
+ - [`:matchlist` and `:exceptionlist`](#matchlist-and-exceptionlist)
+ - [Symbol signifying a pre-packaged list](#symbol-signifying-a-pre-packaged-list)
+ - [An array of words and phrases to screen for](#an-array-of-words-and-phrases-to-screen-for)
+ - [A filepath or string pointing to a filepath](#a-filepath-or-string-pointing-to-a-filepath)
+ - [Formatting your lists](#formatting-your-lists)
+ - [`:replacement`](#replacement)
+ - [`:creative_letters`](#creative_letters)
+ - [Methods to modify filters after creation](#methods-to-modify-filters-after-creation)
+ - [ActiveModel integration](#activemodel-integration)
+ - [Contributing](#contributing)
+
+
  # LanguageFilter
 
+ ## About
+
  LanguageFilter is a Ruby gem to detect and optionally filter multiple categories of language. It was adapted from Thiago Jackiw's Obscenity gem for [FractalWriting.org](http://fractalwriting.org) and features many improvements, including:
 
  - The ability to create and independently configure multiple language filters.
@@ -8,6 +28,30 @@ LanguageFilter is a Ruby gem to detect and optionally filter multiple categories
  - More neutral language to accommodate a wider variety of use cases. For example, LanguageFilter uses `matchlist` and `exceptionlist` instead of `blacklist` and `whitelist`, since the gem can be used not only for censorship, but also for content *type* identification (e.g. fantasy, sci-fi, historical, etc. in the context of creative writing).
  - More robust exceptionlist (i.e. whitelist) handling. Given a simple example of a matchlist containing `cock` and an exceptionlist containing `game cock`, the other filtering gems I've seen will flag the `cock` in `game cock`, despite the exceptionlist. LanguageFilter is a little smarter and does what you would expect, so that when sanitizing the string `cock is usually sexual, but a game cock is just an animal`, the returned string will be `**** is usually sexual, but a game cock is just an animal`.
 
+ Note, however, that you should not use this gem or any other language filtering library to replace human moderation, for [reasons outlined here](http://www.codinghorror.com/blog/2008/10/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea.html). The major takeaway is that content filtering is a very difficult problem and context is everything. You can keep refining your filters, but that can easily become a full-time job, and it is difficult to make those refinements without unintentionally creating more false positives, which are extremely frustrating from a user's point of view. This kind of tool is best used to *guide* users rather than to enforce rules on them. See the guiding principles below for more on this.
+
+ ## Guiding Principles
+
+ These are things I've learned from developing this gem that are good to keep in mind when using or contributing to the project.
+
+ **It's better to under-match than over-match.**
+
+ It's extremely frustrating, for example, to be prevented from entering a perfectly good username just because it happens to contain the word "ass", as many do. It's not nearly as frustrating to be exposed to profanity that you have to strain to make out.
+
+ **Using filters to detect language and help users categorize their own content is a better idea than automatically forcing mature/profane/sexual/etc. tags on user-generated content.**
+
+ If someone uses language that could be considered profanity in many contexts but is not profanity in their particular context, such as "bitch" to describe a female dog or "ass" to describe a donkey, they will be justifiably upset at the automatic categorization. It's better to say, "Your story contains the following words or phrases that we think might be profane: bitch, ass. Click on the `profane` tag if you'd like to add it." Then other users can flag content that still isn't correctly categorized, and moderators can edit content tags and educate the user to prevent further miscategorization.
+
+ ## TO-DO
+
+ - Expand the pre-packaged matchlists to be more exhaustive
+ - Add ActiveModel integration, along the lines of:
+
+ ``` ruby
+ filter_language :content, matchlist: :hate, replacement: :garbled
+ validate_language :username, matchlist: :profanity
+ ```
+
  ## Installation
 
  Add this line to your application's Gemfile:
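The exceptionlist behavior described in the About section above (flagging `cock` but leaving the `cock` in `game cock` alone) can be approximated in a few lines of plain Ruby. This is a sketch under stated assumptions, not LanguageFilter's implementation: `sanitize_with_exceptions` is a hypothetical helper, list entries are treated as regex fragments matched on word boundaries, and each hit is replaced with asterisks of the same length.

```ruby
# A minimal sketch, NOT LanguageFilter's actual implementation.
# sanitize_with_exceptions is a hypothetical helper: it replaces whole-word
# matchlist hits with asterisks unless the hit starts inside an exceptionlist phrase.
def sanitize_with_exceptions(text, matchlist, exceptionlist)
  match_re     = /\b(?:#{matchlist.join('|')})\b/i
  exception_re = /\b(?:#{exceptionlist.join('|')})\b/i

  # First pass: record the character ranges covered by exception phrases.
  protected_ranges = []
  text.scan(exception_re) do
    m = Regexp.last_match
    protected_ranges << (m.begin(0)...m.end(0))
  end

  # Second pass: star out matches that do not begin inside a protected range.
  text.gsub(match_re) do |word|
    start = Regexp.last_match.begin(0)
    inside_exception = protected_ranges.any? { |r| r.cover?(start) }
    inside_exception ? word : '*' * word.length
  end
end

puts sanitize_with_exceptions(
  'cock is usually sexual, but a game cock is just an animal',
  ['cock'], ['game cock']
)
# prints: **** is usually sexual, but a game cock is just an animal
```

Recording exception spans before replacing anything means a match is skipped only when it begins inside an exception phrase, which is enough to reproduce the README's expected output. The `\b` anchors also keep words like `cockpit` untouched, in line with the "under-match rather than over-match" principle.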
@@ -142,6 +186,34 @@ Example: This is some f*ck*d up sh*t.
 
  Example: 7|-|1$ 1$ $0/\/\3 Ph*****D UP ******.
 
+ (**Note: `creative_letters: true` must be set for plain words in a matchlist to match their leetspeak spellings.**)
+
+ ### `:creative_letters`
+
+ If you want to match leetspeak or other creative lettering, figuring out all the possible variations of each letter in a word can be exhausting. *And* you don't want to go through that whole process for each and every word, creating complicated matchlists that humans will struggle to parse.
+
+ That's why there's a `:creative_letters` option. When it is set to `true`, your filter uses a version of your matchlist that catches common and not-so-common letterings of each word in it. The downside to this option is a significant performance hit.
+
+ Here's an example. Let's say you have a matchlist with a single word:
+
+ ```
+ hippopotamus
+ ```
+
+ But what if some smart aleck types in something like this?
+
+ ```
+ }{!|o|o[]|o()+4|\/|v$
+ ```
+
+ Well, if you have `:creative_letters` activated, the matchlist that your filtering engine actually uses looks more like this:
+
+ ```
+ (?:(?:h|\\#|[\\|\\}\\{\\\\/\\(\\)\\[\\]]\\-?[\\|\\}\\{\\\\/\\(\\)\\[\\]])+)(?:(?:i|l|1|\\!|\\u00a1|\\||\\]|\\[|\\\\|/|[^a-z]eye[^a-z]|\\u00a3|[\\|li1\\!\\u00a1\\[\\]\\(\\)\\{\\}]_|\\u00ac|[^a-z]el+[^a-z]))(?:(?:p|\\u00b6|[\\|li1\\[\\]\\!\\u00a1/\\\\][\\*o\\u00b0\\\"\\>7\\^]|[^a-z]pee+[^a-z])+)(?:(?:p|\\u00b6|[\\|li1\\[\\]\\!\\u00a1/\\\\][\\*o\\u00b0\\\"\\>7\\^]|[^a-z]pee+[^a-z])+)(?:(?:o|0|\\(\\)|\\[\\]|\\u00b0|[^a-z]oh+[^a-z])+)(?:(?:p|\\u00b6|[\\|li1\\[\\]\\!\\u00a1/\\\\][\\*o\\u00b0\\\"\\>7\\^]|[^a-z]pee+[^a-z])+)(?:(?:o|0|\\(\\)|\\[\\]|\\u00b0|[^a-z]oh+[^a-z])+)(?:(?:t|7|\\+|\\u2020|\\-\\|\\-|\\'\\]\\[\\')+)(?:(?:a|@|4|\\^|/\\\\|/\\-\\\\|aye?)+)(?:(?:m|[\\|\\(\\)/](?:\\\\/|v|\\|)[\\|\\(\\)\\\\]|\\^\\^|[^a-z]em+[^a-z])+)(?:(?:u|v|\\u00b5|[\\|\\(\\)\\[\\]\\{\\}]_[\\|\\(\\)\\[\\]\\{\\}]|\\L\\||\\/|[^a-z]you[^a-z]|[^a-z]yoo+[^a-z]|[^a-z]vee+[^a-z]))(?:(?:s|\\$|5|\\u00a7|[^a-z]es+[^a-z]|z|2|7_|\\~/_|\\>_|\\%|[^a-z]zee+[^a-z])+)
+ ```
+
+ With that expanded matchlist, the `sanitize` method can turn barely legible input like the example above into something completely illegible. Even *this* crazy regex can be beaten, though. People *will* have to get quite creative, but people *are* creative, and making banned content difficult to enter can turn it into an attractive challenge. For this reason, and because of the performance hit mentioned above, **this option is not recommended for production systems**.
+
  ### Methods to modify filters after creation
 
  If you ever want to change the matchlist, exceptionlist, or replacement type, each parameter is accessible via an assignment method.
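To see why `:creative_letters` is expensive, here is a rough plain-Ruby sketch of the idea behind the `hippopotamus` example in the README hunk above: each letter of a matchlist word is expanded into an alternation of look-alike spellings, so one short word becomes a much larger regex. The `LOOKALIKES` table and the `creative_regex` helper are illustrative assumptions and are far smaller than whatever table the gem actually uses.

```ruby
# Illustrative sketch only: a tiny, hypothetical table of look-alike spellings.
# LanguageFilter's real :creative_letters expansion is much larger.
LOOKALIKES = {
  'a' => ['a', '4', '@'],
  'h' => ['h', '#', '}{'],
  'i' => ['i', '1', '!', '|'],
  'm' => ['m', '|\/|'],
  'o' => ['o', '0', '()', '[]'],
  'p' => ['p', '|o'],
  's' => ['s', '5', '$'],
  't' => ['t', '7', '+'],
  'u' => ['u', 'v']
}.freeze

# Builds a case-insensitive regex that matches `word` with each letter spelled
# any way listed in LOOKALIKES (letters missing from the table match literally).
def creative_regex(word)
  parts = word.downcase.chars.map do |letter|
    options = LOOKALIKES.fetch(letter, [letter]).map { |s| Regexp.escape(s) }
    "(?:#{options.join('|')})"
  end
  Regexp.new(parts.join, Regexp::IGNORECASE)
end

re = creative_regex('hippopotamus')
puts re.match?('}{!|o|o[]|o()+4|\/|v$') # prints true
puts re.match?('HIPPOPOTAMUS')           # prints true
puts re.source.length                    # far longer than the 12-letter word
```

Even this tiny table multiplies the size of the pattern many times over. The gem's own expanded pattern for the same word, shown in the README hunk above, is much larger still, which is consistent with the performance warning.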
@@ -181,6 +253,32 @@ my_filter.matchlist.uniq!
  # etc...
  ```
 
+ ### ActiveModel integration
+
+ There's no built-in ActiveModel integration yet, but working with filters in your models is still a breeze. The examples below should help get you started.
+
+ ```ruby
+ # garbles any hateful language in the content attribute before every save to the database
+ before_save :remove_hateful_language
+
+ def remove_hateful_language
+   hate_filter = LanguageFilter::Filter.new matchlist: :hate, replacement: :garbled
+   # assign through self so the attribute is updated rather than a new local variable
+   self.content = hate_filter.sanitize(content)
+ end
+ ```
+
+ ```ruby
+ # yells at users if they try to sneak in a dirty username, letting them know exactly why the username they wanted was rejected
+ validate :clean_username
+
+ def clean_username
+   profanity_filter = LanguageFilter::Filter.new matchlist: :profanity
+   if profanity_filter.match?(username)
+     errors.add(:username, "contains inappropriate language: #{profanity_filter.matched(username).join(', ')}")
+   end
+ end
+ ```
+
  ## Contributing
 
  1. Fork it
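The `clean_username` example in the README hunk above rejects input outright, while the guiding principles recommend suggesting a tag instead. The core of that softer approach is just collecting the distinct matches and phrasing a notice. The plain-Ruby sketch below is independent of both Rails and the gem; `profanity_notice` and its wording are hypothetical, and matching is simplified to a whole-word, case-insensitive regex alternation.

```ruby
# Hypothetical helper (not part of LanguageFilter): returns a suggestion message
# listing each distinct match once, or nil when nothing in the matchlist appears.
def profanity_notice(text, matchlist)
  pattern = /\b(?:#{matchlist.join('|')})\b/i
  matches = text.scan(pattern).map(&:downcase).uniq
  return nil if matches.empty?

  "Your story contains the following words or phrases that we think might be " \
    "profane: #{matches.join(', ')}. Click on the `profane` tag if you'd like to add it."
end

puts profanity_notice('The old ass carried the bitch and her pups.', %w[bitch ass])
p profanity_notice('A donkey and a dog.', %w[bitch ass]) # prints nil
```

Returning `nil` when there is nothing to report lets calling code show the notice only when needed, leaving the final tagging decision to the user and to moderators.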
data/Rakefile CHANGED
@@ -1 +1,12 @@
+ #!/usr/bin/env rake
  require "bundler/gem_tasks"
+
+ require 'rake/testtask'
+
+ Rake::TestTask.new do |t|
+   t.libs << 'lib'
+   t.test_files = FileList['test/lib/language_filter/*_test.rb']
+   t.verbose = true
+ end
+
+ task :default => :test
File without changes
File without changes
@@ -0,0 +1 @@
+ confucius
@@ -0,0 +1,5 @@
+ sexton
+ sextus
+ bonner
+ tittles?
+ puzzles?
@@ -0,0 +1,5 @@
+ x+l*i+
+ cilicia
+ gunther
+ gunnar
+ gunwale
@@ -0,0 +1,7 @@
+ \w*fuck[ae]r?s?
+ fag\w*
+ cunt\w*
+ as*hole\w*
+ \w*bitch\w*
+ fudge ?pack\w*
+ bastards?
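The list files added in these hunks mix plain words with what look like regex fragments, such as `bastards?`, `fudge ?pack\w*`, `x+l*i+` and `tittles?`. The gem's own list-loading code is not shown in this diff. Assuming each non-blank line is a regex fragment to be matched as a whole word, a file in this format could be compiled into a single pattern as shown below. `compile_list` is a hypothetical helper, and the sample input reuses one of the exception lists above.

```ruby
# Hypothetical loader: treats each non-blank line as a regex fragment and
# joins them into one case-insensitive, whole-word alternation.
def compile_list(source)
  fragments = source.each_line.map(&:strip).reject(&:empty?)
  Regexp.new("\\b(?:#{fragments.join('|')})\\b", Regexp::IGNORECASE)
end

exceptions = compile_list("sexton\nsextus\nbonner\ntittles?\npuzzles?\n")
p exceptions.match?('A tittle or two') # prints true: `tittles?` also accepts the singular
p exceptions.match?('Puzzles galore')  # prints true
p exceptions.match?('puzzled')         # prints false: the trailing \b blocks partial words
```

Under this reading, a trailing `s?` is a compact way to cover singular and plural forms in one line, and the word boundaries keep a fragment from matching inside longer words.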
@@ -0,0 +1,342 @@
+ 2g1c
+ 2 girls 1 cup
+ acrotomophilia
+ anal
+ anilingus
+ anus
+ arsehole
+ ass
+ asshole
+ assmunch
+ auto erotic
+ autoerotic
+ babeland
+ baby batter
+ ball gag
+ ball gravy
+ ball kicking
+ ball licking
+ ball sack
+ ball sucking
+ bangbros
+ bareback
+ barely legal
+ barenaked
+ bastardo
+ bastinado
+ bbw
+ bdsm
+ beaver cleaver
+ beaver lips
+ bestiality
+ bi curious
+ big black
+ big breasts
+ big knockers
+ big tits
+ bimbos
+ birdlock
+ bitch
+ black cock
+ blonde action
+ blonde on blonde action
+ blow j
+ blow your l
+ blue waffle
+ blumpkin
+ bollocks
+ bondage
+ boner
+ boob
+ boobs
+ booty call
+ brown showers
+ brunette action
+ bukkake
+ bulldyke
+ bullet vibe
+ bung hole
+ bunghole
+ busty
+ butt
+ buttcheeks
+ butthole
+ camel toe
+ camgirl
+ camslut
+ camwhore
+ carpet muncher
+ carpetmuncher
+ chocolate rosebuds
+ circlejerk
+ cleveland steamer
+ clit
+ clitoris
+ clover clamps
+ clusterfuck
+ cock
+ cocks
+ coprolagnia
+ coprophilia
+ cornhole
+ cum
+ cumming
+ cunnilingus
+ cunt
+ darkie
+ date rape
+ daterape
+ deep throat
+ deepthroat
+ dick
+ dildo
+ dirty pillows
+ dirty sanchez
+ dog style
+ doggie style
+ doggiestyle
+ doggy style
+ doggystyle
+ dolcett
+ domination
+ dominatrix
+ dommes
+ donkey punch
+ double dong
+ double penetration
+ dp action
+ eat my ass
+ ecchi
+ ejaculation
+ erotic
+ erotism
+ escort
+ ethical slut
+ eunuch
+ faggot
+ fecal
+ felch
+ fellatio
+ feltch
+ female squirting
+ femdom
+ figging
+ fingering
+ fisting
+ foot fetish
+ footjob
+ frotting
+ fuck
+ fuck buttons
+ fudge packer
+ fudgepacker
+ futanari
+ g-spot
+ gang bang
+ gay sex
+ genitals
+ giant cock
+ girl on
+ girl on top
+ girls gone wild
+ goatcx
+ goatse
+ gokkun
+ golden shower
+ goo girl
+ goodpoop
+ goregasm
+ grope
+ group sex
+ guro
+ hand job
+ handjob
+ hard core
+ hardcore
+ hentai
+ homoerotic
+ honkey
+ hooker
+ hot chick
+ how to kill
+ how to murder
+ huge fat
+ humping
+ incest
+ intercourse
+ jack off
+ jail bait
+ jailbait
+ jerk off
+ jigaboo
+ jiggaboo
+ jiggerboo
+ jizz
+ juggs
+ kike
+ kinbaku
+ kinkster
+ kinky
+ knobbing
+ leather restraint
+ leather straight jacket
+ lemon party
+ lolita
+ lovemaking
+ make me come
+ male squirting
+ masturbate
+ menage a trois
+ milf
+ missionary position
+ motherfucker
+ mound of venus
+ mr hands
+ muff diver
+ muffdiving
+ nambla
+ nawashi
+ negro
+ neonazi
+ nig nog
+ nigga
+ nigger
+ nimphomania
+ nipple
+ nipples
+ nsfw images
+ nude
+ nudity
+ nympho
+ nymphomania
+ octopussy
+ omorashi
+ one cup two girls
+ one guy one jar
+ orgasm
+ orgy
+ paedophile
+ panties
+ panty
+ pedobear
+ pedophile
+ pegging
+ penis
+ phone sex
+ piece of shit
+ piss pig
+ pissing
+ pisspig
+ playboy
+ pleasure chest
+ pole smoker
+ ponyplay
+ poof
+ poop chute
+ poopchute
+ porn
+ porno
+ pornography
+ prince albert piercing
+ pthc
+ pubes
+ pussy
+ queaf
+ raghead
+ raging boner
+ rape
+ raping
+ rapist
+ rectum
+ reverse cowgirl
+ rimjob
+ rimming
+ rosy palm
+ rosy palm and her 5 sisters
+ rusty trombone
+ s&m
+ sadism
+ scat
+ schlong
+ scissoring
+ semen
+ sex
+ sexo
+ sexy
+ shaved beaver
+ shaved pussy
+ shemale
+ shibari
+ shit
+ shota
+ shrimping
+ slanteye
+ slut
+ smut
+ snatch
+ snowballing
+ sodomize
+ sodomy
+ spic
+ spooge
+ spread legs
+ strap on
+ strapon
+ strappado
+ strip club
+ style doggy
+ suck
+ sucks
+ suicide girls
+ sultry women
+ swastika
+ swinger
+ tainted love
+ taste my
+ tea bagging
+ threesome
+ throating
+ tied up
+ tight white
+ tit
+ tits
+ titties
+ titty
+ tongue in a
+ topless
+ tosser
+ towelhead
+ tranny
+ tribadism
+ tub girl
+ tubgirl
+ tushy
+ twat
+ twink
+ twinkie
+ two girls one cup
+ undressing
+ upskirt
+ urethra play
+ urophilia
+ vagina
+ venus mound
+ vibrator
+ violet blue
+ violet wand
+ vorarephilia
+ voyeur
+ vulva
+ wank
+ wet dream
+ wetback
+ white power
+ women rapping
+ wrapping men
+ wrinkled starfish
+ xx
+ xxx
+ yaoi
+ yellow showers
+ yiffy
+ zoophilia