profanity-filter 0.1.5 → 1.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +34 -0
- data/Gemfile.lock +1 -1
- data/README.md +51 -10
- data/lib/profanity-dictionaries/en.yaml +0 -1
- data/lib/profanity-filter.rb +73 -67
- data/lib/profanity-filter/engines/composite.rb +1 -1
- data/lib/profanity-filter/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9c99a863d299ca2b41dc51afa2f50ec34bc7eb5a30c4ab252881949dd3036d03
+  data.tar.gz: 5de32869e8f63201ee23c82390896f943876c4ac7298bf08b9efc640c451d1dc
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0ba717fee25e40a8dbba835cb54cf351da24207aa8992e75de0d80f5254ad8d4cd99d98b3b85ba1ad4205d83b072b00d0b23749418f26a954a36e52545a78572
+  data.tar.gz: 1b7d396648075e6bc5173009ef69413b8b07cd5bf319333562573f1ab98cafd225591274d4873b5c23df5767fe18128f3f615c571aa703a93630f69260b01228
data/CHANGELOG.md
CHANGED
@@ -0,0 +1,34 @@
+## Version 1.0
+
+This version is not compatible with previous versions. The following are main changes and migration guide:
+
+1. Keyword parameter `strictness` for both `profane?` and `profanity_count` is replaced by `strategies`.
+
+```ruby
+# 'strict mode' before
+pf.profane?('text', strictness: :strict)
+
+# 'strict mode' now
+pf.profane?('text', strategies: :all)
+
+# 'tolerant mode' before
+pf.profane?('text', strictness: :tolerant)
+
+# 'tolerant mode' now
+pf.profane?('text', strategies: :basic)
+```
+2. We can compose our own strategies:
+
+```ruby
+# the below two are exactly the same:
+pf.profane?('text', strategies: [:leet, :allow_symbol, :duplicate_characters, :partial_match])
+pf.profane?('text', strategies: :all)
+```
+3. Now the default mode has full support for partial match
+
+```ruby
+# before it passes our filter, but now it's marked as profane.
+pf.profane?('youasshole')
+```
+
+That's it. Enjoy!
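
For callers migrating from 0.1.x, the mapping in item 1 of the changelog can be captured in a small adapter. The following is a minimal sketch and not part of the gem: `LEGACY_STRICTNESS_TO_STRATEGIES` and `strategies_for` are hypothetical names introduced here, and only the two `strictness` values shown in the changelog are covered.

```ruby
# Hypothetical migration helper (not provided by profanity-filter).
# Maps the pre-1.0 `strictness` values onto the 1.0 `strategies` values
# exactly as the changelog describes.
LEGACY_STRICTNESS_TO_STRATEGIES = {
  strict: :all,
  tolerant: :basic
}.freeze

def strategies_for(strictness)
  LEGACY_STRICTNESS_TO_STRATEGIES.fetch(strictness) do
    raise ArgumentError, "unknown strictness: #{strictness.inspect}"
  end
end

# Usage, assuming `pf` is a ProfanityFilter instance from version 1.0:
#   pf.profane?('text', strategies: strategies_for(:strict))
```

Raising on an unknown value, rather than silently falling back to `:basic`, makes leftover call sites easier to spot during the upgrade.
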
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -12,7 +12,7 @@ This profanity filter implements:
 - [Full Support] diacritics, injections, unicode
 - [Partial Support] similarities, constructions
 
-This gem is also integrated with [
+This gem is also integrated with [WebPurify](https://www.webpurify.com). Usage example below.
 
 
 ## Installation
@@ -20,26 +20,27 @@ This gem is also integrated with [Web Purify](https://www.webpurify.com). Usage
 Add this line to your application's Gemfile:
 
 ```ruby
-gem 'profanity-filter'
+gem 'profanity-filter', '~> 1.0'
 ```
 
 And then execute:
 
-    $ bundle
+    $ bundle install
 
 Or install it yourself as:
 
     $ gem install profanity-filter
 
+## Versioning
+Version 1.0 onward is not compatible with previous versions. See [changelog(https://github.com/cardinalblue/profanity-filter/blob/master/CHANGELOG.md)] for details.
+
 ## Usage
+In your Ruby code,
 
 ```ruby
-#
+# basic usage
 pf = ProfanityFilter.new
 
-# with WebPurify
-pf = ProfanityFilter.new(web_purifier_api_key: [YOUR-API-KEY])
-
 pf.profane? ('ssssshit')
 # => true
 
@@ -47,6 +48,49 @@ pf.profanity_count('fjsdio fdsk fU_cK_THIS_shI_T')
 # => 2
 ```
 
+If we want to integrate WebPurify,
+
+```ruby
+# with WebPurify
+pf = ProfanityFilter.new(web_purifier_api_key: [YOUR-API-KEY])
+```
+
+With WebPurify enabled, texts sent to `profane?` and `profanity_count` will **first** be checked against the mechanism this gem provides, **then** against WebPurify if no positive results are returned.
+
+## Strategies
+There are four different `strategies` that we can compose to our heart's content.
+
+1. `:partial_match`
+will flag a text as profane if any substrings of it is in our dictionary.
+
+2. `:allow_symbol`
+will flag a text as profane if any word in the text matches our dictionary after removing the symbols.
+
+3. `:duplicate_characters`
+will flag a text as profane if any word in the text matches our dictionary after removing duplications.
+
+4. `:leet`
+will flag a text as profane if any word in the text matches our dictionary after substituting similar unicode characters with their letter correspondents.
+
+## Config
+By default, the profanity filter implements `:partial_match` and `:allow_symbol` strategies. But we can specify what strategies we want:
+
+```ruby
+pf = ProfanityFilter.new
+
+# type :basic is the default
+pf.profane?('test_string', strategies: :basic)
+pf.profanity_count('test_string', strategies: :basic)
+
+# type :all includes all four strategies
+pf.profane?('test_string', strategies: :all)
+pf.profanity_count('test_string', strategies: :all)
+
+# compose our own
+pf.profane?('test_string', strategies: [:partial_match, :leet])
+pf.profanity_count('test_string', strategies: [:partial_match, :leet])
+```
+
 ## Development
 
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
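
To make the four strategy descriptions in the README hunk concrete, here is a toy model of how such checks could compose. It is a sketch under loud assumptions: the two-word dictionary, the digit-only leet table, and every name (`TOY_DICTIONARY`, `TOY_STRATEGIES`, `toy_profane?`) are invented for illustration, and none of this is the gem's engine code.

```ruby
# Toy illustration only. The dictionary and all names are made up, and the
# real engine classes in the gem are almost certainly more elaborate.
TOY_DICTIONARY = %w[darn heck].freeze

TOY_STRATEGIES = {
  # Flags the text if any dictionary entry occurs as a substring.
  partial_match: lambda do |text|
    TOY_DICTIONARY.any? { |word| text.downcase.include?(word) }
  end,
  # Flags the text if a word matches once non-letter symbols are removed.
  allow_symbol: lambda do |text|
    text.downcase.split.any? { |t| TOY_DICTIONARY.include?(t.gsub(/[^a-z]/, '')) }
  end,
  # Flags the text if a word matches once runs of repeated characters are
  # collapsed (both sides are squeezed so doubled letters still compare).
  duplicate_characters: lambda do |text|
    squeezed = TOY_DICTIONARY.map(&:squeeze)
    text.downcase.split.any? { |t| squeezed.include?(t.squeeze) }
  end,
  # Flags the text if a word matches after a tiny digit-to-letter mapping
  # (an assumption; the README speaks of similar unicode characters).
  leet: lambda do |text|
    text.downcase.split.any? { |t| TOY_DICTIONARY.include?(t.tr('3401', 'eaoi')) }
  end
}.freeze

# A text is profane if any of the selected strategies flags it.
def toy_profane?(text, strategies:)
  strategies.any? { |name| TOY_STRATEGIES.fetch(name).call(text) }
end
```

In this model `toy_profane?('youdarnfool', strategies: [:partial_match])` is true while the same text with only `:allow_symbol` is not, which mirrors why the choice of strategies changes the verdict.
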
@@ -64,6 +108,3 @@ The gem is available as open source under the terms of the [MIT License](https:/
 ## Code of Conduct
 
 Everyone interacting in the ProfanityFilter project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/cardinalblue/profanity-filter/blob/master/CODE_OF_CONDUCT.md).
-
-## Todo
-pluggable logging and strategies
data/lib/profanity-filter.rb
CHANGED
@@ -9,76 +9,76 @@ require 'profanity-filter/engines/leet_exact_match_strategy'
 require 'web_purify'
 
 class ProfanityFilter
-  WP_DEFAULT_LANGS
-
+  WP_DEFAULT_LANGS = [:en].freeze
+  WP_LANG_CONVERSIONS = { es: :sp, ko: :kr, ja: :jp }.freeze
+  WP_AVAILABLE_LANGS = [
     :en, :ar, :fr, :de, :hi, :jp, :it, :pt, :ru, :sp, :th, :tr, :zh, :kr, :pa
   ].freeze
-  WP_LANG_CONVERSIONS = { es: :sp, ko: :kr, ja: :jp }.freeze
 
-
+  LEET_STRATEGY = :leet
+  ALLOW_SYMBOL_STRATEGY = :allow_symbol
+  PARTIAL_MATCH_STRATEGY = :partial_match
+  DUPLICATE_CHARACTERS_STRATEGY = :duplicate_characters
 
-
+  attr_reader :available_strategies
+
+  def initialize(web_purifier_api_key: nil, whitelist: [])
     # If we are using Web Purifier
     @wp_client = web_purifier_api_key ? WebPurify::Client.new(web_purifier_api_key) : nil
+    @whitelist = whitelist
+    raise 'Whitelist should be an array' unless @whitelist.is_a?(Array)
 
     exact_match_dictionary = load_exact_match_dictionary
     partial_match_dictionary = load_partial_match_dictionary
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-    @strict_filter.add_strategies(
-      leet_strategy,
-      allow_symbol_strategy,
-      partial_match_strategy,
-      duplicate_characters_strategy
-    )
-    # Set up tolerant filter.
-    @tolerant_filter = ::ProfanityFilterEngine::Composite.new
-    @tolerant_filter.add_strategies(
-      allow_symbol_strategy,
-      partial_match_strategy
-    )
-  end
-
-  def profane?(phrase, lang: nil, strictness: :tolerant)
-    return false if phrase == '' || phrase.nil?
-
-    is_profane = pf_profane?(phrase, strictness: strictness)
-    if !is_profane && use_webpurify?
-      wp_is_profane = wp_profane?(phrase, lang: lang)
-      is_profane = wp_is_profane unless wp_is_profane.nil?
-    end
+    @available_strategies = {
+      ALLOW_SYMBOL_STRATEGY => ::ProfanityFilterEngine::AllowSymbolsInWordsStrategy.new(
+        dictionary: exact_match_dictionary,
+        ignore_case: true
+      ),
+      DUPLICATE_CHARACTERS_STRATEGY => ::ProfanityFilterEngine::AllowDuplicateCharactersStrategy.new(
+        dictionary: exact_match_dictionary,
+        ignore_case: true
+      ),
+      LEET_STRATEGY => ::ProfanityFilterEngine::LeetExactMatchStrategy.new(
+        dictionary: exact_match_dictionary,
+        ignore_case: true
+      ),
+      PARTIAL_MATCH_STRATEGY => ::ProfanityFilterEngine::PartialMatchStrategy.new(
+        dictionary: partial_match_dictionary + exact_match_dictionary,
+        ignore_case: true
+      ),
+    }
+  end
 
-
+  def all_strategy_names
+    available_strategies.keys
   end
 
-  def
-
+  def basic_strategy_names
+    [ALLOW_SYMBOL_STRATEGY, PARTIAL_MATCH_STRATEGY]
+  end
+
+  def profane?(phrase, lang: nil, strategies: :basic)
+    return false if phrase == ''
+    return false if @whitelist.include?(phrase)
 
-
-
-
-
+    if use_webpurify?
+      !!(pf_profane?(phrase, strategies: strategies) || wp_profane?(phrase, lang: lang))
+    else
+      !!pf_profane?(phrase, strategies: strategies)
     end
+  end
+
+  def profanity_count(phrase, lang: nil, strategies: :basic)
+    return 0 if phrase == '' || phrase.nil?
 
-
+    pf_count = pf_profanity_count(phrase, strategies: strategies)
+    if use_webpurify?
+      pf_count.zero? ? wp_profanity_count(phrase, lang: lang).to_i : pf_count
+    else
+      pf_count
+    end
   end
 
   private
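
The new `profanity_count` consults WebPurify only when the local count is zero, and the trailing `.to_i` turns a `nil` remote result into `0`. That ordering can be sketched in isolation as follows. `count_with_fallback` is a hypothetical name, and the block stands in for the WebPurify call, so nothing here touches the real client.

```ruby
# Standalone sketch of the "local first, remote only on zero" ordering.
# `count_with_fallback` is a hypothetical helper, not part of the gem.
def count_with_fallback(local_count)
  # A non-zero local result wins and the remote check is never invoked.
  return local_count unless local_count.zero? && block_given?

  # nil.to_i is 0, so a remote call that yields nil (for example after a
  # rescued error) degrades to "no profanity found" instead of raising.
  yield.to_i
end

# Usage with a stand-in for the remote service:
#   count_with_fallback(0) { remote_count_or_nil }
```

Passing the remote check as a block keeps the network call lazy, which matches the short-circuit behavior visible in the hunk.
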
@@ -87,23 +87,29 @@ class ProfanityFilter
     !!@wp_client
   end
 
-  def filter(
-
-
-
-
-
-
-
+  def filter(strategies:)
+    ::ProfanityFilterEngine::Composite.new.tap do |engine|
+      case strategies
+      when :all
+        all_strategy_names.each { |s| engine.add_strategy(available_strategies[s]) }
+      when :basic
+        basic_strategy_names.each { |s| engine.add_strategy(available_strategies[s]) }
+      else
+        strategies.each do |s|
+          raise "Strategy name \"#{s}\" not supported." unless all_strategy_names.include?(s)
+
+          engine.add_strategy(available_strategies[s])
+        end
+      end
     end
   end
 
-  def pf_profane?(phrase,
-    filter(
+  def pf_profane?(phrase, strategies:)
+    filter(strategies: strategies).profane?(phrase)
   end
 
-  def pf_profanity_count(phrase,
-    filter(
+  def pf_profanity_count(phrase, strategies:)
+    filter(strategies: strategies).profanity_count(phrase)
   end
 
   def wp_profane?(phrase, lang: nil, timeout_duration: 5)
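
The `case` in the new `filter(strategies:)` resolves `:all`, `:basic`, or an explicit list into strategy names and rejects unknown names. The resolution step alone can be sketched without the engine classes. `resolve_strategy_names` and the two constants are hypothetical names, and two details are deliberate departures from the hunk: the sketch raises `ArgumentError` rather than a bare `RuntimeError`, and it wraps the argument in `Array()` so a single symbol such as `:leet` is also accepted.

```ruby
# Standalone sketch of the name-resolution step; not the gem's code.
ALL_STRATEGY_NAMES = %i[allow_symbol duplicate_characters leet partial_match].freeze
BASIC_STRATEGY_NAMES = %i[allow_symbol partial_match].freeze

def resolve_strategy_names(strategies)
  case strategies
  when :all then ALL_STRATEGY_NAMES
  when :basic then BASIC_STRATEGY_NAMES
  else
    # Array() is an addition of this sketch: it lets a lone symbol through.
    # `each` returns its receiver, so the validated list is the result.
    Array(strategies).each do |name|
      next if ALL_STRATEGY_NAMES.include?(name)

      raise ArgumentError, "Strategy name \"#{name}\" not supported."
    end
  end
end
```

Validating before any strategy is added means a typo such as `:leat` fails fast instead of silently producing a weaker filter.
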
@@ -120,7 +126,7 @@ class ProfanityFilter
     Timeout::timeout(timeout_duration) do
       @wp_client.check_count phrase, lang: wp_langs_list_with(lang)
     end
-  rescue StandardError
+  rescue StandardError
     nil
   end
 
data/lib/profanity-filter/engines/composite.rb
CHANGED
@@ -29,7 +29,7 @@ module ProfanityFilterEngine
 
     def profane_words(text)
       total_words = strategies.reduce([]) do |words, strategy|
-        words.concat(strategy.profane_words(text))
+        words.concat(strategy.profane_words(text).map { |w| w.gsub(/[ _\-\.]/, '') })
       end
       total_words.uniq
     end
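
The added `gsub` strips spaces, underscores, hyphens and dots from each reported word before `uniq` runs, so the same word reported in differently punctuated forms by several strategies presumably collapses into a single entry. A minimal sketch of that effect, using the regular expression from the hunk and made-up placeholder tokens (`normalize_matches` and `SEPARATORS` are hypothetical names):

```ruby
# Same character class as in the composite.rb hunk.
SEPARATORS = /[ _\-\.]/

# Hypothetical helper: strip separator characters, then drop duplicates.
def normalize_matches(words)
  words.map { |w| w.gsub(SEPARATORS, '') }.uniq
end

# Placeholder tokens stand in for matches returned by different strategies.
normalize_matches(['ab_c', 'a-b.c', 'abc', 'x y'])
# => ["abc", "xy"]
```

Without the normalization, `'ab_c'` and `'abc'` would survive `uniq` as two distinct strings, which would inflate any count derived from the list.
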
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: profanity-filter
 version: !ruby/object:Gem::Version
-  version:
+  version: '1.0'
 platform: ruby
 authors:
 - Maso Lin
@@ -10,7 +10,7 @@ authors:
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2019-12-
+date: 2019-12-31 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: webpurify