profanity-filter 0.1.5 → 1.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +34 -0
- data/Gemfile.lock +1 -1
- data/README.md +51 -10
- data/lib/profanity-dictionaries/en.yaml +0 -1
- data/lib/profanity-filter.rb +73 -67
- data/lib/profanity-filter/engines/composite.rb +1 -1
- data/lib/profanity-filter/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9c99a863d299ca2b41dc51afa2f50ec34bc7eb5a30c4ab252881949dd3036d03
+  data.tar.gz: 5de32869e8f63201ee23c82390896f943876c4ac7298bf08b9efc640c451d1dc
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0ba717fee25e40a8dbba835cb54cf351da24207aa8992e75de0d80f5254ad8d4cd99d98b3b85ba1ad4205d83b072b00d0b23749418f26a954a36e52545a78572
+  data.tar.gz: 1b7d396648075e6bc5173009ef69413b8b07cd5bf319333562573f1ab98cafd225591274d4873b5c23df5767fe18128f3f615c571aa703a93630f69260b01228
data/CHANGELOG.md
CHANGED
@@ -0,0 +1,34 @@
+## Version 1.0
+
+This version is not compatible with previous versions. The following are main changes and migration guide:
+
+1. Keyword parameter `strictness` for both `profane?` and `profanity_count` is replaced by `strategies`.
+
+```ruby
+# 'strict mode' before
+pf.profane?('text', strictness: :strict)
+
+# 'strict mode' now
+pf.profane?('text', strategies: :all)
+
+# 'tolerant mode' before
+pf.profane?('text', strictness: :tolerant)
+
+# 'tolerant mode' now
+pf.profane?('text', strategies: :basic)
+```
+2. We can compose our own strategies:
+
+```ruby
+# the below two are exactly the same:
+pf.profane?('text', strategies: [:leet, :allow_symbol, :duplicate_characters, :partial_match])
+pf.profane?('text', strategies: :all)
+```
+3. Now the default mode has full support for partial match
+
+```ruby
+# before it passes our filter, but now it's marked as profane.
+pf.profane?('youasshole')
+```
+
+That's it. Enjoy!
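Reviewer note: for call sites that still carry the old `strictness:` values, the migration in this changelog reduces to a two-entry mapping. The following is a minimal sketch only; the constant `STRICTNESS_TO_STRATEGIES` and the helper `strategies_for` are hypothetical names introduced for illustration and are not part of the gem.

```ruby
# Hypothetical migration helper; not part of profanity-filter itself.
# Maps the pre-1.0 `strictness:` values onto the 1.0 `strategies:` values
# listed in the changelog.
STRICTNESS_TO_STRATEGIES = {
  strict: :all,
  tolerant: :basic
}.freeze

def strategies_for(strictness)
  STRICTNESS_TO_STRATEGIES.fetch(strictness) do
    raise ArgumentError, "unknown strictness: #{strictness.inspect}"
  end
end

strategies_for(:strict)   # => :all
strategies_for(:tolerant) # => :basic
```

A caller could then pass `strategies: strategies_for(old_value)` while old configuration values are phased out.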
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -12,7 +12,7 @@ This profanity filter implements:
 - [Full Support] diacritics, injections, unicode
 - [Partial Support] similarities, constructions
 
-This gem is also integrated with [
+This gem is also integrated with [WebPurify](https://www.webpurify.com). Usage example below.
 
 
 ## Installation
@@ -20,26 +20,27 @@ This gem is also integrated with [Web Purify](https://www.webpurify.com). Usage
 Add this line to your application's Gemfile:
 
 ```ruby
-gem 'profanity-filter'
+gem 'profanity-filter', '~> 1.0'
 ```
 
 And then execute:
 
-$ bundle
+$ bundle install
 
 Or install it yourself as:
 
 $ gem install profanity-filter
 
+## Versioning
+Version 1.0 onward is not compatible with previous versions. See [changelog(https://github.com/cardinalblue/profanity-filter/blob/master/CHANGELOG.md)] for details.
+
 ## Usage
+In your Ruby code,
 
 ```ruby
-#
+# basic usage
 pf = ProfanityFilter.new
 
-# with WebPurify
-pf = ProfanityFilter.new(web_purifier_api_key: [YOUR-API-KEY])
-
 pf.profane? ('ssssshit')
 # => true
 
@@ -47,6 +48,49 @@ pf.profanity_count('fjsdio fdsk fU_cK_THIS_shI_T')
 # => 2
 ```
 
+If we want to integrate WebPurify,
+
+```ruby
+# with WebPurify
+pf = ProfanityFilter.new(web_purifier_api_key: [YOUR-API-KEY])
+```
+
+With WebPurify enabled, texts sent to `profane?` and `profanity_count` will **first** be checked against the mechanism this gem provides, **then** against WebPurify if no positive results are returned.
+
+## Strategies
+There are four different `strategies` that we can compose to our heart's content.
+
+1. `:partial_match`
+will flag a text as profane if any substrings of it is in our dictionary.
+
+2. `:allow_symbol`
+will flag a text as profane if any word in the text matches our dictionary after removing the symbols.
+
+3. `:duplicate_characters`
+will flag a text as profane if any word in the text matches our dictionary after removing duplications.
+
+4. `:leet`
+will flag a text as profane if any word in the text matches our dictionary after substituting similar unicode characters with their letter correspondents.
+
+## Config
+By default, the profanity filter implements `:partial_match` and `:allow_symbol` strategies. But we can specify what strategies we want:
+
+```ruby
+pf = ProfanityFilter.new
+
+# type :basic is the default
+pf.profane?('test_string', strategies: :basic)
+pf.profanity_count('test_string', strategies: :basic)
+
+# type :all includes all four strategies
+pf.profane?('test_string', strategies: :all)
+pf.profanity_count('test_string', strategies: :all)
+
+# compose our own
+pf.profane?('test_string', strategies: [:partial_match, :leet])
+pf.profanity_count('test_string', strategies: [:partial_match, :leet])
+```
+
 ## Development
 
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
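Reviewer note: the strategy descriptions added to the README are terse, so here is a toy illustration of what three of them mean in practice. It is a sketch under loud assumptions: the placeholder dictionary, the `toy_*` method names, and the matching rules are all invented for this example and are not the gem's engine classes, whose behaviour may differ in detail.

```ruby
# Toy approximations of three strategies described in the README text.
# The dictionary and every method name here are illustrative assumptions.
TOY_DICTIONARY = %w[darn heck].freeze

# :partial_match - some substring of the text is a dictionary word.
def toy_partial_match?(text)
  lowered = text.downcase
  TOY_DICTIONARY.any? { |word| lowered.include?(word) }
end

# :allow_symbol - a word matches once non-letter symbols are removed.
def toy_allow_symbol?(text)
  text.downcase.split.any? do |token|
    TOY_DICTIONARY.include?(token.gsub(/[^a-z]/, ''))
  end
end

# :duplicate_characters - a word matches once repeated letters are collapsed.
def toy_duplicate_characters?(text)
  squeezed_dictionary = TOY_DICTIONARY.map(&:squeeze)
  text.downcase.split.any? do |token|
    squeezed_dictionary.include?(token.squeeze)
  end
end

toy_partial_match?('whatthehecknow')     # => true
toy_allow_symbol?('oh d_a_r_n it')       # => true
toy_duplicate_characters?('heeeeck yes') # => true
toy_allow_symbol?('whatthehecknow')      # => false
```

The last call shows why the strategies are composed rather than used alone: a word-level check misses a term embedded in a longer token, which is the case `:partial_match` covers.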
@@ -64,6 +108,3 @@ The gem is available as open source under the terms of the [MIT License](https:/
 ## Code of Conduct
 
 Everyone interacting in the ProfanityFilter project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/cardinalblue/profanity-filter/blob/master/CODE_OF_CONDUCT.md).
-
-## Todo
-pluggable logging and strategies
data/lib/profanity-filter.rb
CHANGED
@@ -9,76 +9,76 @@ require 'profanity-filter/engines/leet_exact_match_strategy'
 require 'web_purify'
 
 class ProfanityFilter
-  WP_DEFAULT_LANGS
-
+  WP_DEFAULT_LANGS = [:en].freeze
+  WP_LANG_CONVERSIONS = { es: :sp, ko: :kr, ja: :jp }.freeze
+  WP_AVAILABLE_LANGS = [
     :en, :ar, :fr, :de, :hi, :jp, :it, :pt, :ru, :sp, :th, :tr, :zh, :kr, :pa
   ].freeze
-  WP_LANG_CONVERSIONS = { es: :sp, ko: :kr, ja: :jp }.freeze
 
-
+  LEET_STRATEGY = :leet
+  ALLOW_SYMBOL_STRATEGY = :allow_symbol
+  PARTIAL_MATCH_STRATEGY = :partial_match
+  DUPLICATE_CHARACTERS_STRATEGY = :duplicate_characters
 
-
+  attr_reader :available_strategies
+
+  def initialize(web_purifier_api_key: nil, whitelist: [])
     # If we are using Web Purifier
     @wp_client = web_purifier_api_key ? WebPurify::Client.new(web_purifier_api_key) : nil
+    @whitelist = whitelist
+    raise 'Whitelist should be an array' unless @whitelist.is_a?(Array)
 
     exact_match_dictionary = load_exact_match_dictionary
     partial_match_dictionary = load_partial_match_dictionary
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-    @strict_filter.add_strategies(
-      leet_strategy,
-      allow_symbol_strategy,
-      partial_match_strategy,
-      duplicate_characters_strategy
-    )
-    # Set up tolerant filter.
-    @tolerant_filter = ::ProfanityFilterEngine::Composite.new
-    @tolerant_filter.add_strategies(
-      allow_symbol_strategy,
-      partial_match_strategy
-    )
-  end
-
-  def profane?(phrase, lang: nil, strictness: :tolerant)
-    return false if phrase == '' || phrase.nil?
-
-    is_profane = pf_profane?(phrase, strictness: strictness)
-    if !is_profane && use_webpurify?
-      wp_is_profane = wp_profane?(phrase, lang: lang)
-      is_profane = wp_is_profane unless wp_is_profane.nil?
-    end
+    @available_strategies = {
+      ALLOW_SYMBOL_STRATEGY => ::ProfanityFilterEngine::AllowSymbolsInWordsStrategy.new(
+        dictionary: exact_match_dictionary,
+        ignore_case: true
+      ),
+      DUPLICATE_CHARACTERS_STRATEGY => ::ProfanityFilterEngine::AllowDuplicateCharactersStrategy.new(
+        dictionary: exact_match_dictionary,
+        ignore_case: true
+      ),
+      LEET_STRATEGY => ::ProfanityFilterEngine::LeetExactMatchStrategy.new(
+        dictionary: exact_match_dictionary,
+        ignore_case: true
+      ),
+      PARTIAL_MATCH_STRATEGY => ::ProfanityFilterEngine::PartialMatchStrategy.new(
+        dictionary: partial_match_dictionary + exact_match_dictionary,
+        ignore_case: true
+      ),
+    }
+  end
 
-
+  def all_strategy_names
+    available_strategies.keys
   end
 
-  def
-
+  def basic_strategy_names
+    [ALLOW_SYMBOL_STRATEGY, PARTIAL_MATCH_STRATEGY]
+  end
+
+  def profane?(phrase, lang: nil, strategies: :basic)
+    return false if phrase == ''
+    return false if @whitelist.include?(phrase)
 
-
-
-
-
+    if use_webpurify?
+      !!(pf_profane?(phrase, strategies: strategies) || wp_profane?(phrase, lang: lang))
+    else
+      !!pf_profane?(phrase, strategies: strategies)
     end
+  end
+
+  def profanity_count(phrase, lang: nil, strategies: :basic)
+    return 0 if phrase == '' || phrase.nil?
 
-
+    pf_count = pf_profanity_count(phrase, strategies: strategies)
+    if use_webpurify?
+      pf_count.zero? ? wp_profanity_count(phrase, lang: lang).to_i : pf_count
+    else
+      pf_count
+    end
   end
 
   private
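Reviewer note: the new `profanity_count` consults WebPurify only when the local count is zero, and `.to_i` turns a `nil` remote result into `0`. The sketch below isolates that control flow with plain lambdas so it can be run without the gem or a network call. `fallback_count`, `local_count`, and `remote_count` are hypothetical names standing in for the gem's methods, not its API.

```ruby
# Sketch of the count fallback shown in the hunk, with both checks replaced
# by plain lambdas. `local_count` and `remote_count` are stand-ins
# (assumptions) for pf_profanity_count and wp_profanity_count.
def fallback_count(phrase, local_count:, remote_count: nil)
  return 0 if phrase.nil? || phrase == ''

  count = local_count.call(phrase)
  # Skip the remote check when it is disabled or the local check already hit.
  return count if remote_count.nil? || !count.zero?

  # nil.to_i is 0, so a remote check that yields nil counts as zero.
  remote_count.call(phrase).to_i
end

local = ->(phrase) { phrase.scan(/darn/).size }
remote_calls = 0
remote = lambda do |_phrase|
  remote_calls += 1
  nil # simulate a remote check that produced no result
end

fallback_count('darn darn', local_count: local, remote_count: remote) # => 2
fallback_count('fine', local_count: local, remote_count: remote)      # => 0
remote_calls                                                          # => 1
```

Only the second call reaches the remote lambda, which matches the "first local, then WebPurify" order stated in the README hunk.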
@@ -87,23 +87,29 @@ class ProfanityFilter
     !!@wp_client
   end
 
-  def filter(
-
-
-
-
-
-
-
+  def filter(strategies:)
+    ::ProfanityFilterEngine::Composite.new.tap do |engine|
+      case strategies
+      when :all
+        all_strategy_names.each { |s| engine.add_strategy(available_strategies[s]) }
+      when :basic
+        basic_strategy_names.each { |s| engine.add_strategy(available_strategies[s]) }
+      else
+        strategies.each do |s|
+          raise "Strategy name \"#{s}\" not supported." unless all_strategy_names.include?(s)
+
+          engine.add_strategy(available_strategies[s])
+        end
+      end
     end
   end
 
-  def pf_profane?(phrase,
-    filter(
+  def pf_profane?(phrase, strategies:)
+    filter(strategies: strategies).profane?(phrase)
   end
 
-  def pf_profanity_count(phrase,
-    filter(
+  def pf_profanity_count(phrase, strategies:)
+    filter(strategies: strategies).profanity_count(phrase)
   end
 
   def wp_profane?(phrase, lang: nil, timeout_duration: 5)
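Reviewer note: the new `filter(strategies:)` accepts `:all`, `:basic`, or an explicit list, and rejects unknown names. The sketch below reproduces only the name-resolution step so it runs standalone. The constants and `resolve_strategy_names` are hypothetical names for this example; unlike the gem, which raises a plain string (a `RuntimeError`), the sketch raises `ArgumentError`.

```ruby
# Sketch of how a `strategies:` argument resolves to strategy names,
# following the case statement in the hunk. The constant and method names
# are assumptions chosen for this example.
ALL_STRATEGY_NAMES = %i[allow_symbol duplicate_characters leet partial_match].freeze
BASIC_STRATEGY_NAMES = %i[allow_symbol partial_match].freeze

def resolve_strategy_names(strategies)
  case strategies
  when :all then ALL_STRATEGY_NAMES
  when :basic then BASIC_STRATEGY_NAMES
  else
    # Array() also accepts a single symbol such as :leet; the gem's own code
    # calls `each` directly, so it expects an array here.
    Array(strategies).each do |name|
      next if ALL_STRATEGY_NAMES.include?(name)

      raise ArgumentError, "Strategy name \"#{name}\" not supported."
    end
  end
end

resolve_strategy_names(:basic)                  # => [:allow_symbol, :partial_match]
resolve_strategy_names(%i[partial_match leet])  # => [:partial_match, :leet]
```

Note that the gem builds a fresh `Composite` engine on every call to `filter`, while the strategy objects themselves are created once in `initialize` and reused.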
@@ -120,7 +126,7 @@ class ProfanityFilter
     Timeout::timeout(timeout_duration) do
       @wp_client.check_count phrase, lang: wp_langs_list_with(lang)
     end
-  rescue StandardError
+  rescue StandardError
     nil
   end
 
data/lib/profanity-filter/engines/composite.rb
CHANGED
@@ -29,7 +29,7 @@ module ProfanityFilterEngine
 
     def profane_words(text)
       total_words = strategies.reduce([]) do |words, strategy|
-        words.concat(strategy.profane_words(text))
+        words.concat(strategy.profane_words(text).map { |w| w.gsub(/[ _\-\.]/, '') })
       end
       total_words.uniq
     end
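Reviewer note: the one-line change above strips spaces, underscores, hyphens, and dots from each reported word before `uniq` runs. Presumably this keeps the same word, reported by different strategies in different surface forms, from being counted more than once. A minimal standalone sketch of that normalisation follows; `SEPARATORS`, `normalize_matches`, and the sample matches are made up for illustration.

```ruby
# Same character class as the added line: strip spaces, underscores, hyphens
# and dots from each match, then de-duplicate. The sample matches are made up.
SEPARATORS = /[ _\-\.]/

def normalize_matches(matches)
  matches.map { |word| word.gsub(SEPARATORS, '') }.uniq
end

normalize_matches(['d_a_r_n', 'darn', 'd.a-r n', 'heck'])
# => ["darn", "heck"]
```

The normalisation does not change letter case, so `'Darn'` and `'darn'` would still be treated as distinct in this sketch.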
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: profanity-filter
 version: !ruby/object:Gem::Version
-  version:
+  version: '1.0'
 platform: ruby
 authors:
 - Maso Lin
@@ -10,7 +10,7 @@ authors:
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2019-12-
+date: 2019-12-31 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: webpurify