profanity-filter 0.1.5 → 1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: d0c83f7c7ca1562230fddba3b7e38f46b8148ae207270995711a2c44ac6a7ad2
- data.tar.gz: 0af221e57a13ed4f2e7da49f638b997167fd59396e700653b5d3d74a978479f9
+ metadata.gz: 9c99a863d299ca2b41dc51afa2f50ec34bc7eb5a30c4ab252881949dd3036d03
+ data.tar.gz: 5de32869e8f63201ee23c82390896f943876c4ac7298bf08b9efc640c451d1dc
  SHA512:
- metadata.gz: a4587e6c90ffbccd1261d4f035f51825dbc7a667daa395bb6f705f0187a14268377c2f69faa08f264bd9f03ba485a31ed8fa678add0ce3dc7f45eb27a4056843
- data.tar.gz: 3fea0ebb4adc5aeb60154a903f356c23bfffb800dd5ed410aeaaa9346a3babd4aa095c1a503fb4e2237e306ea00eb251069352b08d8c598488166093006db0f0
+ metadata.gz: 0ba717fee25e40a8dbba835cb54cf351da24207aa8992e75de0d80f5254ad8d4cd99d98b3b85ba1ad4205d83b072b00d0b23749418f26a954a36e52545a78572
+ data.tar.gz: 1b7d396648075e6bc5173009ef69413b8b07cd5bf319333562573f1ab98cafd225591274d4873b5c23df5767fe18128f3f615c571aa703a93630f69260b01228
@@ -0,0 +1,34 @@
+ ## Version 1.0
+
+ This version is not compatible with previous versions. The main changes and a migration guide follow:
+
+ 1. Keyword parameter `strictness` for both `profane?` and `profanity_count` is replaced by `strategies`.
+
+ ```ruby
+ # 'strict mode' before
+ pf.profane?('text', strictness: :strict)
+
+ # 'strict mode' now
+ pf.profane?('text', strategies: :all)
+
+ # 'tolerant mode' before
+ pf.profane?('text', strictness: :tolerant)
+
+ # 'tolerant mode' now
+ pf.profane?('text', strategies: :basic)
+ ```
+ 2. We can compose our own strategies:
+
+ ```ruby
+ # the below two are exactly the same:
+ pf.profane?('text', strategies: [:leet, :allow_symbol, :duplicate_characters, :partial_match])
+ pf.profane?('text', strategies: :all)
+ ```
+ 3. The default mode now has full support for partial match:
+
+ ```ruby
+ # previously passed our filter, but is now marked as profane.
+ pf.profane?('youasshole')
+ ```
+
+ That's it. Enjoy!
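For callers that still carry the old `strictness:` values around, the mapping in the migration guide above could be wrapped in a small shim. This is a minimal sketch and not part of the gem: `STRICTNESS_TO_STRATEGIES` and `strategies_for` are hypothetical names, and falling back to `:basic` for unknown values is an assumption that mirrors the pre-1.0 code in this diff, where any unrecognised strictness selected the tolerant filter.

```ruby
# Hypothetical migration shim; not part of profanity-filter.
# Maps the pre-1.0 `strictness:` values to 1.0 `strategies:` values.
STRICTNESS_TO_STRATEGIES = {
  strict: :all,
  tolerant: :basic
}.freeze

def strategies_for(strictness)
  # Assumption: unknown values fall back to :basic, just as the old
  # code fell back to the tolerant filter.
  STRICTNESS_TO_STRATEGIES.fetch(strictness, :basic)
end

# With a 1.0 filter instance (requires the gem):
#   pf.profane?('text', strategies: strategies_for(:strict))
```

A lookup table keeps the old vocabulary in one place, so call sites can be migrated gradually and the shim deleted afterwards.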
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- profanity-filter (0.1.4)
+ profanity-filter (1.0)
  webpurify
 
  GEM
data/README.md CHANGED
@@ -12,7 +12,7 @@ This profanity filter implements:
  - [Full Support] diacritics, injections, unicode
  - [Partial Support] similarities, constructions
 
- This gem is also integrated with [Web Purify](https://www.webpurify.com). Usage example below.
+ This gem is also integrated with [WebPurify](https://www.webpurify.com). Usage example below.
 
 
  ## Installation
@@ -20,26 +20,27 @@ This gem is also integrated with [Web Purify](https://www.webpurify.com). Usage
  Add this line to your application's Gemfile:
 
  ```ruby
- gem 'profanity-filter'
+ gem 'profanity-filter', '~> 1.0'
  ```
 
  And then execute:
 
- $ bundle
+ $ bundle install
 
  Or install it yourself as:
 
  $ gem install profanity-filter
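The `~> 1.0` constraint added to the Gemfile line above is the RubyGems pessimistic operator: it accepts any 1.x release and rejects both the 0.1.x series and 2.0. A quick way to check this is with `Gem::Requirement`, which ships with RubyGems. The versions `1.4.2` and `2.0` below are made up for illustration; only 0.1.5 and 1.0 appear in this diff.

```ruby
require 'rubygems'

requirement = Gem::Requirement.new('~> 1.0')

# Check a few candidate versions against the constraint.
results = %w[0.1.5 1.0 1.4.2 2.0].map do |version|
  [version, requirement.satisfied_by?(Gem::Version.new(version))]
end.to_h

results.each { |version, ok| puts "#{version}: #{ok ? 'accepted' : 'rejected'}" }
# 0.1.5 and 2.0 are rejected; 1.0 and 1.4.2 are accepted.
```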
 
+ ## Versioning
+ Version 1.0 onward is not compatible with previous versions. See the [changelog](https://github.com/cardinalblue/profanity-filter/blob/master/CHANGELOG.md) for details.
+
  ## Usage
+ In your Ruby code,
 
  ```ruby
- # without WebPurify
+ # basic usage
  pf = ProfanityFilter.new
 
- # with WebPurify
- pf = ProfanityFilter.new(web_purifier_api_key: [YOUR-API-KEY])
-
  pf.profane? ('ssssshit')
  # => true
 
@@ -47,6 +48,49 @@ pf.profanity_count('fjsdio fdsk fU_cK_THIS_shI_T')
  # => 2
  ```
 
+ If we want to integrate WebPurify,
+
+ ```ruby
+ # with WebPurify
+ pf = ProfanityFilter.new(web_purifier_api_key: [YOUR-API-KEY])
+ ```
+
+ With WebPurify enabled, texts sent to `profane?` and `profanity_count` will **first** be checked against the mechanism this gem provides, **then** against WebPurify if no positive results are returned.
+
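The check order described above (local strategies first, WebPurify only when the local check finds nothing) can be sketched without the real client. Everything below is illustrative: `profane_with_fallback?`, `local_check`, and `remote_check` are hypothetical stand-ins for the gem's strategy engine and the WebPurify call, and treating a failed or timed-out remote call as "no finding" is an assumption modelled on the `rescue StandardError` / `nil` pattern visible elsewhere in this diff.

```ruby
require 'timeout'

# Hypothetical stand-in for the gem's local-then-remote check order.
def profane_with_fallback?(phrase, local_check:, remote_check: nil, timeout_duration: 5)
  return false if phrase.nil? || phrase.empty?
  return true if local_check.call(phrase)   # local hit: no remote call needed
  return false unless remote_check          # remote service not configured

  begin
    !!Timeout.timeout(timeout_duration) { remote_check.call(phrase) }
  rescue StandardError
    # Assumption: a failed or slow remote call counts as "no finding".
    false
  end
end

local  = ->(text) { text.include?('darn') }  # toy local dictionary
remote = ->(text) { text.include?('heck') }  # pretend remote service
broken = ->(_text) { raise IOError, 'service unavailable' }

profane_with_fallback?('darn it', local_check: local)                        # => true
profane_with_fallback?('oh heck', local_check: local, remote_check: remote)  # => true
profane_with_fallback?('oh heck', local_check: local, remote_check: broken)  # => false
```

Checking locally first keeps the common case free of network latency and limits remote calls to phrases the local dictionaries cannot decide.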
+ ## Strategies
+ There are four different `strategies` that we can compose to our heart's content.
+
+ 1. `:partial_match`
+ will flag a text as profane if any substring of it is in our dictionary.
+
+ 2. `:allow_symbol`
+ will flag a text as profane if any word in the text matches our dictionary after removing the symbols.
+
+ 3. `:duplicate_characters`
+ will flag a text as profane if any word in the text matches our dictionary after collapsing repeated characters.
+
+ 4. `:leet`
+ will flag a text as profane if any word in the text matches our dictionary after substituting look-alike Unicode characters with the letters they resemble.
+
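To make the first three strategy descriptions concrete, here is a toy sketch of the ideas behind them. It is not the gem's implementation: the dictionary, the method names, and the exact normalisation rules are all assumptions chosen for brevity, and `:leet` is left out because its character table is not shown in this diff.

```ruby
# Toy dictionary; the gem ships its own word lists.
DICTIONARY = %w[darn heck].freeze

# :partial_match idea: any dictionary word appears as a substring.
def partial_match?(text)
  lowered = text.downcase
  DICTIONARY.any? { |word| lowered.include?(word) }
end

# :allow_symbol idea: strip non-letters inside each word, then compare.
def symbol_match?(text)
  text.downcase.split.any? { |word| DICTIONARY.include?(word.gsub(/[^a-z]/, '')) }
end

# :duplicate_characters idea: collapse runs of repeated characters on both
# sides, then compare. (Squeezing the dictionary too keeps words with
# legitimate double letters matchable, at the cost of extra false positives.)
def duplicate_match?(text)
  squeezed_dictionary = DICTIONARY.map(&:squeeze)
  text.downcase.split.any? { |word| squeezed_dictionary.include?(word.squeeze) }
end

partial_match?('youdarnfool')  # => true  (substring hit)
symbol_match?('d_a-r.n you')   # => true  (symbols removed)
symbol_match?('youdarnfool')   # => false (whole-word comparison only)
duplicate_match?('daaarrrn')   # => true  (repeats collapsed)
```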
+ ## Config
+ By default, the profanity filter applies the `:partial_match` and `:allow_symbol` strategies, but we can specify which strategies to use:
+
+ ```ruby
+ pf = ProfanityFilter.new
+
+ # type :basic is the default
+ pf.profane?('test_string', strategies: :basic)
+ pf.profanity_count('test_string', strategies: :basic)
+
+ # type :all includes all four strategies
+ pf.profane?('test_string', strategies: :all)
+ pf.profanity_count('test_string', strategies: :all)
+
+ # compose our own
+ pf.profane?('test_string', strategies: [:partial_match, :leet])
+ pf.profanity_count('test_string', strategies: [:partial_match, :leet])
+ ```
+
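How a `strategies:` value expands into concrete strategy names can be sketched as a small standalone helper, loosely following the `filter(strategies:)` method in this diff. `resolve_strategies` is a hypothetical name. Two details are assumptions that differ from the gem: the sketch raises `ArgumentError` (the gem raises with a plain string, which gives a `RuntimeError`), and it wraps the input in `Array()` so a single symbol such as `:leet` is accepted.

```ruby
ALL_STRATEGIES   = %i[allow_symbol duplicate_characters leet partial_match].freeze
BASIC_STRATEGIES = %i[allow_symbol partial_match].freeze

# Hypothetical helper: expands a `strategies:` value into a list of names.
def resolve_strategies(strategies)
  case strategies
  when :all   then ALL_STRATEGIES
  when :basic then BASIC_STRATEGIES
  else
    # Validate every requested name; `each` returns the array it iterated.
    Array(strategies).each do |name|
      raise ArgumentError, "Strategy name \"#{name}\" not supported." unless ALL_STRATEGIES.include?(name)
    end
  end
end

resolve_strategies(:basic)                   # => [:allow_symbol, :partial_match]
resolve_strategies([:partial_match, :leet])  # => [:partial_match, :leet]
```

Validating names up front means a typo such as `:partial_mach` fails loudly instead of silently disabling a check.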
  ## Development
 
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
@@ -64,6 +108,3 @@ The gem is available as open source under the terms of the [MIT License](https:/
  ## Code of Conduct
 
  Everyone interacting in the ProfanityFilter project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/cardinalblue/profanity-filter/blob/master/CODE_OF_CONDUCT.md).
-
- ## Todo
- pluggable logging and strategies
@@ -17,7 +17,6 @@
  - bitching
  - blowjob
  - blowjobs
- - bullshit
  - clit
  - cocksuck
  - cocksucked
@@ -9,76 +9,76 @@ require 'profanity-filter/engines/leet_exact_match_strategy'
  require 'web_purify'
 
  class ProfanityFilter
-   WP_DEFAULT_LANGS = [:en].freeze
-   WP_AVAILABLE_LANGS = [
+   WP_DEFAULT_LANGS = [:en].freeze
+   WP_LANG_CONVERSIONS = { es: :sp, ko: :kr, ja: :jp }.freeze
+   WP_AVAILABLE_LANGS = [
      :en, :ar, :fr, :de, :hi, :jp, :it, :pt, :ru, :sp, :th, :tr, :zh, :kr, :pa
    ].freeze
-   WP_LANG_CONVERSIONS = { es: :sp, ko: :kr, ja: :jp }.freeze
 
-   attr_reader :strict_filter, :tolerant_filter
+   LEET_STRATEGY = :leet
+   ALLOW_SYMBOL_STRATEGY = :allow_symbol
+   PARTIAL_MATCH_STRATEGY = :partial_match
+   DUPLICATE_CHARACTERS_STRATEGY = :duplicate_characters
 
-   def initialize(web_purifier_api_key: nil)
+   attr_reader :available_strategies
+
+   def initialize(web_purifier_api_key: nil, whitelist: [])
      # If we are using Web Purifier
      @wp_client = web_purifier_api_key ? WebPurify::Client.new(web_purifier_api_key) : nil
+     @whitelist = whitelist
+     raise 'Whitelist should be an array' unless @whitelist.is_a?(Array)
 
      exact_match_dictionary = load_exact_match_dictionary
      partial_match_dictionary = load_partial_match_dictionary
 
-     allow_symbol_strategy = ::ProfanityFilterEngine::AllowSymbolsInWordsStrategy.new(
-       dictionary: exact_match_dictionary,
-       ignore_case: true
-     )
-     duplicate_characters_strategy = ::ProfanityFilterEngine::AllowDuplicateCharactersStrategy.new(
-       dictionary: exact_match_dictionary,
-       ignore_case: true
-     )
-     leet_strategy = ::ProfanityFilterEngine::LeetExactMatchStrategy.new(
-       dictionary: exact_match_dictionary,
-       ignore_case: true
-     )
-     partial_match_strategy = ::ProfanityFilterEngine::PartialMatchStrategy.new(
-       dictionary: partial_match_dictionary,
-       ignore_case: true
-     )
-
-     # Set up strict filter.
-     @strict_filter = ::ProfanityFilterEngine::Composite.new
-     @strict_filter.add_strategies(
-       leet_strategy,
-       allow_symbol_strategy,
-       partial_match_strategy,
-       duplicate_characters_strategy
-     )
-     # Set up tolerant filter.
-     @tolerant_filter = ::ProfanityFilterEngine::Composite.new
-     @tolerant_filter.add_strategies(
-       allow_symbol_strategy,
-       partial_match_strategy
-     )
-   end
-
-   def profane?(phrase, lang: nil, strictness: :tolerant)
-     return false if phrase == '' || phrase.nil?
-
-     is_profane = pf_profane?(phrase, strictness: strictness)
-     if !is_profane && use_webpurify?
-       wp_is_profane = wp_profane?(phrase, lang: lang)
-       is_profane = wp_is_profane unless wp_is_profane.nil?
-     end
+     @available_strategies = {
+       ALLOW_SYMBOL_STRATEGY => ::ProfanityFilterEngine::AllowSymbolsInWordsStrategy.new(
+         dictionary: exact_match_dictionary,
+         ignore_case: true
+       ),
+       DUPLICATE_CHARACTERS_STRATEGY => ::ProfanityFilterEngine::AllowDuplicateCharactersStrategy.new(
+         dictionary: exact_match_dictionary,
+         ignore_case: true
+       ),
+       LEET_STRATEGY => ::ProfanityFilterEngine::LeetExactMatchStrategy.new(
+         dictionary: exact_match_dictionary,
+         ignore_case: true
+       ),
+       PARTIAL_MATCH_STRATEGY => ::ProfanityFilterEngine::PartialMatchStrategy.new(
+         dictionary: partial_match_dictionary + exact_match_dictionary,
+         ignore_case: true
+       ),
+     }
+   end
 
-     !!is_profane
+   def all_strategy_names
+     available_strategies.keys
    end
 
-   def profanity_count(phrase, lang: nil, strictness: :tolerant)
-     return 0 if phrase == '' || phrase.nil?
+   def basic_strategy_names
+     [ALLOW_SYMBOL_STRATEGY, PARTIAL_MATCH_STRATEGY]
+   end
+
+   def profane?(phrase, lang: nil, strategies: :basic)
+     return false if phrase == ''
+     return false if @whitelist.include?(phrase)
 
-     banned_words_count = pf_profanity_count(phrase, strictness: strictness)
-     if banned_words_count == 0 && use_webpurify?
-       wp_banned_words_count = wp_profanity_count(phrase, lang: lang)
-       banned_words_count = wp_banned_words_count unless wp_banned_words_count.nil?
+     if use_webpurify?
+       !!(pf_profane?(phrase, strategies: strategies) || wp_profane?(phrase, lang: lang))
+     else
+       !!pf_profane?(phrase, strategies: strategies)
      end
+   end
+
+   def profanity_count(phrase, lang: nil, strategies: :basic)
+     return 0 if phrase == '' || phrase.nil?
 
-     banned_words_count
+     pf_count = pf_profanity_count(phrase, strategies: strategies)
+     if use_webpurify?
+       pf_count.zero? ? wp_profanity_count(phrase, lang: lang).to_i : pf_count
+     else
+       pf_count
+     end
    end
 
    private
@@ -87,23 +87,29 @@ class ProfanityFilter
      !!@wp_client
    end
 
-   def filter(strictness: :tolerant)
-     case strictness
-     when :strict
-       @strict_filter
-     when :tolerant
-       @tolerant_filter
-     else
-       @tolerant_filter
+   def filter(strategies:)
+     ::ProfanityFilterEngine::Composite.new.tap do |engine|
+       case strategies
+       when :all
+         all_strategy_names.each { |s| engine.add_strategy(available_strategies[s]) }
+       when :basic
+         basic_strategy_names.each { |s| engine.add_strategy(available_strategies[s]) }
+       else
+         strategies.each do |s|
+           raise "Strategy name \"#{s}\" not supported." unless all_strategy_names.include?(s)
+
+           engine.add_strategy(available_strategies[s])
+         end
+       end
      end
    end
 
-   def pf_profane?(phrase, strictness: :tolerant)
-     filter(strictness: strictness).profane?(phrase)
+   def pf_profane?(phrase, strategies:)
+     filter(strategies: strategies).profane?(phrase)
    end
 
-   def pf_profanity_count(phrase, strictness: :tolerant)
-     filter(strictness: strictness).profanity_count(phrase)
+   def pf_profanity_count(phrase, strategies:)
+     filter(strategies: strategies).profanity_count(phrase)
    end
 
    def wp_profane?(phrase, lang: nil, timeout_duration: 5)
@@ -120,7 +126,7 @@ class ProfanityFilter
      Timeout::timeout(timeout_duration) do
        @wp_client.check_count phrase, lang: wp_langs_list_with(lang)
      end
-   rescue StandardError
+   rescue StandardError
      nil
    end
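The hunk above shows the WebPurify count helper returning `nil` on any `StandardError`, and the new `profanity_count` calls `.to_i` on that result, so a failed remote call contributes a count of zero. The interaction can be sketched in isolation; `combined_count` and the lambdas are hypothetical names used only for this illustration.

```ruby
# Hypothetical stand-in for the count fallback: use the local count when it
# is non-zero, otherwise ask the remote source and coerce nil to 0.
def combined_count(local_count, remote_count)
  return local_count unless local_count.zero?

  remote_count.call.to_i # nil.to_i is 0, so a failed lookup adds nothing
end

remote_ok     = -> { 3 }    # pretend the remote service found 3 words
remote_failed = -> { nil }  # what a rescued error would yield

combined_count(2, remote_ok)      # => 2 (local result wins)
combined_count(0, remote_ok)      # => 3 (remote consulted)
combined_count(0, remote_failed)  # => 0 (nil coerced to zero)
```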
 
@@ -29,7 +29,7 @@ module ProfanityFilterEngine
 
      def profane_words(text)
        total_words = strategies.reduce([]) do |words, strategy|
-         words.concat(strategy.profane_words(text))
+         words.concat(strategy.profane_words(text).map { |w| w.gsub(/[ _\-\.]/, '') })
        end
        total_words.uniq
      end
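The effect of the new `gsub` in `profane_words` is that matches differing only in spaces, underscores, hyphens, or dots collapse into one entry before `uniq` runs, so the same word reported by two strategies is counted once. A standalone illustration with made-up matches (the strings stand in for what individual strategies might return):

```ruby
# Made-up matches, as different strategies might report the same word.
matches = ['d_a-r.n', 'darn', 'h e c k']

# Same normalisation as in the diff: drop spaces, underscores, hyphens, dots.
normalized = matches.map { |w| w.gsub(/[ _\-\.]/, '') }
unique = normalized.uniq

puts unique.inspect
# unique holds "darn" and "heck": two distinct words instead of three raw matches.
```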
@@ -1,3 +1,3 @@
  class ProfanityFilter
-   VERSION = '0.1.5'
+   VERSION = '1.0'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: profanity-filter
  version: !ruby/object:Gem::Version
- version: 0.1.5
+ version: '1.0'
  platform: ruby
  authors:
  - Maso Lin
@@ -10,7 +10,7 @@ authors:
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2019-12-17 00:00:00.000000000 Z
+ date: 2019-12-31 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: webpurify