profanity-filter 0.1.5 → 1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: d0c83f7c7ca1562230fddba3b7e38f46b8148ae207270995711a2c44ac6a7ad2
4
- data.tar.gz: 0af221e57a13ed4f2e7da49f638b997167fd59396e700653b5d3d74a978479f9
3
+ metadata.gz: 9c99a863d299ca2b41dc51afa2f50ec34bc7eb5a30c4ab252881949dd3036d03
4
+ data.tar.gz: 5de32869e8f63201ee23c82390896f943876c4ac7298bf08b9efc640c451d1dc
5
5
  SHA512:
6
- metadata.gz: a4587e6c90ffbccd1261d4f035f51825dbc7a667daa395bb6f705f0187a14268377c2f69faa08f264bd9f03ba485a31ed8fa678add0ce3dc7f45eb27a4056843
7
- data.tar.gz: 3fea0ebb4adc5aeb60154a903f356c23bfffb800dd5ed410aeaaa9346a3babd4aa095c1a503fb4e2237e306ea00eb251069352b08d8c598488166093006db0f0
6
+ metadata.gz: 0ba717fee25e40a8dbba835cb54cf351da24207aa8992e75de0d80f5254ad8d4cd99d98b3b85ba1ad4205d83b072b00d0b23749418f26a954a36e52545a78572
7
+ data.tar.gz: 1b7d396648075e6bc5173009ef69413b8b07cd5bf319333562573f1ab98cafd225591274d4873b5c23df5767fe18128f3f615c571aa703a93630f69260b01228
@@ -0,0 +1,34 @@
1
+ ## Version 1.0
2
+
3
+ This version is not compatible with previous versions. The main changes and a migration guide follow:
4
+
5
+ 1. Keyword parameter `strictness` for both `profane?` and `profanity_count` is replaced by `strategies`.
6
+
7
+ ```ruby
8
+ # 'strict mode' before
9
+ pf.profane?('text', strictness: :strict)
10
+
11
+ # 'strict mode' now
12
+ pf.profane?('text', strategies: :all)
13
+
14
+ # 'tolerant mode' before
15
+ pf.profane?('text', strictness: :tolerant)
16
+
17
+ # 'tolerant mode' now
18
+ pf.profane?('text', strategies: :basic)
19
+ ```
20
+ 2. We can compose our own strategies:
21
+
22
+ ```ruby
23
+ # the below two are exactly the same:
24
+ pf.profane?('text', strategies: [:leet, :allow_symbol, :duplicate_characters, :partial_match])
25
+ pf.profane?('text', strategies: :all)
26
+ ```
27
+ 3. The default mode now has full support for partial matching:
28
+
29
+ ```ruby
30
+ # previously this passed the filter; now it is flagged as profane.
31
+ pf.profane?('youasshole')
32
+ ```
33
+
34
+ That's it. Enjoy!
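Reviewer note: the migration in item 1 of the changelog can be sketched as a small lookup. This is a minimal sketch, assuming a caller still holds pre-1.0 `strictness` values; `STRICTNESS_TO_STRATEGIES` and `strategies_for` are hypothetical names and are not part of the gem. The fallback to `:basic` mirrors the pre-1.0 `filter` method shown later in this diff, which returned the tolerant filter for any unrecognized value.

```ruby
# Hypothetical migration helper; not part of the profanity-filter gem.
STRICTNESS_TO_STRATEGIES = { strict: :all, tolerant: :basic }.freeze

# Translates a pre-1.0 `strictness:` value into a 1.0 `strategies:` value.
# Unknown values map to :basic, matching the old tolerant fallback.
def strategies_for(strictness)
  STRICTNESS_TO_STRATEGIES.fetch(strictness, :basic)
end

# Example call site after migration (pf is a ProfanityFilter instance):
#   pf.profane?('text', strategies: strategies_for(:strict))
```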
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- profanity-filter (0.1.4)
4
+ profanity-filter (1.0)
5
5
  webpurify
6
6
 
7
7
  GEM
data/README.md CHANGED
@@ -12,7 +12,7 @@ This profanity filter implements:
12
12
  - [Full Support] diacritics, injections, unicode
13
13
  - [Partial Support] similarities, constructions
14
14
 
15
- This gem is also integrated with [Web Purify](https://www.webpurify.com). Usage example below.
15
+ This gem is also integrated with [WebPurify](https://www.webpurify.com). Usage example below.
16
16
 
17
17
 
18
18
  ## Installation
@@ -20,26 +20,27 @@ This gem is also integrated with [Web Purify](https://www.webpurify.com). Usage
20
20
  Add this line to your application's Gemfile:
21
21
 
22
22
  ```ruby
23
- gem 'profanity-filter'
23
+ gem 'profanity-filter', '~> 1.0'
24
24
  ```
25
25
 
26
26
  And then execute:
27
27
 
28
- $ bundle
28
+ $ bundle install
29
29
 
30
30
  Or install it yourself as:
31
31
 
32
32
  $ gem install profanity-filter
33
33
 
34
+ ## Versioning
35
+ Version 1.0 onward is not compatible with previous versions. See the [changelog](https://github.com/cardinalblue/profanity-filter/blob/master/CHANGELOG.md) for details.
36
+
34
37
  ## Usage
38
+ In your Ruby code,
35
39
 
36
40
  ```ruby
37
- # without WebPurify
41
+ # basic usage
38
42
  pf = ProfanityFilter.new
39
43
 
40
- # with WebPurify
41
- pf = ProfanityFilter.new(web_purifier_api_key: [YOUR-API-KEY])
42
-
43
44
  pf.profane?('ssssshit')
44
45
  # => true
45
46
 
@@ -47,6 +48,49 @@ pf.profanity_count('fjsdio fdsk fU_cK_THIS_shI_T')
47
48
  # => 2
48
49
  ```
49
50
 
51
+ If we want to integrate WebPurify,
52
+
53
+ ```ruby
54
+ # with WebPurify
55
+ pf = ProfanityFilter.new(web_purifier_api_key: [YOUR-API-KEY])
56
+ ```
57
+
58
+ With WebPurify enabled, texts sent to `profane?` and `profanity_count` are **first** checked against the mechanism this gem provides, and **then** against WebPurify only if the local check finds nothing.
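Reviewer note: the check order described above can be sketched in isolation. This is a minimal sketch, assuming both checks are passed in as callables; `profane_with_fallback?`, `local_check`, and `remote_check` are hypothetical names standing in for the gem's own filter and the WebPurify call. A `nil` from the remote check (for example after a timeout) is treated as "not profane", which is how the 1.0 `profane?` in this diff coerces its result.

```ruby
# Hypothetical stand-in for the local-then-remote order; not the gem's code.
# local_check and remote_check are callables that take a phrase.
def profane_with_fallback?(phrase, local_check:, remote_check: nil)
  # A local hit short-circuits, so the remote service is never contacted.
  return true if local_check.call(phrase)
  # Without a remote check configured, the local answer is final.
  return false unless remote_check

  # nil (e.g. a failed or timed-out request) is coerced to false.
  !!remote_check.call(phrase)
end
```

Keeping the remote call behind the local one means network latency is only paid for texts the local dictionary does not already flag.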
59
+
60
+ ## Strategies
61
+ There are four different `strategies` that we can compose to our heart's content.
62
+
63
+ 1. `:partial_match`
64
+ will flag a text as profane if any substring of it is in our dictionary.
65
+
66
+ 2. `:allow_symbol`
67
+ will flag a text as profane if any word in the text matches our dictionary after removing the symbols.
68
+
69
+ 3. `:duplicate_characters`
70
+ will flag a text as profane if any word in the text matches our dictionary after removing duplicated characters.
71
+
72
+ 4. `:leet`
73
+ will flag a text as profane if any word in the text matches our dictionary after replacing look-alike Unicode characters with the corresponding letters.
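Reviewer note: the four strategies listed above differ only in how a text is normalized before the dictionary lookup. The toy below illustrates that idea and nothing more: the word list, the leet table, and every name in it (`TOY_DICTIONARY`, `TOY_LEET`, `toy_word_match?`, `TOY_STRATEGIES`) are assumptions made for this sketch, and the gem's engine classes may work quite differently.

```ruby
# Toy stand-ins for the four strategies; the word list and the leet table
# are made up for illustration and are not the gem's data.
TOY_DICTIONARY = %w[darn heck].freeze
TOY_LEET = { '3' => 'e', '4' => 'a', '0' => 'o', '1' => 'i' }.freeze

# Normalizes each whitespace-separated word with the given block, then
# looks for an exact dictionary match.
def toy_word_match?(text)
  text.downcase.split.any? { |word| TOY_DICTIONARY.include?(yield(word)) }
end

TOY_STRATEGIES = {
  # Any dictionary entry appearing anywhere inside the text.
  partial_match: ->(text) { TOY_DICTIONARY.any? { |w| text.downcase.include?(w) } },
  # Exact match after dropping everything except letters and digits.
  allow_symbol: ->(text) { toy_word_match?(text) { |w| w.gsub(/[^a-z0-9]/, '') } },
  # Exact match after collapsing runs of repeated characters
  # (naive: this also collapses legitimate double letters).
  duplicate_characters: ->(text) { toy_word_match?(text) { |w| w.squeeze } },
  # Exact match after mapping look-alike characters back to letters.
  leet: ->(text) { toy_word_match?(text) { |w| w.gsub(/./) { |c| TOY_LEET.fetch(c, c) } } }
}.freeze
```

In this toy, only `:partial_match` looks inside longer words; the other three require a whole word to match after normalization, which is why they complement rather than replace one another.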
74
+
75
+ ## Config
76
+ By default, the profanity filter uses the `:partial_match` and `:allow_symbol` strategies, but we can specify which strategies to apply:
77
+
78
+ ```ruby
79
+ pf = ProfanityFilter.new
80
+
81
+ # type :basic is the default
82
+ pf.profane?('test_string', strategies: :basic)
83
+ pf.profanity_count('test_string', strategies: :basic)
84
+
85
+ # type :all includes all four strategies
86
+ pf.profane?('test_string', strategies: :all)
87
+ pf.profanity_count('test_string', strategies: :all)
88
+
89
+ # compose our own
90
+ pf.profane?('test_string', strategies: [:partial_match, :leet])
91
+ pf.profanity_count('test_string', strategies: [:partial_match, :leet])
92
+ ```
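Reviewer note: the three forms accepted by `strategies:` (`:basic`, `:all`, or an explicit list) can be resolved to a list of names as sketched here. This is a minimal sketch modeled on the `filter(strategies:)` method in this diff, with two deliberate differences that are assumptions of the sketch: it raises `ArgumentError` where the gem raises a plain `RuntimeError`, and it wraps a single symbol with `Array()`. `ALL_STRATEGY_NAMES`, `BASIC_STRATEGY_NAMES`, and `resolve_strategies` are hypothetical names.

```ruby
# Names taken from the strategy constants in this diff; the helper itself
# is hypothetical and not part of the gem.
ALL_STRATEGY_NAMES = %i[allow_symbol duplicate_characters leet partial_match].freeze
BASIC_STRATEGY_NAMES = %i[allow_symbol partial_match].freeze

# Resolves :all, :basic, or a custom list into an array of strategy names.
def resolve_strategies(strategies)
  case strategies
  when :all then ALL_STRATEGY_NAMES
  when :basic then BASIC_STRATEGY_NAMES
  else
    names = Array(strategies)
    unknown = names - ALL_STRATEGY_NAMES
    # Reject the first unsupported name, using the gem's error wording.
    raise ArgumentError, "Strategy name \"#{unknown.first}\" not supported." unless unknown.empty?

    names
  end
end
```

Validating the whole list before building anything keeps a typo such as `:leat` from silently producing a weaker filter.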
93
+
50
94
  ## Development
51
95
 
52
96
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
@@ -64,6 +108,3 @@ The gem is available as open source under the terms of the [MIT License](https:/
64
108
  ## Code of Conduct
65
109
 
66
110
  Everyone interacting in the ProfanityFilter project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/cardinalblue/profanity-filter/blob/master/CODE_OF_CONDUCT.md).
67
-
68
- ## Todo
69
- pluggable logging and strategies
@@ -17,7 +17,6 @@
17
17
  - bitching
18
18
  - blowjob
19
19
  - blowjobs
20
- - bullshit
21
20
  - clit
22
21
  - cocksuck
23
22
  - cocksucked
@@ -9,76 +9,76 @@ require 'profanity-filter/engines/leet_exact_match_strategy'
9
9
  require 'web_purify'
10
10
 
11
11
  class ProfanityFilter
12
- WP_DEFAULT_LANGS = [:en].freeze
13
- WP_AVAILABLE_LANGS = [
12
+ WP_DEFAULT_LANGS = [:en].freeze
13
+ WP_LANG_CONVERSIONS = { es: :sp, ko: :kr, ja: :jp }.freeze
14
+ WP_AVAILABLE_LANGS = [
14
15
  :en, :ar, :fr, :de, :hi, :jp, :it, :pt, :ru, :sp, :th, :tr, :zh, :kr, :pa
15
16
  ].freeze
16
- WP_LANG_CONVERSIONS = { es: :sp, ko: :kr, ja: :jp }.freeze
17
17
 
18
- attr_reader :strict_filter, :tolerant_filter
18
+ LEET_STRATEGY = :leet
19
+ ALLOW_SYMBOL_STRATEGY = :allow_symbol
20
+ PARTIAL_MATCH_STRATEGY = :partial_match
21
+ DUPLICATE_CHARACTERS_STRATEGY = :duplicate_characters
19
22
 
20
- def initialize(web_purifier_api_key: nil)
23
+ attr_reader :available_strategies
24
+
25
+ def initialize(web_purifier_api_key: nil, whitelist: [])
21
26
  # If we are using Web Purifier
22
27
  @wp_client = web_purifier_api_key ? WebPurify::Client.new(web_purifier_api_key) : nil
28
+ @whitelist = whitelist
29
+ raise 'Whitelist should be an array' unless @whitelist.is_a?(Array)
23
30
 
24
31
  exact_match_dictionary = load_exact_match_dictionary
25
32
  partial_match_dictionary = load_partial_match_dictionary
26
33
 
27
- allow_symbol_strategy = ::ProfanityFilterEngine::AllowSymbolsInWordsStrategy.new(
28
- dictionary: exact_match_dictionary,
29
- ignore_case: true
30
- )
31
- duplicate_characters_strategy = ::ProfanityFilterEngine::AllowDuplicateCharactersStrategy.new(
32
- dictionary: exact_match_dictionary,
33
- ignore_case: true
34
- )
35
- leet_strategy = ::ProfanityFilterEngine::LeetExactMatchStrategy.new(
36
- dictionary: exact_match_dictionary,
37
- ignore_case: true
38
- )
39
- partial_match_strategy = ::ProfanityFilterEngine::PartialMatchStrategy.new(
40
- dictionary: partial_match_dictionary,
41
- ignore_case: true
42
- )
43
-
44
- # Set up strict filter.
45
- @strict_filter = ::ProfanityFilterEngine::Composite.new
46
- @strict_filter.add_strategies(
47
- leet_strategy,
48
- allow_symbol_strategy,
49
- partial_match_strategy,
50
- duplicate_characters_strategy
51
- )
52
- # Set up tolerant filter.
53
- @tolerant_filter = ::ProfanityFilterEngine::Composite.new
54
- @tolerant_filter.add_strategies(
55
- allow_symbol_strategy,
56
- partial_match_strategy
57
- )
58
- end
59
-
60
- def profane?(phrase, lang: nil, strictness: :tolerant)
61
- return false if phrase == '' || phrase.nil?
62
-
63
- is_profane = pf_profane?(phrase, strictness: strictness)
64
- if !is_profane && use_webpurify?
65
- wp_is_profane = wp_profane?(phrase, lang: lang)
66
- is_profane = wp_is_profane unless wp_is_profane.nil?
67
- end
34
+ @available_strategies = {
35
+ ALLOW_SYMBOL_STRATEGY => ::ProfanityFilterEngine::AllowSymbolsInWordsStrategy.new(
36
+ dictionary: exact_match_dictionary,
37
+ ignore_case: true
38
+ ),
39
+ DUPLICATE_CHARACTERS_STRATEGY => ::ProfanityFilterEngine::AllowDuplicateCharactersStrategy.new(
40
+ dictionary: exact_match_dictionary,
41
+ ignore_case: true
42
+ ),
43
+ LEET_STRATEGY => ::ProfanityFilterEngine::LeetExactMatchStrategy.new(
44
+ dictionary: exact_match_dictionary,
45
+ ignore_case: true
46
+ ),
47
+ PARTIAL_MATCH_STRATEGY => ::ProfanityFilterEngine::PartialMatchStrategy.new(
48
+ dictionary: partial_match_dictionary + exact_match_dictionary,
49
+ ignore_case: true
50
+ ),
51
+ }
52
+ end
68
53
 
69
- !!is_profane
54
+ def all_strategy_names
55
+ available_strategies.keys
70
56
  end
71
57
 
72
- def profanity_count(phrase, lang: nil, strictness: :tolerant)
73
- return 0 if phrase == '' || phrase.nil?
58
+ def basic_strategy_names
59
+ [ALLOW_SYMBOL_STRATEGY, PARTIAL_MATCH_STRATEGY]
60
+ end
61
+
62
+ def profane?(phrase, lang: nil, strategies: :basic)
63
+ return false if phrase == ''
64
+ return false if @whitelist.include?(phrase)
74
65
 
75
- banned_words_count = pf_profanity_count(phrase, strictness: strictness)
76
- if banned_words_count == 0 && use_webpurify?
77
- wp_banned_words_count = wp_profanity_count(phrase, lang: lang)
78
- banned_words_count = wp_banned_words_count unless wp_banned_words_count.nil?
66
+ if use_webpurify?
67
+ !!(pf_profane?(phrase, strategies: strategies) || wp_profane?(phrase, lang: lang))
68
+ else
69
+ !!pf_profane?(phrase, strategies: strategies)
79
70
  end
71
+ end
72
+
73
+ def profanity_count(phrase, lang: nil, strategies: :basic)
74
+ return 0 if phrase == '' || phrase.nil?
80
75
 
81
- banned_words_count
76
+ pf_count = pf_profanity_count(phrase, strategies: strategies)
77
+ if use_webpurify?
78
+ pf_count.zero? ? wp_profanity_count(phrase, lang: lang).to_i : pf_count
79
+ else
80
+ pf_count
81
+ end
82
82
  end
83
83
 
84
84
  private
@@ -87,23 +87,29 @@ class ProfanityFilter
87
87
  !!@wp_client
88
88
  end
89
89
 
90
- def filter(strictness: :tolerant)
91
- case strictness
92
- when :strict
93
- @strict_filter
94
- when :tolerant
95
- @tolerant_filter
96
- else
97
- @tolerant_filter
90
+ def filter(strategies:)
91
+ ::ProfanityFilterEngine::Composite.new.tap do |engine|
92
+ case strategies
93
+ when :all
94
+ all_strategy_names.each { |s| engine.add_strategy(available_strategies[s]) }
95
+ when :basic
96
+ basic_strategy_names.each { |s| engine.add_strategy(available_strategies[s]) }
97
+ else
98
+ strategies.each do |s|
99
+ raise "Strategy name \"#{s}\" not supported." unless all_strategy_names.include?(s)
100
+
101
+ engine.add_strategy(available_strategies[s])
102
+ end
103
+ end
98
104
  end
99
105
  end
100
106
 
101
- def pf_profane?(phrase, strictness: :tolerant)
102
- filter(strictness: strictness).profane?(phrase)
107
+ def pf_profane?(phrase, strategies:)
108
+ filter(strategies: strategies).profane?(phrase)
103
109
  end
104
110
 
105
- def pf_profanity_count(phrase, strictness: :tolerant)
106
- filter(strictness: strictness).profanity_count(phrase)
111
+ def pf_profanity_count(phrase, strategies:)
112
+ filter(strategies: strategies).profanity_count(phrase)
107
113
  end
108
114
 
109
115
  def wp_profane?(phrase, lang: nil, timeout_duration: 5)
@@ -120,7 +126,7 @@ class ProfanityFilter
120
126
  Timeout::timeout(timeout_duration) do
121
127
  @wp_client.check_count phrase, lang: wp_langs_list_with(lang)
122
128
  end
123
- rescue StandardError => e
129
+ rescue StandardError
124
130
  nil
125
131
  end
126
132
 
@@ -29,7 +29,7 @@ module ProfanityFilterEngine
29
29
 
30
30
  def profane_words(text)
31
31
  total_words = strategies.reduce([]) do |words, strategy|
32
- words.concat(strategy.profane_words(text))
32
+ words.concat(strategy.profane_words(text).map { |w| w.gsub(/[ _\-\.]/, '') })
33
33
  end
34
34
  total_words.uniq
35
35
  end
@@ -1,3 +1,3 @@
1
1
  class ProfanityFilter
2
- VERSION = '0.1.5'
2
+ VERSION = '1.0'
3
3
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: profanity-filter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.5
4
+ version: '1.0'
5
5
  platform: ruby
6
6
  authors:
7
7
  - Maso Lin
@@ -10,7 +10,7 @@ authors:
10
10
  autorequire:
11
11
  bindir: exe
12
12
  cert_chain: []
13
- date: 2019-12-17 00:00:00.000000000 Z
13
+ date: 2019-12-31 00:00:00.000000000 Z
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
16
16
  name: webpurify