profanity-filter 0.1.5 → 1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml +4 -4
data/CHANGELOG.md +34 -0
data/Gemfile.lock +1 -1
data/README.md +51 -10
data/lib/profanity-dictionaries/en.yaml +0 -1
data/lib/profanity-filter.rb +73 -67
data/lib/profanity-filter/engines/composite.rb +1 -1
data/lib/profanity-filter/version.rb +1 -1
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: d0c83f7c7ca1562230fddba3b7e38f46b8148ae207270995711a2c44ac6a7ad2
-  data.tar.gz: 0af221e57a13ed4f2e7da49f638b997167fd59396e700653b5d3d74a978479f9
+  metadata.gz: 9c99a863d299ca2b41dc51afa2f50ec34bc7eb5a30c4ab252881949dd3036d03
+  data.tar.gz: 5de32869e8f63201ee23c82390896f943876c4ac7298bf08b9efc640c451d1dc
 SHA512:
-  metadata.gz: a4587e6c90ffbccd1261d4f035f51825dbc7a667daa395bb6f705f0187a14268377c2f69faa08f264bd9f03ba485a31ed8fa678add0ce3dc7f45eb27a4056843
-  data.tar.gz: 3fea0ebb4adc5aeb60154a903f356c23bfffb800dd5ed410aeaaa9346a3babd4aa095c1a503fb4e2237e306ea00eb251069352b08d8c598488166093006db0f0
+  metadata.gz: 0ba717fee25e40a8dbba835cb54cf351da24207aa8992e75de0d80f5254ad8d4cd99d98b3b85ba1ad4205d83b072b00d0b23749418f26a954a36e52545a78572
+  data.tar.gz: 1b7d396648075e6bc5173009ef69413b8b07cd5bf319333562573f1ab98cafd225591274d4873b5c23df5767fe18128f3f615c571aa703a93630f69260b01228

data/CHANGELOG.md CHANGED

@@ -0,0 +1,34 @@
+## Version 1.0
+This version is not compatible with previous versions. The following are main changes and migration guide:
+1. Keyword parameter `strictness` for both `profane?` and `profanity_count` is replaced by `strategies`.
+    ```ruby
+    # 'strict mode' before
+    pf.profane?('text', strictness: :strict)
+    # 'strict mode' now
+    pf.profane?('text', strategies: :all)
+    # 'tolerant mode' before
+    pf.profane?('text', strictness: :tolerant)
+    # 'tolerant mode' now
+    pf.profane?('text', strategies: :basic)
+    ```
+2. We can compose our own strategies:
+    ```ruby
+    # the below two are exactly the same:
+    pf.profane?('text', strategies: [:leet, :allow_symbol, :duplicate_characters, :partial_match])
+    pf.profane?('text', strategies: :all)
+    ```
+3. Now the default mode has full support for partial match
+    ```ruby
+    # before it passes our filter, but now it's marked as profane.
+    pf.profane?('youasshole')
+    ```
+That's it. Enjoy!
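
The keyword migration in item 1 of the changelog can be sketched as a small mapping. This is an illustration only: `strictness_to_strategies` is a hypothetical helper and not part of the gem. It assumes the 0.1.x behaviour visible in the old `filter` method of this diff, where any value other than `:strict` fell back to the tolerant filter.

```ruby
# Hypothetical helper (not part of profanity-filter): translate a 0.1.x
# `strictness:` value into the equivalent 1.0 `strategies:` value.
def strictness_to_strategies(strictness)
  # In 0.1.x, anything other than :strict used the tolerant filter.
  strictness == :strict ? :all : :basic
end

strictness_to_strategies(:strict)   # => :all
strictness_to_strategies(:tolerant) # => :basic
```

The mapping covers the keyword only. Per item 3 of the changelog, `:basic` in 1.0 also flags partial matches that the old tolerant mode let through, so results may differ even after the rename.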

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    profanity-filter (0.1.4)
+    profanity-filter (1.0)
       webpurify
 GEM

data/README.md CHANGED

@@ -12,7 +12,7 @@ This profanity filter implements:
 - [Full Support] diacritics, injections, unicode
 - [Partial Support] similarities, constructions
-This gem is also integrated with [Web Purify](https://www.webpurify.com). Usage example below.
+This gem is also integrated with [WebPurify](https://www.webpurify.com). Usage example below.
 ## Installation
@@ -20,26 +20,27 @@ This gem is also integrated with [Web Purify](https://www.webpurify.com). Usage
 Add this line to your application's Gemfile:
 ```ruby
-gem 'profanity-filter'
+gem 'profanity-filter', '~> 1.0'
 ```
 And then execute:
-    $ bundle
+    $ bundle install
 Or install it yourself as:
     $ gem install profanity-filter
+## Versioning
+Version 1.0 onward is not compatible with previous versions. See [changelog(https://github.com/cardinalblue/profanity-filter/blob/master/CHANGELOG.md)] for details.
 ## Usage
+In your Ruby code,
 ```ruby
-# without WebPurify
+# basic usage
 pf = ProfanityFilter.new
-# with WebPurify
-pf = ProfanityFilter.new(web_purifier_api_key: [YOUR-API-KEY])
 pf.profane? ('ssssshit')
 # => true
@@ -47,6 +48,49 @@ pf.profanity_count('fjsdio fdsk fU_cK_THIS_shI_T')
 # => 2
 ```
+If we want to integrate WebPurify,
+```ruby
+# with WebPurify
+pf = ProfanityFilter.new(web_purifier_api_key: [YOUR-API-KEY])
+```
+With WebPurify enabled, texts sent to `profane?` and `profanity_count` will **first** be checked against the mechanism this gem provides, **then** against WebPurify if no positive results are returned.
+## Strategies
+There are four different `strategies` that we can compose to our heart's content.
+1. `:partial_match`
+will flag a text as profane if any substrings of it is in our dictionary.
+2. `:allow_symbol`
+will flag a text as profane if any word in the text matches our dictionary after removing the symbols.
+3. `:duplicate_characters`
+will flag a text as profane if any word in the text matches our dictionary after removing duplications.
+4. `:leet`
+will flag a text as profane if any word in the text matches our dictionary after substituting similar unicode characters with their letter correspondents.
+## Config
+By default, the profanity filter implements `:partial_match` and `:allow_symbol` strategies. But we can specify what strategies we want:
+```ruby
+pf = ProfanityFilter.new
+# type :basic is the default
+pf.profane?('test_string', strategies: :basic)
+pf.profanity_count('test_string', strategies: :basic)
+# type :all includes all four strategies
+pf.profane?('test_string', strategies: :all)
+pf.profanity_count('test_string', strategies: :all)
+# compose our own
+pf.profane?('test_string', strategies: [:partial_match, :leet])
+pf.profanity_count('test_string', strategies: [:partial_match, :leet])
+```
 ## Development
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
@@ -64,6 +108,3 @@ The gem is available as open source under the terms of the [MIT License](https:/
 ## Code of Conduct
 Everyone interacting in the ProfanityFilter project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/cardinalblue/profanity-filter/blob/master/CODE_OF_CONDUCT.md).
-## Todo
-pluggable logging and strategies
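
The README's description of the WebPurify ordering (local check first, WebPurify only when nothing was found) can be sketched without the gem. Everything here is a stand-in: `count_with_fallback` and the two lambdas are hypothetical names, and the sketch assumes the remote call may return `nil` when it fails.

```ruby
# Hypothetical sketch of "local first, then remote" counting.
# `local_count` and `remote_count` are callables standing in for the gem's
# own engine and for a WebPurify client; neither is the gem's real API.
def count_with_fallback(phrase, local_count:, remote_count: nil)
  count = local_count.call(phrase)
  # Only consult the remote service when it is configured and the local
  # check found nothing.
  return count unless remote_count && count.zero?

  # A failed remote call is assumed to return nil; nil.to_i is 0.
  remote_count.call(phrase).to_i
end

local  = ->(_phrase) { 0 }
remote = ->(_phrase) { nil }
count_with_fallback('some text', local_count: local, remote_count: remote) # => 0
```

A positive local result short-circuits the remote call, which avoids a network round trip for text the local dictionary already catches.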

data/lib/profanity-dictionaries/en.yaml CHANGED

@@ -17,7 +17,6 @@
 - bitching
 - blowjob
 - blowjobs
-- bullshit
 - clit
 - cocksuck
 - cocksucked

data/lib/profanity-filter.rb CHANGED

@@ -9,76 +9,76 @@ require 'profanity-filter/engines/leet_exact_match_strategy'
 require 'web_purify'
 class ProfanityFilter
-  WP_DEFAULT_LANGS = [:en].freeze
-  WP_AVAILABLE_LANGS = [
+  WP_DEFAULT_LANGS    = [:en].freeze
+  WP_LANG_CONVERSIONS = { es: :sp, ko: :kr, ja: :jp }.freeze
+  WP_AVAILABLE_LANGS  = [
     :en, :ar, :fr, :de, :hi, :jp, :it, :pt, :ru, :sp, :th, :tr, :zh, :kr, :pa
   ].freeze
-  WP_LANG_CONVERSIONS = { es: :sp, ko: :kr, ja: :jp }.freeze
-  attr_reader :strict_filter, :tolerant_filter
+  LEET_STRATEGY                 = :leet
+  ALLOW_SYMBOL_STRATEGY         = :allow_symbol
+  PARTIAL_MATCH_STRATEGY        = :partial_match
+  DUPLICATE_CHARACTERS_STRATEGY = :duplicate_characters
-  def initialize(web_purifier_api_key: nil)
+  attr_reader :available_strategies
+  def initialize(web_purifier_api_key: nil, whitelist: [])
     # If we are using Web Purifier
     @wp_client = web_purifier_api_key ? WebPurify::Client.new(web_purifier_api_key) : nil
+    @whitelist = whitelist
+    raise 'Whitelist should be an array' unless @whitelist.is_a?(Array)
     exact_match_dictionary = load_exact_match_dictionary
     partial_match_dictionary = load_partial_match_dictionary
-    allow_symbol_strategy = ::ProfanityFilterEngine::AllowSymbolsInWordsStrategy.new(
-      dictionary: exact_match_dictionary,
-      ignore_case: true
-    )
-    duplicate_characters_strategy = ::ProfanityFilterEngine::AllowDuplicateCharactersStrategy.new(
-      dictionary: exact_match_dictionary,
-      ignore_case: true
-    )
-    leet_strategy = ::ProfanityFilterEngine::LeetExactMatchStrategy.new(
-      dictionary: exact_match_dictionary,
-      ignore_case: true
-    )
-    partial_match_strategy = ::ProfanityFilterEngine::PartialMatchStrategy.new(
-      dictionary: partial_match_dictionary,
-      ignore_case: true
-    )
-    # Set up strict filter.
-    @strict_filter = ::ProfanityFilterEngine::Composite.new
-    @strict_filter.add_strategies(
-      leet_strategy,
-      allow_symbol_strategy,
-      partial_match_strategy,
-      duplicate_characters_strategy
-    )
-    # Set up tolerant filter.
-    @tolerant_filter = ::ProfanityFilterEngine::Composite.new
-    @tolerant_filter.add_strategies(
-      allow_symbol_strategy,
-      partial_match_strategy
-    )
-  end
-  def profane?(phrase, lang: nil, strictness: :tolerant)
-    return false if phrase == '' || phrase.nil?
-    is_profane = pf_profane?(phrase, strictness: strictness)
-    if !is_profane && use_webpurify?
-      wp_is_profane = wp_profane?(phrase, lang: lang)
-      is_profane = wp_is_profane unless wp_is_profane.nil?
-    end
+    @available_strategies = {
+      ALLOW_SYMBOL_STRATEGY => ::ProfanityFilterEngine::AllowSymbolsInWordsStrategy.new(
+        dictionary:  exact_match_dictionary,
+        ignore_case: true
+      ),
+      DUPLICATE_CHARACTERS_STRATEGY => ::ProfanityFilterEngine::AllowDuplicateCharactersStrategy.new(
+        dictionary:  exact_match_dictionary,
+        ignore_case: true
+      ),
+      LEET_STRATEGY => ::ProfanityFilterEngine::LeetExactMatchStrategy.new(
+        dictionary:  exact_match_dictionary,
+        ignore_case: true
+      ),
+      PARTIAL_MATCH_STRATEGY => ::ProfanityFilterEngine::PartialMatchStrategy.new(
+        dictionary:  partial_match_dictionary + exact_match_dictionary,
+        ignore_case: true
+      ),
+    }
+  end
-    !!is_profane
+  def all_strategy_names
+    available_strategies.keys
   end
-  def profanity_count(phrase, lang: nil, strictness: :tolerant)
-    return 0 if phrase == '' || phrase.nil?
+  def basic_strategy_names
+    [ALLOW_SYMBOL_STRATEGY, PARTIAL_MATCH_STRATEGY]
+  end
+  def profane?(phrase, lang: nil, strategies: :basic)
+    return false if phrase == ''
+    return false if @whitelist.include?(phrase)
-    banned_words_count = pf_profanity_count(phrase, strictness: strictness)
-    if banned_words_count == 0 && use_webpurify?
-      wp_banned_words_count = wp_profanity_count(phrase, lang: lang)
-      banned_words_count = wp_banned_words_count unless wp_banned_words_count.nil?
+    if use_webpurify?
+      !!(pf_profane?(phrase, strategies: strategies) || wp_profane?(phrase, lang: lang))
+    else
+      !!pf_profane?(phrase, strategies: strategies)
     end
+  end
+  def profanity_count(phrase, lang: nil, strategies: :basic)
+    return 0 if phrase == '' || phrase.nil?
-    banned_words_count
+    pf_count = pf_profanity_count(phrase, strategies: strategies)
+    if use_webpurify?
+      pf_count.zero? ? wp_profanity_count(phrase, lang: lang).to_i : pf_count
+    else
+      pf_count
+    end
   end
   private
@@ -87,23 +87,29 @@ class ProfanityFilter
     !!@wp_client
   end
-  def filter(strictness: :tolerant)
-    case strictness
-    when :strict
-      @strict_filter
-    when :tolerant
-      @tolerant_filter
-    else
-      @tolerant_filter
+  def filter(strategies:)
+    ::ProfanityFilterEngine::Composite.new.tap do |engine|
+      case strategies
+      when :all
+        all_strategy_names.each { |s| engine.add_strategy(available_strategies[s]) }
+      when :basic
+        basic_strategy_names.each { |s| engine.add_strategy(available_strategies[s]) }
+      else
+        strategies.each do |s|
+          raise "Strategy name \"#{s}\" not supported." unless all_strategy_names.include?(s)
+          engine.add_strategy(available_strategies[s])
+        end
+      end
     end
   end
-  def pf_profane?(phrase, strictness: :tolerant)
-    filter(strictness: strictness).profane?(phrase)
+  def pf_profane?(phrase, strategies:)
+    filter(strategies: strategies).profane?(phrase)
   end
-  def pf_profanity_count(phrase, strictness: :tolerant)
-    filter(strictness: strictness).profanity_count(phrase)
+  def pf_profanity_count(phrase, strategies:)
+    filter(strategies: strategies).profanity_count(phrase)
   end
   def wp_profane?(phrase, lang: nil, timeout_duration: 5)
@@ -120,7 +126,7 @@ class ProfanityFilter
     Timeout::timeout(timeout_duration) do
       @wp_client.check_count phrase, lang: wp_langs_list_with(lang)
     end
-  rescue StandardError => e
+  rescue StandardError
     nil
   end
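
The strategy selection in the new `filter` method can be sketched in isolation. This is a gem-independent illustration: `resolve_strategies` is a hypothetical name, and the strategy lists are copied from the constants in this diff. One deliberate difference is labeled in the code: the sketch wraps the argument in `Array()`, whereas the diff calls `strategies.each` directly and so appears to expect an array for custom selections.

```ruby
# Hypothetical sketch: turn a `strategies:` argument into a list of names.
def resolve_strategies(strategies)
  all   = %i[allow_symbol duplicate_characters leet partial_match]
  basic = %i[allow_symbol partial_match]

  case strategies
  when :all   then all
  when :basic then basic
  else
    # Assumption of this sketch, not the gem's behaviour: Array() lets a
    # single symbol such as :leet be passed without brackets.
    names = Array(strategies)
    unknown = names - all
    raise ArgumentError, "Strategy name \"#{unknown.first}\" not supported." unless unknown.empty?

    names
  end
end

resolve_strategies(:basic)                  # => [:allow_symbol, :partial_match]
resolve_strategies([:partial_match, :leet]) # => [:partial_match, :leet]
```

Validating all names before building anything means an unsupported name is reported up front rather than partway through assembling the engine.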

data/lib/profanity-filter/engines/composite.rb CHANGED

@@ -29,7 +29,7 @@ module ProfanityFilterEngine
     def profane_words(text)
       total_words = strategies.reduce([]) do |words, strategy|
-        words.concat(strategy.profane_words(text))
+        words.concat(strategy.profane_words(text).map { |w| w.gsub(/[ _\-\.]/, '') })
       end
       total_words.uniq
     end
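
The effect of the added `gsub` can be seen in a standalone sketch. `normalize_matches` is a hypothetical helper that mirrors the changed line; the input words are neutral placeholders rather than real output from the gem's strategies.

```ruby
# Hypothetical helper mirroring the changed line: strip spaces, underscores,
# hyphens and dots from each matched word, then de-duplicate.
def normalize_matches(words)
  words.map { |w| w.gsub(/[ _\-\.]/, '') }.uniq
end

normalize_matches(['b_ad', 'bad', 'b.a-d']) # => ["bad"]
```

Because `uniq` runs after the separators are stripped, variants of the same word found by different strategies collapse into one entry, which presumably keeps the same occurrence from being counted more than once.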

data/lib/profanity-filter/version.rb CHANGED

@@ -1,3 +1,3 @@
 class ProfanityFilter
-  VERSION = '0.1.5'
+  VERSION = '1.0'
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: profanity-filter
 version: !ruby/object:Gem::Version
-  version: 0.1.5
+  version: '1.0'
 platform: ruby
 authors:
 - Maso Lin
@@ -10,7 +10,7 @@ authors:
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2019-12-17 00:00:00.000000000 Z
+date: 2019-12-31 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: webpurify