RubyGems - words_counted - Versions diffs - 0.0.8 → 0.0.9 - Mend

words_counted 0.0.8 → 0.0.9

Files changed (6)

checksums.yaml +4 -4
data/README.md +134 -102
data/lib/words_counted/counter.rb +24 -10
data/lib/words_counted/version.rb +1 -1
data/spec/words_counted/counter_spec.rb +46 -8
metadata +3 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: f755855668270d89fc16194ce006feb3f534bec3
-  data.tar.gz: 22bccb437e3105c3ba5d4f4bf563b5fa8757ac49
+  metadata.gz: f3043d1270f35c595ab088f6278749274daf60d3
+  data.tar.gz: 46e566b6a0fc584393ed066d4827cf7b18a4dc0a
 SHA512:
-  metadata.gz: 8e9222364a6ea859ed17a553b07558ff1fd9e8fb84b6d7c4775b023daf64210de1fc0b93511a6a21fb5afaada0897f8951809137b41641b44e52e29f1c0adeaa
-  data.tar.gz: 057299e3bd09f97b2dd75f815dec2b4a542a74e5c4512798301ce4cad8f9ba330b0fc1db5636143241f1e19142fce0b20e39f6aecd82c46ebaa43d61894b3e7c
+  metadata.gz: 05b360658c8cf57b1c117afde1a36623b5172a0ef2c53bfb39a8a4d1a7a3cf634896b5176ebf8f68aa0fb1e4afb84350a27805d5906a7524c84404240c54d097
+  data.tar.gz: 6bd402a6df332407b2b333d4c859980986bf61df46a9a7f5dac484c65fcddc49a6d60f0f0e54c51d0f70b1ecfc6c1b58a61d095887a766ad0dcf97e9a9876dc3
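
The values in this hunk are hex digests of the `metadata.gz` and `data.tar.gz` members packed inside the `.gem` archive. A minimal sketch of checking one of them by hand follows; `checksum_matches?` is a hypothetical helper name, and the file path in the commented usage is an assumption about where the archive was unpacked.

```ruby
require "digest/sha1"
require "digest/sha2"

# Hypothetical helper (not part of RubyGems): returns true when the hex digest
# of `bytes` under `algorithm` equals `expected_hex`, ignoring case and
# surrounding whitespace in the expected value.
def checksum_matches?(bytes, expected_hex, algorithm: Digest::SHA512)
  algorithm.hexdigest(bytes) == expected_hex.strip.downcase
end

# Usage, assuming the 0.0.9 .gem has been untarred into the current directory:
#
#   expected = "6bd402a6df332407b2b333d4c859980986bf61df46a9a7f5dac484c65fcddc49" \
#              "a6d60f0f0e54c51d0f70b1ecfc6c1b58a61d095887a766ad0dcf97e9a9876dc3"
#   checksum_matches?(File.binread("data.tar.gz"), expected)
```

Both digests change in every release because the packaged files change, so this hunk carries no information beyond confirming that the archive contents differ.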

data/README.md CHANGED

@@ -1,22 +1,27 @@
 # Words Counted
-Words Counted is a Ruby word (or anything--see custom regexp) counter and string analyser. It includes some handy utility methods that go beyond word counting. You can use this gem to get word desnity, words and their number of occurrences, the highest occurring words, and few more things.
+Words Counted is a highly customisable Ruby string analyser. It includes some handy utility methods that go beyond word counting. You can use this gem to get word density, words and their number of occurrences, the highest occurring words, and few more things.
-You can also pass in your custom criteria for splitting strings in the form of a custom regexp, which affords you a great deal of flexibility, whether you want to count words, numbers, or special characters.
+I use the word *word* loosely here, since you can pass the program any string you want: words, numbers, characters, etc...
+You can pass in your custom criteria for splitting strings in the form of a custom regular expression. This affords you a great deal of flexibility, whether you want to count words, numbers, or special characters.
 ### Features
-1. Count the number of words in a string.
-2. Get a hash map of words and the number of times they occur.
-3. Get a hash map of words and their lengthes.
-4. Get the most occurring word(s) and its number of occurrences.
-5. Get the longest word(s) and its length.
-6. Ability to filter out words from the count. Useful if you don't want to count `a`, `the`, etc...
-7. Filters special characters but respects hyphens and apostrophes.
-8. Plays nicely with diacritics (utf and unicode characters): "São Paulo" is treated as `["São", "Paulo"]` and not `["S", "", "o", "Paulo"]`
-9. Customisable criteria. Pass in your own regexp rules to split strings if you prefer.
-10. Get `char_count` and `average_chars_per_word`.
-11. Get unique word count.
+* Get the following data from any string:
+    * Word count
+    * Unique word count
+    * Word density
+    * Character count
+    * Average characters per word
+    * A hash map of words and the number of times they occur
+    * A hash map of words and their lengths
+    * The longest word(s) and its length
+    * The most occurring word(s) and its number of occurrences.
+* A flexible way to exclude words (or anything) from the count. You can pass in a **string**, a **regexp**, an **array**, or a **lambda**.
+* Filters special characters but respects hyphens and apostrophes.
+* Plays nicely with diacritics (UTF and unicode characters): "São Paulo" is treated as `["São", "Paulo"]` and not `["S", "", "o", "Paulo"]`.
+* Customisable criteria. Pass in your own regexp rules to split strings if you prefer.
 See usage instructions for details on each feature.
@@ -48,7 +53,7 @@ counter = WordsCounted::Counter.new(
 #### `.word_count`
-Returns the word count of a given string. The word count includes only alpha characters. Hyphenated and words with apostrophes are considered a single word. You can pass in your own regexp if this is not desired behaviour.
+Returns the word count of a given string. The word count includes only alpha characters. Hyphenated and words with apostrophes are considered a single word. You can pass in your own regular expression if this is not desired behaviour.
 ```ruby
 counter.word_count #=> 15
@@ -60,23 +65,22 @@ Returns a hash map of words and their number of occurrences. Uppercase and lower
 ```ruby
 counter.word_occurrences
-#
-#  {
-#    "we" => 1,
-#    "are" => 2,
-#    "all" => 1,
-#    "in" => 1,
-#    "the" => 2,
-#    "gutter" => 1,
-#    "but" => 1,
-#    "some" => 1,
-#    "of" => 1,
-#    "us" => 1,
-#    "looking" => 1,
-#    "at" => 1,
-#    "stars" => 1
-#  }
-#
+{
+   "we" => 1,
+   "are" => 2,
+   "all" => 1,
+   "in" => 1,
+   "the" => 2,
+   "gutter" => 1,
+   "but" => 1,
+   "some" => 1,
+   "of" => 1,
+   "us" => 1,
+   "looking" => 1,
+   "at" => 1,
+   "stars" => 1
+ }
 ```
 #### `.most_occurring_words`
@@ -85,12 +89,8 @@ Returns a two dimensional array of the most occurring word and its number of occ
 ```ruby
 counter.most_occurring_words
-#
-#  [
-#    ["are", 2],
-#    ["the", 2]
-#  ]
-#
+[ ["are", 2], ["the", 2] ]
 ```
 #### `.word_lengths`
@@ -99,23 +99,22 @@ Returns a hash of words and their lengths.
 ```ruby
 counter.word_lengths
-#
-#  {
-#    "We" => 2,
-#    "are" => 3,
-#    "all" => 3,
-#    "in" => 2,
-#    "the" => 3,
-#    "gutter" => 6,
-#    "but" => 3,
-#    "some" => 4,
-#    "of" => 2,
-#    "us" => 2,
-#    "looking" => 7,
-#    "at" => 2,
-#    "stars" => 5
-#  }
-#
+{
+  "We" => 2,
+  "are" => 3,
+  "all" => 3,
+  "in" => 2,
+  "the" => 3,
+  "gutter" => 6,
+  "but" => 3,
+  "some" => 4,
+  "of" => 2,
+  "us" => 2,
+  "looking" => 7,
+  "at" => 2,
+  "stars" => 5
+}
 ```
 #### `.longest_word`
@@ -124,11 +123,8 @@ Returns a two dimensional array of the longest word and its length. In case ther
 ```ruby
 counter.longest_words
-#
-#  [
-#    ["looking", 7]
-#  ]
-#
+[ ["looking", 7] ]
 ```
 #### `.words`
@@ -146,23 +142,22 @@ Returns a two-dimentional array of words and their density.
 ```ruby
 counter.word_density
-#
-#  [
-#    ["are", 13.33],
-#    ["the", 13.33],
-#    ["but", 6.67],
-#    ["us", 6.67],
-#    ["of", 6.67],
-#    ["some", 6.67],
-#    ["looking", 6.67],
-#    ["gutter", 6.67],
-#    ["at", 6.67],
-#    ["in", 6.67],
-#    ["all", 6.67],
-#    ["stars", 6.67],
-#    ["we", 6.67]
-#  ]
-#
+[
+  ["are", 13.33],
+  ["the", 13.33],
+  ["but", 6.67],
+  ["us", 6.67],
+  ["of", 6.67],
+  ["some", 6.67],
+  ["looking", 6.67],
+  ["gutter", 6.67],
+  ["at", 6.67],
+  ["in", 6.67],
+  ["all", 6.67],
+  ["stars", 6.67],
+  ["we", 6.67]
+]
 ```
 #### `.char_count`
@@ -192,34 +187,63 @@ counter.unique_word_count
 #=> 13
 ```
-## Filtering
+## Excluding words from the analyser
+You can exclude anything you want from the string you want to analyse by passing in an `exclude` option. The exclude option accepts a variety of filters.
-You can pass in a *space-delimited* word list to filter words that you don't want to count. The filter will remove both uppercase and lowercase variants of the word.
+1. A *space-delimited* list of candidates. The filter will remove both uppercase and lowercase variants of the candidate, when applicable. Useful for excluding *the*, *a*, and so on.
+2. An array of string candidates. For example: `['a', 'the']`.
+3. A regular expression.
+4. A lambda.
+#### Using a string
 ```ruby
 WordsCounted::Counter.new(
-  "Magnificent! That was magnificent, Trevor.", filter: "was magnificent"
+  "Magnificent! That was magnificent, Trevor.", exclude: "was magnificent"
 )
 counter.words
 #=> ["That", "Trevor"]
 ```
+#### Using an array
+```ruby
+WordsCounted::Counter.new("1 2 3 4 5 6", regexp: /[0-9]/, exclude: ['1', '2', '3'])
+counter.words
+#=> ["4", "5", "6"]
+```
+#### Using a regular expression
+```ruby
+WordsCounted::Counter.new("Hello Beirut", exclude: /Beirut/)
+counter.words
+#=> ["Hello"]
+```
+#### Using a lambda
+```ruby
+WordsCounted::Counter.new(
+  "1 2 3 4 5 6", regexp: /[0-9]/, exclude: ->(w) { w.to_i.even? }
+)
+counter.words
+#=> ["1", "3", "5"]
+```
 ## Passing in a Custom Regexp
-Defining words is tricky business. Out of the box, the default regexp accounts for letters, hyphenated words, and apostrophes. This means `twenty-one` is treated as one word. So is `Mohamad's`.
+Defining words is tricky business. Out of the box, the default regexp accounts for letters, hyphenated words, and apostrophes. This means *twenty-one* is treated as one word. So is *Mohamad's*.
 ```ruby
 /[\p{Alpha}\-']+/
 ```
-But maybe you don't want to count words? Well, count anything you want. What you count is only limited by your knowledge of regular expressions. Pass in your own criteria in the form of a Ruby regexp to split your string as desired.
+But maybe you don't want to count words? Well, count anything you want. What you count is only limited by your knowledge of regular expressions. Pass in your own criteria in the form of a Ruby regular expression to split your string as desired.
-For example, if you wanted to count numbers as words, you could pass the following regex instead of the default one.
+For example, if you wanted to include numbers in your analysis, you can override the regular expression:
 ```ruby
-counter = WordsCounted::Counter.new("I am 007.", regex: /[\p{Alnum}\-']+/)
+counter = WordsCounted::Counter.new("Numbers 1, 2, and 3", regexp: /[\p{Alnum}\-']+/)
 counter.words
-#=> ["I", "am", "007"]
+#=> ["Numbers", "1", "2", "and", "3"]
 ```
 ## Gotchas
@@ -228,34 +252,31 @@ A hyphen used in leu of an *em* or *en* dash will form part of the word and thro
 ```ruby
 counter = WordsCounted::Counter.new("How do you do?-you are well, I see.")
-#<WordsCounted::Counter:0x007fd494252518 @words=["How", "do", "you", "do", "-you", "are", "well", "I", "see"]>
 counter.word_occurrences
-#
-#  {
-#    "how" => 1,
-#    "do" => 2,
-#    "you" => 1,
-#    "-you" => 1, # WTF, mate!
-#    "are" => 1,
-#    "very" => 1,
-#    "well" => 1,
-#    "i" => 1,
-#    "see" => 1
-#  }
-#
+{
+  "how" => 1,
+  "do" => 2,
+  "you" => 1,
+  "-you" => 1, # WTF, mate!
+  "are" => 1,
+  "very" => 1,
+  "well" => 1,
+  "i" => 1,
+  "see" => 1
+}
 ```
 In this example, `-you` and `you` are counted as separate words. Writers should use the correct dash element, but this is not always the case.
-Another gotcha is that the default criteria does not count numbers as words. Remember that you can pass in your own regexp if the default solution does not fit your needs.
+Another gotcha is that the default criteria does not include numbers in its analysis. Remember that you can pass in your own regular expression if the default behaviour does not fit your needs.
 ## Road Map
 1. Add ability to open files or URLs.
 2. Add paragraph, sentence, average words per sentence, and average sentence chars counters.
-#### Ability to open files or urls
+#### Ability to open files and URLs
 Maybe I can some class methods to open the file and init the counter class.
@@ -277,7 +298,13 @@ end
 ## About
-Originally I wrote this program for a code challenge. My initial implementation was decent, but it could have been better. Thanks to [Dave Yarwood](http://codereview.stackexchange.com/a/47515/1563) for helping me improve my code. Some of this code is based on his recommendations. You can find the original implementation as well as the code review on [Code Review](http://codereview.stackexchange.com/questions/46105/a-ruby-string-analyser).
+Originally I wrote this program for a code challenge on Treehouse. You can find the original implementation on [Code Review][1].
+## Contributers
+Thanks to Dave Yarwood for helping me improve my code. Some of my code is based on his recommendations. You can find the original program implementation, as well as Dave's code review, on [Code Review][1].
+Thanks to [Wayne Conrad][2] for providing [an excellent code review][3], and improving the filter feature well beyond what I can come up with.
 ## Contributing
@@ -286,3 +313,8 @@ Originally I wrote this program for a code challenge. My initial implementation
 3. Commit your changes (`git commit -am 'Add some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
 5. Create new Pull Request
+  [1]: http://codereview.stackexchange.com/questions/46105/a-ruby-string-analyser
+  [2]: https://github.com/wconrad
+  [3]: http://codereview.stackexchange.com/a/47515/1563
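
The README hunks above rename two constructor options: `filter:` becomes `exclude:` and `regex:` becomes `regexp:`. Judging from the `counter.rb` diff, 0.0.9 reads only `options[:exclude]` and `options[:regexp]`, so a caller still passing the old keys would apparently have them ignored without an error. One way to bridge that during an upgrade is a small shim such as the sketch below; `upgrade_counter_options` and `LEGACY_OPTION_KEYS` are hypothetical names, not part of the gem.

```ruby
# Hypothetical shim, not part of words_counted: maps 0.0.8 option keys to
# their 0.0.9 names and leaves every other key untouched.
LEGACY_OPTION_KEYS = { filter: :exclude, regex: :regexp }.freeze

def upgrade_counter_options(options)
  options.each_with_object({}) do |(key, value), upgraded|
    new_key = LEGACY_OPTION_KEYS.fetch(key, key)
    # Emit a warning on stderr so the call site can be updated later.
    warn "words_counted: option :#{key} is now :#{new_key}" unless new_key == key
    upgraded[new_key] = value
  end
end

p upgrade_counter_options(filter: "a the", regex: /\d+/)
```

If a hash contains both the old and the new spelling of an option, whichever appears later in the hash wins in this sketch.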

data/lib/words_counted/counter.rb CHANGED

@@ -2,12 +2,13 @@ module WordsCounted
   class Counter
     attr_reader :words, :word_occurrences, :word_lengths, :char_count
-    WORD_REGEX = /[\p{Alpha}\-']+/
+    WORD_REGEXP = /[\p{Alpha}\-']+/
     def initialize(string, options = {})
       @options = options
-      @char_count = string.length
-      @words = string.scan(regex).reject { |word| filter.include? word.downcase }
+      exclude = filter_proc(options[:exclude])
+      @words = string.scan(regexp).reject { |word| exclude.call(word) }
+      @char_count = @words.join.size
       @word_occurrences = words.each_with_object(Hash.new(0)) do |word, hash|
         hash[word.downcase] += 1
       end
@@ -40,7 +41,7 @@ module WordsCounted
       end.sort_by { |_, value| value }.reverse
     end
-    private
+  private
     def highest_ranking(entries)
       entries.group_by { |word, value| value }.sort.last.last
@@ -50,15 +51,28 @@ module WordsCounted
       (n.to_f / word_count.to_f * 100.0).round(2)
     end
-    def regex
-      @options[:regex] || WORD_REGEX
+    def regexp
+      @options[:regexp] || WORD_REGEXP
     end
-    def filter
-      if filters = @options[:filter]
-        filters.split.collect { |word| word.downcase }
+    def filter_proc(filter)
+      if filter.respond_to?(:to_a)
+        filter_procs = Array(filter).map(&method(:filter_proc))
+        ->(word) {
+          filter_procs.any? { |p| p.call(word) }
+        }
+      elsif filter.respond_to?(:to_str)
+        exclusion_list = filter.split.collect(&:downcase)
+        ->(w) {
+          exclusion_list.include?(w.downcase)
+        }
+      elsif Regexp.try_convert(filter)
+        filter = Regexp.try_convert(filter)
+        Proc.new { |w| w =~ filter }
+      elsif filter.respond_to?(:to_proc)
+        filter.to_proc
       else
-        []
+        raise ArgumentError, "Incorrect filter type"
       end
     end
   end
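
The new `filter_proc` turns whatever is passed as `exclude:` into a single callable. The standalone sketch below mirrors that branching so the behaviour can be tried without installing the gem; `build_exclusion` and `sketch_words` are hypothetical names, and the regexp branch returns a strict boolean here where the diff returns the match index.

```ruby
# Standalone sketch of the dispatch in the hunk above; not the gem's own code.
def build_exclusion(filter)
  if filter.respond_to?(:to_a)
    # Arrays are expanded element by element. nil also lands here because
    # nil.to_a is [], which yields a filter that excludes nothing.
    procs = Array(filter).map { |item| build_exclusion(item) }
    ->(word) { procs.any? { |candidate| candidate.call(word) } }
  elsif filter.respond_to?(:to_str)
    # Space-delimited string: compared case-insensitively against whole words.
    exclusion_list = filter.split.map(&:downcase)
    ->(word) { exclusion_list.include?(word.downcase) }
  elsif (pattern = Regexp.try_convert(filter))
    # Regexp: unanchored and case-sensitive unless the pattern says otherwise.
    ->(word) { !!(word =~ pattern) }
  elsif filter.respond_to?(:to_proc)
    filter.to_proc
  else
    raise ArgumentError, "Incorrect filter type"
  end
end

# Scans with the default word pattern from the diff, then applies the exclusion.
def sketch_words(string, exclude: nil, regexp: /[\p{Alpha}\-']+/)
  excluded = build_exclusion(exclude)
  string.scan(regexp).reject { |word| excluded.call(word) }
end

p sketch_words("Magnificent! That was magnificent, Trevor.", exclude: "was magnificent")
p sketch_words("1 2 3 4 5 6", regexp: /[0-9]/, exclude: ->(w) { w.to_i.even? })
```

Two consequences of the branch order are worth keeping in mind: a string filter ignores case while a regexp filter does not unless it carries the `i` flag, and a regexp filter matches anywhere inside a word, so `/Beirut/` would also exclude a longer word that contains it.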

data/lib/words_counted/version.rb CHANGED

@@ -1,3 +1,3 @@
 module WordsCounted
-  VERSION = "0.0.8"
+  VERSION = "0.0.9"
 end

data/spec/words_counted/counter_spec.rb CHANGED

@@ -1,4 +1,5 @@
-require "spec_helper"
+# -*- coding: utf-8 -*-
+require_relative "../spec_helper"
 module WordsCounted
   describe Counter do
@@ -55,20 +56,45 @@ module WordsCounted
         expect(counter.words).to eq(%w[São Paulo])
       end
-      it "filters words" do
-        counter = Counter.new("That was magnificent, Trevor.", filter: "magnificent")
+      it "it accepts a string filter" do
+        counter = Counter.new("That was magnificent, Trevor.", exclude: "magnificent")
         expect(counter.words).to eq(%w[That was Trevor])
       end
-      it "filters words when passed in in uppercase" do
-        counter = Counter.new("That was magnificent, Trevor.", filter: "Magnificent")
+      it "it accepts a string filter with multiple words" do
+        counter = Counter.new("That was magnificent, Trevor.", exclude: "was magnificent")
+        expect(counter.words).to eq(%w[That Trevor])
+      end
+      it "filters words in uppercase when using a string filter" do
+        counter = Counter.new("That was magnificent, Trevor.", exclude: "Magnificent")
+        expect(counter.words).to eq(%w[That was Trevor])
+      end
+      it "accepts a regexp filter" do
+        counter = Counter.new("That was magnificent, Trevor.", exclude: /magnificent/i)
         expect(counter.words).to eq(%w[That was Trevor])
       end
-      it "splits words based on regex" do
-        counter = Counter.new("I am 007.", regex: /[\p{Alnum}\-']+/)
+      it "accepts an array filter" do
+        counter = Counter.new("That was magnificent, Trevor.", exclude: ['That', 'was'])
+        expect(counter.words).to eq(%w[magnificent Trevor])
+      end
+      it "accepts a lambda filter" do
+        counter = Counter.new("That was magnificent, Trevor.", exclude: ->(w) {w == 'That'})
+        expect(counter.words).to eq(%w[was magnificent Trevor])
+      end
+      it "accepts a custom regexp" do
+        counter = Counter.new("I am 007.", regexp: /[\p{Alnum}\-']+/)
         expect(counter.words).to eq(["I", "am", "007"])
       end
+      it "char_count should be calculated after the filter is applied" do
+        counter = Counter.new("I am Legend.", exclude: "I am")
+        expect(counter.char_count).to eq(6)
+      end
     end
     describe ".word_count" do
@@ -134,14 +160,26 @@ module WordsCounted
     describe ".char_count" do
       it "returns the number of chars in the passed in string" do
-        expect(counter.char_count).to eq(66)
+        counter = Counter.new("His name was major, Major Major Major Major.")
+        expect(counter.char_count).to eq(35)
+      end
+      it "returns the number of chars in the passed in string after the filter is applied" do
+        counter = Counter.new("His name was major, Major Major Major Major.", exclude: "Major")
+        expect(counter.char_count).to eq(10)
       end
     end
     describe ".average_chars_per_word" do
       it "returns the average number of chars per word" do
+        counter = Counter.new("His name was major, Major Major Major Major.")
         expect(counter.average_chars_per_word).to eq(4)
       end
+      it "returns the average number of chars per word after the filter is applied" do
+        counter = Counter.new("His name was major, Major Major Major Major.", exclude: "Major")
+        expect(counter.average_chars_per_word).to eq(3)
+      end
     end
     describe ".unique_word_count" do
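
The new `char_count` expectations follow from the `counter.rb` change: 0.0.8 stored `string.length`, while 0.0.9 stores the combined length of the words that survive scanning and exclusion. A small sketch of the difference, assuming the default word pattern and modelling only the string form of `exclude`; both method names are illustrative and do not exist in the gem.

```ruby
# Illustrative stand-ins for the two behaviours; neither method is in the gem.
SKETCH_WORD_PATTERN = /[\p{Alpha}\-']+/

# 0.0.8 behaviour: length of the raw input, spaces and punctuation included.
def char_count_before(string)
  string.length
end

# 0.0.9 behaviour: total length of the words left after excluding.
# `exclude` is a space-delimited list matched case-insensitively.
def char_count_after(string, exclude = "")
  excluded = exclude.split.map(&:downcase)
  words = string.scan(SKETCH_WORD_PATTERN).reject { |word| excluded.include?(word.downcase) }
  words.join.size
end

sentence = "His name was major, Major Major Major Major."
puts char_count_before(sentence)          # 44: raw length
puts char_count_after(sentence)           # 35: letters only, as in the spec
puts char_count_after(sentence, "Major")  # 10: "His", "name", "was"
```

Callers that relied on `char_count` to report the length of the original string would see smaller numbers after upgrading, since whitespace and punctuation no longer count.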

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: words_counted
 version: !ruby/object:Gem::Version
-  version: 0.0.8
+  version: 0.0.9
 platform: ruby
 authors:
 - Mohamad El-Husseini
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-05-03 00:00:00.000000000 Z
+date: 2014-10-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -106,7 +106,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.0.6
+rubygems_version: 2.2.2
 signing_key:
 specification_version: 4
 summary: See README.