
words_counted 0.0.4 → 0.0.5

Files changed (7)

checksums.yaml +4 -4
data/.yardopts +2 -1
data/README.md +78 -3
data/lib/words_counted/counter.rb +86 -26
data/lib/words_counted/version.rb +1 -1
data/spec/words_counted/counter_spec.rb +38 -15
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: a74d7a0ee0210b034e5babbe9f90f11cd0c5db8b
-  data.tar.gz: f1852e2f1f728b7a22c519d43e670078edc61509
+  metadata.gz: 9847725d713bc20dd1b66d86321c8089e33276bf
+  data.tar.gz: f847a70bf6e008527606f944833979bf952a330e
 SHA512:
-  metadata.gz: b8edcbac512ac3edaf9dfc1a28a66e55e34e7ca33c9db57688c8c5658c0f5b514c1585316d7657389ce3ed40c1b91c0dd58aefca24c2c0b1d7ab91736cb30f80
-  data.tar.gz: 5019a1db7d42ef06068e24e666efe90f54bc7ac3821bbb9b2232506644712fa985592dfcdcf6a3310945ec940002a6b037665685aebcb645052262accb2ccd86
+  metadata.gz: d5922f2b471ea4bc60650a77972c289e516dae19b5ba4e59ceacf669df85442b775361a43ea9c8765270b005e2a8dfd1db39dbd4b85e29db142a3c064cabaff7
+  data.tar.gz: 2ab1369a5dc2b063c749242339e93d7a5f5424a9bae489fd6b649406834e7e05f18af7a598c797928f256d73eba532db3e1ebde95cbf5798b58b9e32599a6f8f

data/.yardopts CHANGED

@@ -1,2 +1,3 @@
 --title 'Word Counter for Ruby'
---private
+--private
+--markup markdown

data/README.md CHANGED

@@ -1,6 +1,6 @@
 # Words Counted
-This Ruby gem is a word counter that includes some handy utility methods. It lets you send in a string of text and count the number of words, get the words sorted by number occurrences, get the highest occurring words, and few more things.
+Words Counted is a Ruby word counter and string analyser. It includes some handy utility methods that go beyond word counting. You can use this gem to get word density, words and their number of occurrences, the highest occurring words, and a few more things. You can also pass in your own criteria for splitting strings in the form of a custom regexp.
 ### Features
@@ -11,6 +11,8 @@ This Ruby gem is a word counter that includes some handy utility methods. It let
 5. Get the longest word(s) and its length.
 6. Ability to filter out words from the count. Useful if you don't want to count `a`, `the`, etc...
 7. Filters special characters but respects hyphens and apostrophes.
+8. Plays nicely with diacritics (UTF and Unicode characters): "São Paulo" is treated as `["São", "Paulo"]` and not `["S", "", "o", "Paulo"]`.
+9. Customisable criteria. Pass in your own regexp rules to split strings if you prefer.
 See usage instructions for details on each feature.
@@ -132,13 +134,55 @@ counter.words
 #=> ["We", "are", "all", "in", "the", "gutter", "but", "some", "of", "us", "are", "looking", "at", "the", "stars"]
 ```
+#### `.word_density`
+Returns a two-dimensional array of words and their density.
+```ruby
+counter.word_density
+#
+#  [
+#    ["are", 13.33],
+#    ["the", 13.33],
+#    ["but", 6.67],
+#    ["us", 6.67],
+#    ["of", 6.67],
+#    ["some", 6.67],
+#    ["looking", 6.67],
+#    ["gutter", 6.67],
+#    ["at", 6.67],
+#    ["in", 6.67],
+#    ["all", 6.67],
+#    ["stars", 6.67],
+#    ["we", 6.67]
+#  ]
+#
+```
 ## Filtering
 You can pass in a space-delimited word list to filter words that you don't want to count. Filter words should be *lowercase*. The filter will remove both uppercase and lowercase variants of the word.
 ```ruby
-WordsCounted::Counter.new("Magnificent! That was magnificent, Trevor.", "was magnificent")
-#<WordsCounted::Counter:0x007fd4949f99d8 @words=["That", "Trevor"]>
+counter = WordsCounted::Counter.new("Magnificent! That was magnificent, Trevor.", filter: "was magnificent")
+counter.words
+#=> ["That", "Trevor"]
+```
+## Passing in a Custom Regexp
+Defining words is tricky business. Out of the box, the default regexp accounts for letters, hyphenated words, and apostrophes. This means `twenty-one` is treated as one word. So is `Mohamad's`.
+```ruby
+/[^\p{Alpha}\-']+/
+```
+If you prefer, you can pass in your own criteria in the form of a Ruby regexp to split your string as desired. For example, if you wanted to count numbers as words, you could pass the following regex instead of the default one.
+```ruby
+counter = WordsCounted::Counter.new("I am 007.", regex: /[^\p{Alnum}\-']+/)
+counter.words
+#=> ["I", "am", "007"]
 ```
 ## Gotchas
@@ -167,6 +211,37 @@ counter.word_occurrences
 In this example, `-you` and `you` are counted as separate words. Writers should use the correct dash element, but this is not always the case.
+The default criteria does not count numbers as words.
+## To do
+1. Add paragraph counter.
+2. Add ability to open files or URLs.
+3. A character counter, with spaces, and without spaces.
+4. A sentence counter.
+5. Average words in a sentence.
+6. Average sentence chars.
+#### Ability to open files or urls
+Maybe I can add some class methods to open the file and init the counter class.
+```ruby
+def self.count_from_url
+  new # open url and send string here after removing html
+end
+def self.from_file
+  new # open file and send string here.
+end
+```
+## But wait... wait a minute...
+#### Isn't it better to write this in JavaScript?
+![Picard face palm](http://stream1.gifsoup.com/view3/1290449/picard-facepalm-o.gif)
 ## About
 Originally I wrote this program for a code challenge. My initial implementation was decent, but it could have been better. Thanks to [Dave Yarwood](http://codereview.stackexchange.com/a/47515/1563) for helping me improve my code. Some of this code is based on his recommendations. You can find the original implementation as well as the code review on [Code Review](http://codereview.stackexchange.com/questions/46105/a-ruby-string-analyser).
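
Reviewer note: the densities in the new `.word_density` README example can be checked by hand. The sketch below is standalone and does not load the gem. It assumes the default splitting regexp quoted in the README and reimplements the tally with local variables (`text`, `words`, `occurrences`, `density`) that are illustrative names, not part of the gem's API.

```ruby
# Standalone sketch: does not require the words_counted gem.
text = "We are all in the gutter, but some of us are looking at the stars."

# Split on anything that is not a letter, hyphen, or apostrophe
# (the default regexp quoted in the README).
words = text.split(/[^\p{Alpha}\-']+/)

# Case-insensitive tally of each word.
occurrences = words.each_with_object(Hash.new(0)) do |word, tally|
  tally[word.downcase] += 1
end

# Density = occurrences / total words * 100, rounded to two decimal places.
density = occurrences.map do |word, count|
  [word, (count.to_f / words.size * 100).round(2)]
end

puts words.size              # 15
puts density.to_h["are"]     # 13.33
puts density.to_h["stars"]   # 6.67
```

With 15 words in the sentence, a word that appears twice has a density of 2 / 15 × 100 ≈ 13.33 and a word that appears once has 6.67, which agrees with the README output.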

data/lib/words_counted/counter.rb CHANGED

@@ -1,46 +1,78 @@
 module WordsCounted
+  # Represents a Counter object.
+  #
   class Counter
     # @!words [Array] an array of words resulting from the string passed to the initializer.
-    attr_reader :words
+    # @!word_occurrences [Hash] a hash of words as keys and their occurrences as values.
+    # @!word_lengths [Hash] a hash of words as keys and their lengths as values.
+    attr_reader :words, :word_occurrences, :word_lengths
     # This is the criteria for defining words.
     #
     # Words are alpha characters and can include hyphens and apostrophes.
+    #
     WORD_REGEX = /[^\p{Alpha}\-']+/
     # Initializes an instance of Counter and splits a given string into an array of words.
     #
-    #   Counter.new("Bad, bad, piggy!")
-    #   => #<WordsCounted::Counter:0x007fd49429bfb0 @words=["Bad", "bad", "piggy"]>
+    # ## @words
+    # This is the array of words that results from the string passed in. For example:
+    #
+    #    Counter.new("Bad, bad, piggy!")
+    #    => #<WordsCounted::Counter:0x007fd49429bfb0 @words=["Bad", "bad", "piggy"]>
     #
     # @param string [String] the string to act on.
-    # @param filter [String] a string of words to filter from the string to act on.
+    # @param options [Hash] a hash of options that includes `filter` and `regex`
     #
-    def initialize(string, filter = "")
-      @words = string.split(WORD_REGEX).reject { |word| filter.split.include? word.downcase }
-    end
-    # Returns the total word count.
+    #   ## `filter`
+    #   This is a list of words to filter from the string. Useful if you want to remove *a*, **you**, and other common words.
+    #   Any words included in the filter must be **lowercase**.
+    #   Defaults to an empty string.
+    #
+    #   ## `regex`
+    #   The criteria used to split a string. It defaults to `/[^\p{Alpha}\-']+/`.
     #
-    def word_count
-      words.size
-    end
-    # Returns a hash of words and their occurrences.
-    # Occurrences count is not case sensitive:
     #
-    # `"Hello hello" #=> { "hello" => 2 }`
+    # ## @word_occurrences
+    # This is a hash of words and their occurrences. Occurrences count is not case sensitive.
     #
-    # @return [Hash] the resulting hash of words (keys) and their occurrences (values).
+    # ## Example
     #
-    def word_occurrences
-      @occurrences ||= words.each_with_object(Hash.new(0)) { |word, result| result[word.downcase] += 1 }
+    #    "Hello hello" #=> { "hello" => 2 }
+    #
+    # @return [Hash] a hash map of words as keys and their occurrences as values.
+    #
+    #
+    # ## @word_lengths
+    # This is a hash of words and their lengths.
+    #
+    # ## Example
+    #
+    #    "Hello sir" #=> { "hello" => 5, "sir" => 3 }
+    #
+    # @return [Hash] a hash map of words as keys and their lengths as values.
+    #
+    def initialize(string, options = {})
+      @options = options
+      @words = string.split(regex).reject { |word| filter.split.include? word.downcase }
+      @word_occurrences = words.each_with_object(Hash.new(0)) do |word, result|
+        result[word.downcase] += 1
+      end
+      @word_lengths = words.each_with_object({}) do |word, result|
+        result[word] ||= word.length
+      end
     end
-    # Returns a hash of words and their lengths.
+    # Returns the total word count.
     #
-    def word_lengths
-      @lengths ||= words.each_with_object({}) { |word, result| result[word] ||= word.length }
+    # @return [Integer] total word count from `words` array size.
+    #
+    def word_count
+      words.size
     end
     # Returns a two-dimensional array of the most occurring word(s)
@@ -48,30 +80,58 @@ module WordsCounted
     #
     # In the event of a tie, all tied words are returned.
     #
+    # @return [Array] see {#highest_ranking}
+    #
     def most_occurring_words
       highest_ranking word_occurrences
     end
     # Returns a two-dimensional array of the longest word(s) and
-    # its length.
+    # its length. In the event of a tie, all tied words are returned.
     #
-    # In the event of a tie, all tied words are returned.
+    # @return [Array] see {#highest_ranking}
     #
     def longest_words
       highest_ranking word_lengths
     end
+    # Returns a two-dimensional array of words and their density in percent, sorted from highest to lowest density.
+    #
+    # @return [Array] an array of [word, density] pairs, with density in percent.
+    #
+    def word_density
+      word_occurrences.each_with_object({}) { |(word, occ), hash| hash[word] = percent_of_n(occ) }.sort_by { |_, v| v }.reverse
+    end
     private
     # Takes a hashmap of the form {"foo" => 1, "bar" => 2} and returns an array
     # containing the entries (as an array) with the highest number as a value.
     #
-    # {http://codereview.stackexchange.com/a/47515/1563 See here}.
-    #
     # @param entries [Hash] a hash of entries to analyse
+    # @return [Array] a two-dimensional array where each entry consists of a word and its rank
+    #
+    # {http://codereview.stackexchange.com/a/47515/1563 See here}.
     #
     def highest_ranking(entries)
       entries.group_by { |word, occurrence| occurrence }.sort.last.last
     end
+    # Calculates a word's share of the total word count as a percentage.
+    #
+    # @param n [Integer] the dividend (the number of occurrences of a word).
+    # @return [Float] the percentage of {#word_count} that n represents, rounded to two decimal places.
+    #
+    def percent_of_n(n)
+      ((n.to_f / word_count.to_f) * 100.0).round(2)
+    end
+    def regex
+      @options[:regex] || WORD_REGEX
+    end
+    def filter
+      @options[:filter] || String.new
+    end
   end
 end
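
Reviewer note: the new `initialize` reads `filter` and `regex` from an options hash and falls back to defaults. A minimal sketch of that split-then-reject step, outside the class, might look like the following. `split_words` is a hypothetical helper written for illustration only; it is not a method of the gem.

```ruby
# Hypothetical standalone helper mirroring the split-then-reject step
# in Counter#initialize. Not part of the words_counted API.
def split_words(string, options = {})
  # Fall back to the gem's default word regexp when no :regex is given.
  regex  = options[:regex]  || /[^\p{Alpha}\-']+/
  # Fall back to an empty filter when no :filter is given.
  filter = options[:filter] || ""

  excluded = filter.split
  string.split(regex).reject { |word| excluded.include?(word.downcase) }
end

p split_words("Magnificent! That was magnificent, Trevor.", filter: "was magnificent")
# ["That", "Trevor"]

p split_words("I am 007.", regex: /[^\p{Alnum}\-']+/)
# ["I", "am", "007"]

p split_words("I am 007.")
# ["I", "am"]
```

Because the filter is compared against `word.downcase`, filter words only match when they are given in lowercase, which is why the documentation asks for lowercase filter words.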

data/lib/words_counted/version.rb CHANGED

@@ -1,3 +1,3 @@
 module WordsCounted
-  VERSION = "0.0.4"
+  VERSION = "0.0.5"
 end

data/spec/words_counted/counter_spec.rb CHANGED

@@ -2,10 +2,23 @@ require "spec_helper"
 module WordsCounted
   describe Counter do
+    let(:counter) { Counter.new("We are all in the gutter, but some of us are looking at the stars.") }
-    describe ".words" do
-      let(:counter) { Counter.new("We are all in the gutter, but some of us are looking at the stars.") }
+    describe "#initialize" do
+      it "sets @words" do
+        expect(counter.instance_variables).to include(:@words)
+      end
+      it "sets @word_occurrences" do
+        expect(counter.instance_variables).to include(:@word_occurrences)
+      end
+      it "sets @word_lengths" do
+        expect(counter.instance_variables).to include(:@word_lengths)
+      end
+    end
+    describe ".words" do
       it "returns an array" do
         expect(counter.words).to be_a(Array)
       end
@@ -30,65 +43,75 @@ module WordsCounted
       end
       it "filters words" do
-        counter = Counter.new("That was magnificent, Trevor.", "magnificent")
+        counter = Counter.new("That was magnificent, Trevor.", filter: "magnificent")
         expect(counter.words).to eq(%w[That was Trevor])
       end
+      it "splits words based on regex" do
+        counter = Counter.new("I am 007.", regex: /[^\p{Alnum}\-']+/)
+        expect(counter.words).to eq(["I", "am", "007"])
+      end
     end
     describe ".word_count" do
-      let(:counter) { Counter.new("In that case I'll take measures to secure you, woman!") }
       it "returns the correct word count" do
-        expect(counter.word_count).to eq(10)
+        expect(counter.word_count).to eq(15)
       end
     end
     describe ".word_occurrences" do
-      let(:counter) { Counter.new("Bad, bad, piggy!") }
       it "returns a hash" do
         expect(counter.word_occurrences).to be_a(Hash)
       end
       it "treats capitalized words as the same word" do
+        counter = Counter.new("Bad, bad, piggy!")
         expect(counter.word_occurrences).to eq({ "bad" => 2, "piggy" => 1 })
       end
     end
     describe ".most_occurring_words" do
-      let(:counter) { Counter.new("One should always be in love. That is the reason one should never marry.") }
       it "returns an array" do
         expect(counter.most_occurring_words).to be_a(Array)
       end
       it "returns highest occurring words" do
-        expect(counter.most_occurring_words).to eq([["one", 2],["should", 2]])
+        counter = Counter.new("Orange orange Apple apple banana")
+        expect(counter.most_occurring_words).to eq([["orange", 2],["apple", 2]])
       end
     end
     describe '.word_lengths' do
-      let(:counter) { Counter.new("One two three.") }
       it "returns a hash" do
         expect(counter.word_lengths).to be_a(Hash)
       end
       it "returns a hash of word lengths" do
+        counter = Counter.new("One two three.")
         expect(counter.word_lengths).to eq({ "One" => 3, "two" => 3, "three" => 5 })
       end
     end
     describe ".longest_words" do
-      let(:counter) { Counter.new("Those whom the gods love grow young.") }
       it "returns an array" do
         expect(counter.longest_words).to be_a(Array)
       end
       it "returns the longest words" do
+        counter = Counter.new("Those whom the gods love grow young.")
         expect(counter.longest_words).to eq([["Those", 5],["young", 5]])
       end
     end
+    describe ".word_density" do
+      it "returns an array" do
+        expect(counter.word_density).to be_a(Array)
+      end
+      it "returns words and their density in percent" do
+        counter = Counter.new("His name was major, I mean, Major Major Major Major.")
+        expect(counter.word_density).to eq([["major", 50.0], ["mean", 10.0], ["i", 10.0], ["was", 10.0], ["name", 10.0], ["his", 10.0]])
+      end
+    end
   end
 end
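
Reviewer note: the tie cases in the spec depend on how the private `highest_ranking` method groups entries. The sketch below copies that one-liner into a standalone method so the behaviour can be tried in isolation. The sample hashes are illustrative and taken from the strings used in the spec.

```ruby
# Standalone copy of the grouping logic used by the private
# Counter#highest_ranking method, for experimentation only.
def highest_ranking(entries)
  # Group [word, score] pairs by score, sort the groups by score,
  # then take the pairs belonging to the highest score.
  entries.group_by { |_word, score| score }.sort.last.last
end

p highest_ranking({ "orange" => 2, "apple" => 2, "banana" => 1 })
# [["orange", 2], ["apple", 2]]

p highest_ranking({ "One" => 3, "two" => 3, "three" => 5 })
# [["three", 5]]
```

Tied words come back in the order they were first seen, because `group_by` preserves insertion order. Note that an empty hash would raise a `NoMethodError` here, since `sort.last` returns `nil` when there are no groups.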

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: words_counted
 version: !ruby/object:Gem::Version
-  version: 0.0.4
+  version: 0.0.5
 platform: ruby
 authors:
 - Mohamad El-Husseini
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-04-30 00:00:00.000000000 Z
+date: 2014-05-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler