words_counted 0.0.7 → 0.0.8 (RubyGems version diff)

Files changed (6)

checksums.yaml +4 -4
data/README.md +45 -16
data/lib/words_counted/counter.rb +24 -98
data/lib/words_counted/version.rb +1 -1
data/spec/words_counted/counter_spec.rb +32 -0
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: b9b07e20d2f4adda71cca50cab03b13b8ed7655e
-  data.tar.gz: f960b67f29488004565aaea9ca2fae20d01b17fe
+  metadata.gz: f755855668270d89fc16194ce006feb3f534bec3
+  data.tar.gz: 22bccb437e3105c3ba5d4f4bf563b5fa8757ac49
 SHA512:
-  metadata.gz: 3a5b7d91f4d8d90956d82b9de7f0c96fd27b1db3fbc284e4de3506c8b9a250c91c34f5156ea6bcfde21a58cb044795d560ffc9f10d80f8ad96a0df5487bc1696
-  data.tar.gz: 99403fb14b085a7b2fa4b3db80e3510d4d8ee9a6e63a8b398af1b1c0243f6c04f1ba27000297b7d3c10d9904736569240b9b935c04385177bc93f072ddb336eb
+  metadata.gz: 8e9222364a6ea859ed17a553b07558ff1fd9e8fb84b6d7c4775b023daf64210de1fc0b93511a6a21fb5afaada0897f8951809137b41641b44e52e29f1c0adeaa
+  data.tar.gz: 057299e3bd09f97b2dd75f815dec2b4a542a74e5c4512798301ce4cad8f9ba330b0fc1db5636143241f1e19142fce0b20e39f6aecd82c46ebaa43d61894b3e7c
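
Review note: `checksums.yaml` records SHA1 and SHA512 digests for the two archives inside the `.gem` file, so every release changes all four values. A minimal sketch of recomputing such digests with Ruby's standard `digest` library follows. The helper name `gem_part_digests` is hypothetical, and the demonstration hashes a throwaway temp file rather than a real `data.tar.gz`.

```ruby
require "digest/sha1"
require "digest/sha2"
require "tempfile"

# Hypothetical helper (not part of RubyGems): returns the SHA1 and SHA512
# hex digests of the file at +path+, keyed like the sections of checksums.yaml.
def gem_part_digests(path)
  {
    "SHA1"   => Digest::SHA1.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# Demonstration on a temporary file standing in for an extracted archive.
Tempfile.create("data.tar.gz") do |file|
  file.write("example bytes")
  file.flush

  digests = gem_part_digests(file.path)
  puts digests["SHA1"].length    # 40 hex characters
  puts digests["SHA512"].length  # 128 hex characters
end
```

Comparing the returned strings with the values in `checksums.yaml` would confirm that an extracted archive matches the published release.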

data/README.md CHANGED

@@ -1,6 +1,8 @@
 # Words Counted
-Words Counted is a Ruby word counter and string analyser. It includes some handy utility methods that go beyond word counting. You can use this gem to get word desnity, words and their number of occurrences, the highest occurring words, and few more things. You can also pass in your custom criteria for splitting strings in the form of a custom regexp.
+Words Counted is a Ruby word (or anything--see custom regexp) counter and string analyser. It includes some handy utility methods that go beyond word counting. You can use this gem to get word desnity, words and their number of occurrences, the highest occurring words, and few more things.
+You can also pass in your custom criteria for splitting strings in the form of a custom regexp, which affords you a great deal of flexibility, whether you want to count words, numbers, or special characters.
 ### Features
@@ -13,6 +15,8 @@ Words Counted is a Ruby word counter and string analyser. It includes some handy
 7. Filters special characters but respects hyphens and apostrophes.
 8. Plays nicely with diacritics (utf and unicode characters): "São Paulo" is treated as `["São", "Paulo"]` and not `["S", "", "o", "Paulo"]`
 9. Customisable criteria. Pass in your own regexp rules to split strings if you prefer.
+10. Get `char_count` and `average_chars_per_word`.
+11. Get unique word count.
 See usage instructions for details on each feature.
@@ -32,7 +36,7 @@ Or install it yourself as:
 ## Usage
-Create an instance of `Counter` and pass in a string and an optional filter string.
+Create an instance of `Counter` and pass in a string and an optional filter and/or regexp.
 ```ruby
 counter = WordsCounted::Counter.new(
@@ -40,9 +44,11 @@ counter = WordsCounted::Counter.new(
 )
 ```
+### API
 #### `.word_count`
-Returns the word count of a given string. The word count includes only alpha characters. Hyphenated and words with apostrophes are considered a single word.
+Returns the word count of a given string. The word count includes only alpha characters. Hyphenated and words with apostrophes are considered a single word. You can pass in your own regexp if this is not desired behaviour.
 ```ruby
 counter.word_count #=> 15
@@ -159,9 +165,36 @@ counter.word_density
 #
 ```
+#### `.char_count`
+Returns the string's character count.
+```ruby
+counter.char_count
+#=> 76
+```
+#### `.average_chars_per_word`
+Returns the average character count per word.
+```ruby
+counter.average_chars_per_word
+#=> 4
+```
+#### `.unique_word_count`
+Returns the count of unique words in the string.
+```ruby
+counter.unique_word_count
+#=> 13
+```
 ## Filtering
-You can pass in a space-delimited word list to filter words that you don't want to count. Filter words should be *lowercase*. The filter will remove both uppercase and lowercase variants of the word.
+You can pass in a *space-delimited* word list to filter words that you don't want to count. The filter will remove both uppercase and lowercase variants of the word.
 ```ruby
 WordsCounted::Counter.new(
@@ -179,7 +212,9 @@ Defining words is tricky business. Out of the box, the default regexp accounts f
 /[\p{Alpha}\-']+/
 ```
-If you prefer, you can pass in your own criteria in the form of a Ruby regexp to split your string as desired. For example, if you wanted to count numbers as words, you could pass the following regex instead of the default one.
+But maybe you don't want to count words? Well, count anything you want. What you count is only limited by your knowledge of regular expressions. Pass in your own criteria in the form of a Ruby regexp to split your string as desired.
+For example, if you wanted to count numbers as words, you could pass the following regex instead of the default one.
 ```ruby
 counter = WordsCounted::Counter.new("I am 007.", regex: /[\p{Alnum}\-']+/)
@@ -189,7 +224,7 @@ counter.words
 ## Gotchas
-A hyphen use in leu of an *em* or *en* dash will form part of the word and throw off the `word_occurences` algorithm.
+A hyphen used in leu of an *em* or *en* dash will form part of the word and throw off the `word_occurences` algorithm.
 ```ruby
 counter = WordsCounted::Counter.new("How do you do?-you are well, I see.")
@@ -213,18 +248,12 @@ counter.word_occurrences
 In this example, `-you` and `you` are counted as separate words. Writers should use the correct dash element, but this is not always the case.
-Another gotcha is that the default criteria does not count numbers as words.
-Remember that you can pass in your own regexp if the default solution does not fit your needs.
+Another gotcha is that the default criteria does not count numbers as words. Remember that you can pass in your own regexp if the default solution does not fit your needs.
-## To do
+## Road Map
-1. Add paragraph counter.
-2. Add ability to open files or URLs.
-3. A character counter, with spaces, and without spaces.
-4. A sentence counter.
-5. Average words in a sentence.
-6. Average sentence chars.
+1. Add ability to open files or URLs.
+2. Add paragraph, sentence, average words per sentence, and average sentence chars counters.
 #### Ability to open files or urls
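
Review note: the README changes lean on the custom regexp option, so it may help to see how the two patterns quoted in the diff behave under plain `String#scan`, which is what `Counter#initialize` calls. This is a sketch that does not load the gem; `scan_tokens` is a hypothetical helper, and only the two patterns and the sample strings come from the diff.

```ruby
# Hypothetical helper: splits text with the given pattern, defaulting to the
# gem's WORD_REGEX as quoted in the diff.
def scan_tokens(text, pattern = /[\p{Alpha}\-']+/)
  text.scan(pattern)
end

# Default pattern: digits are not word characters, so "007" is dropped.
p scan_tokens("I am 007.")
# => ["I", "am"]

# Alphanumeric pattern from the README: digits now count as words.
p scan_tokens("I am 007.", /[\p{Alnum}\-']+/)
# => ["I", "am", "007"]

# The "Gotchas" case: a hyphen standing in for a dash sticks to the next word.
p scan_tokens("How do you do?-you are well, I see.")
# => ["How", "do", "you", "do", "-you", "are", "well", "I", "see"]
```

The last call shows why `-you` and `you` end up as separate keys in `word_occurrences`.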

data/lib/words_counted/counter.rb CHANGED

@@ -1,131 +1,53 @@
 module WordsCounted
-  # Represents a Counter object.
-  #
   class Counter
-    # @!words [Array] an array of words resulting from the string passed to the initializer.
-    # @!word_occurrences [Hash] an hash of words as keys and their occurrences as values.
-    # @!word_lengths [Hash] an hash of words as keys and their lengths as values.
-    attr_reader :words, :word_occurrences, :word_lengths
+    attr_reader :words, :word_occurrences, :word_lengths, :char_count
-    # This is the criteria for defining words.
-    #
-    # Words are alpha characters and can include hyphens and apostrophes.
-    #
     WORD_REGEX = /[\p{Alpha}\-']+/
-    # Initializes an instance of Counter and splits a given string into an array of words.
-    #
-    # ## @words
-    # This is the array of words that results from the string passed in. For example:
-    #
-    #    Counter.new("Bad, bad, piggy!")
-    #    => #<WordsCounted::Counter:0x007fd49429bfb0 @words=["Bad", "bad", "piggy"]>
-    #
-    # @param string [String] the string to act on.
-    # @param options [Hash] a hash of options that includes `filter` and `regex`
-    #
-    #   ## `filter`
-    #   This a list of words to filter from the string. Useful if you want to remove *a*, **you**, and other common words.
-    #   Any words included in the filter must be **lowercase**.
-    #   defaults to an empty string
-    #
-    #   ## `regex`
-    #   The criteria used to split a string. It defaults to `/[^\p{Alpha}\-']+/`.
-    #
-    #
-    # @word_occurrences
-    # This is a hash of words and their occurrences. Occurrences count is not case sensitive.
-    #
-    # ## Example
-    #
-    #    "Hello hello" #=> { "hello" => 2 }
-    #
-    # @return [Hash] a hash map of words as keys and their occurrences as values.
-    #
-    #
-    # ## @word_lengths
-    # This is a hash of words and their lengths.
-    #
-    # ## Example
-    #
-    #    "Hello sir" #=> { "hello" => 5, "sir" => 3 }
-    #
-    # @return [Hash] a hash map of words as keys and their lengths as values.
-    #
     def initialize(string, options = {})
       @options = options
-      @words = string.scan(regex).reject { |word| filter.split.include? word.downcase }
-      @word_occurrences = words.each_with_object(Hash.new(0)) do |word, result|
-        result[word.downcase] += 1
-      end
-      @word_lengths = words.each_with_object({}) do |word, result|
-        result[word] ||= word.length
+      @char_count = string.length
+      @words = string.scan(regex).reject { |word| filter.include? word.downcase }
+      @word_occurrences = words.each_with_object(Hash.new(0)) do |word, hash|
+        hash[word.downcase] += 1
       end
+      @word_lengths = words.each_with_object({}) { |word, hash| hash[word] ||= word.length }
     end
-    # Returns the total word count.
-    #
-    # @return [Integer] total word count from `words` array size.
-    #
     def word_count
       words.size
     end
-    # Returns a  two dimensional array of the most occuring word(s)
-    # and its number of occurrences.
-    #
-    # In the event of a tie, all tied words are returned.
-    #
-    # @return [Array] see {#highest_ranking}
-    #
+    def unique_word_count
+      words.uniq.size
+    end
+    def average_chars_per_word
+      (char_count / word_count).round(2)
+    end
     def most_occurring_words
       highest_ranking word_occurrences
     end
-    # Returns a  two dimensional array of the longest word(s) and
-    # its length. In the event of a tie, all tied words are returned.
-    #
-    # @return [Array] see {#highest_ranking}
-    #
     def longest_words
       highest_ranking word_lengths
     end
-    # Returns a hash of word and their word density in percent.
-    #
-    # @returns [Hash] a hash map of words as keys and their density as values in percent.
-    #
     def word_density
       word_occurrences.each_with_object({}) do |(word, occ), hash|
-        hash[word] = percent_of_n(occ)
-      end.sort_by { |_, v| v }.reverse
+        hash[word] = percent_of(occ)
+      end.sort_by { |_, value| value }.reverse
     end
     private
-    # Takes a hashmap of the form {"foo" => 1, "bar" => 2} and returns an array
-    # containing the entries (as an array) with the highest number as a value.
-    #
-    # @param entries [Hash] a hash of entries to analyse
-    # @return [Array] a two dimentional array where each consists of a word its rank
-    #
-    # {http://codereview.stackexchange.com/a/47515/1563 See here}.
-    #
     def highest_ranking(entries)
-      entries.group_by { |word, occurrence| occurrence }.sort.last.last
+      entries.group_by { |word, value| value }.sort.last.last
     end
-    # Calculates the percentege of a word.
-    #
-    # @param n [Integer] the divisor.
-    # @returns [Float] a percentege of n based on {#word_count} rounded to two decimal places.
-    #
-    def percent_of_n(n)
-      ((n.to_f / word_count.to_f) * 100.0).round(2)
+    def percent_of(n)
+      (n.to_f / word_count.to_f * 100.0).round(2)
     end
     def regex
@@ -133,7 +55,11 @@ module WordsCounted
     end
     def filter
-      @options[:filter] || String.new
+      if filters = @options[:filter]
+        filters.split.collect { |word| word.downcase }
+      else
+        []
+      end
     end
   end
 end
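
Review note: the new `average_chars_per_word` divides `char_count` by `word_count`, and both are Integers, so the fractional part is discarded before `round(2)` runs. In addition, `char_count` is `string.length`, so spaces and punctuation are included in the numerator. The sketch below contrasts the expression from the diff with a Float-based alternative. Both helper names and the sample numbers are hypothetical and are not taken from the gem's spec fixture.

```ruby
# Hypothetical helper mirroring the expression in the diff:
# Integer / Integer truncates, so round(2) has nothing left to round.
def average_truncated(char_count, word_count)
  (char_count / word_count).round(2)
end

# Hypothetical alternative: convert to Float first to keep two decimals.
def average_precise(char_count, word_count)
  (char_count.to_f / word_count).round(2)
end

# Illustrative values only.
puts average_truncated(66, 15) == 4  # true: the fraction is lost
puts average_precise(66, 15)         # 4.4
```

Whether the truncated result is intended is not clear from the diff. The new spec asserts `eq(4)`, which the integer form satisfies.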

data/lib/words_counted/version.rb CHANGED

@@ -1,3 +1,3 @@
 module WordsCounted
-  VERSION = "0.0.7"
+  VERSION = "0.0.8"
 end

data/spec/words_counted/counter_spec.rb CHANGED

@@ -9,6 +9,10 @@ module WordsCounted
         expect(counter.instance_variables).to include(:@options)
       end
+      it "sets @char_count" do
+        expect(counter.instance_variables).to include(:@char_count)
+      end
       it "sets @words" do
         expect(counter.instance_variables).to include(:@words)
       end
@@ -46,11 +50,21 @@ module WordsCounted
         expect(counter.words).to eq(%w[Bust 'em Them be Jim's bastards'])
       end
+      it "does not split on unicode chars" do
+        counter = Counter.new("São Paulo")
+        expect(counter.words).to eq(%w[São Paulo])
+      end
       it "filters words" do
         counter = Counter.new("That was magnificent, Trevor.", filter: "magnificent")
         expect(counter.words).to eq(%w[That was Trevor])
       end
+      it "filters words when passed in in uppercase" do
+        counter = Counter.new("That was magnificent, Trevor.", filter: "Magnificent")
+        expect(counter.words).to eq(%w[That was Trevor])
+      end
       it "splits words based on regex" do
         counter = Counter.new("I am 007.", regex: /[\p{Alnum}\-']+/)
         expect(counter.words).to eq(["I", "am", "007"])
@@ -117,5 +131,23 @@ module WordsCounted
         expect(counter.word_density).to eq([["major", 50.0], ["mean", 10.0], ["i", 10.0], ["was", 10.0], ["name", 10.0], ["his", 10.0]])
       end
     end
+    describe ".char_count" do
+      it "returns the number of chars in the passed in string" do
+        expect(counter.char_count).to eq(66)
+      end
+    end
+    describe ".average_chars_per_word" do
+      it "returns the average number of chars per word" do
+        expect(counter.average_chars_per_word).to eq(4)
+      end
+    end
+    describe ".unique_word_count" do
+      it "returns the number of unique words" do
+        expect(counter.unique_word_count).to eq(13)
+      end
+    end
   end
 end
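
Review note: the new "filters words when passed in in uppercase" example exercises the rewritten `filter` helper, which now splits the filter string and downcases each entry before comparison. A stand-alone sketch of that behaviour follows. `normalise_filter` and `filtered_words` are hypothetical names, and the gem itself is not required.

```ruby
# Hypothetical stand-alone version of the private `filter` helper:
# a space-delimited string becomes a list of lowercase words.
def normalise_filter(filter)
  return [] unless filter
  filter.split.map(&:downcase)
end

# Hypothetical helper: scans with the default word pattern and rejects any
# word whose lowercase form appears in the normalised filter list.
def filtered_words(string, filter = nil)
  exclusions = normalise_filter(filter)
  string.scan(/[\p{Alpha}\-']+/).reject { |word| exclusions.include?(word.downcase) }
end

p filtered_words("That was magnificent, Trevor.", "Magnificent")
# => ["That", "was", "Trevor"]
```

One difference from the diff: the sketch builds the exclusion list once per call, whereas the gem's `reject` block calls `filter` for every word, re-splitting the string each time.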

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: words_counted
 version: !ruby/object:Gem::Version
-  version: 0.0.7
+  version: 0.0.8
 platform: ruby
 authors:
 - Mohamad El-Husseini
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-05-01 00:00:00.000000000 Z
+date: 2014-05-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler