words_counted 0.0.7 → 0.0.8

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: b9b07e20d2f4adda71cca50cab03b13b8ed7655e
- data.tar.gz: f960b67f29488004565aaea9ca2fae20d01b17fe
+ metadata.gz: f755855668270d89fc16194ce006feb3f534bec3
+ data.tar.gz: 22bccb437e3105c3ba5d4f4bf563b5fa8757ac49
  SHA512:
- metadata.gz: 3a5b7d91f4d8d90956d82b9de7f0c96fd27b1db3fbc284e4de3506c8b9a250c91c34f5156ea6bcfde21a58cb044795d560ffc9f10d80f8ad96a0df5487bc1696
- data.tar.gz: 99403fb14b085a7b2fa4b3db80e3510d4d8ee9a6e63a8b398af1b1c0243f6c04f1ba27000297b7d3c10d9904736569240b9b935c04385177bc93f072ddb336eb
+ metadata.gz: 8e9222364a6ea859ed17a553b07558ff1fd9e8fb84b6d7c4775b023daf64210de1fc0b93511a6a21fb5afaada0897f8951809137b41641b44e52e29f1c0adeaa
+ data.tar.gz: 057299e3bd09f97b2dd75f815dec2b4a542a74e5c4512798301ce4cad8f9ba330b0fc1db5636143241f1e19142fce0b20e39f6aecd82c46ebaa43d61894b3e7c
data/README.md CHANGED
@@ -1,6 +1,8 @@
  # Words Counted
 
- Words Counted is a Ruby word counter and string analyser. It includes some handy utility methods that go beyond word counting. You can use this gem to get word desnity, words and their number of occurrences, the highest occurring words, and few more things. You can also pass in your custom criteria for splitting strings in the form of a custom regexp.
+ Words Counted is a Ruby word (or anything--see custom regexp) counter and string analyser. It includes some handy utility methods that go beyond word counting. You can use this gem to get word desnity, words and their number of occurrences, the highest occurring words, and few more things.
+
+ You can also pass in your custom criteria for splitting strings in the form of a custom regexp, which affords you a great deal of flexibility, whether you want to count words, numbers, or special characters.
 
  ### Features
 
@@ -13,6 +15,8 @@ Words Counted is a Ruby word counter and string analyser. It includes some handy
  7. Filters special characters but respects hyphens and apostrophes.
  8. Plays nicely with diacritics (utf and unicode characters): "São Paulo" is treated as `["São", "Paulo"]` and not `["S", "", "o", "Paulo"]`
  9. Customisable criteria. Pass in your own regexp rules to split strings if you prefer.
+ 10. Get `char_count` and `average_chars_per_word`.
+ 11. Get unique word count.
 
  See usage instructions for details on each feature.
 
@@ -32,7 +36,7 @@ Or install it yourself as:
 
  ## Usage
 
- Create an instance of `Counter` and pass in a string and an optional filter string.
+ Create an instance of `Counter` and pass in a string and an optional filter and/or regexp.
 
  ```ruby
  counter = WordsCounted::Counter.new(
@@ -40,9 +44,11 @@ counter = WordsCounted::Counter.new(
  )
  ```
 
+ ### API
+
  #### `.word_count`
 
- Returns the word count of a given string. The word count includes only alpha characters. Hyphenated and words with apostrophes are considered a single word.
+ Returns the word count of a given string. The word count includes only alpha characters. Hyphenated and words with apostrophes are considered a single word. You can pass in your own regexp if this is not desired behaviour.
 
  ```ruby
  counter.word_count #=> 15
@@ -159,9 +165,36 @@ counter.word_density
  #
  ```
 
+ #### `.char_count`
+
+ Returns the string's character count.
+
+ ```ruby
+ counter.char_count
+ #=> 76
+ ```
+
+ #### `.average_chars_per_word`
+
+ Returns the average character count per word.
+
+ ```ruby
+ counter.average_chars_per_word
+ #=> 4
+ ```
+
+ #### `.unique_word_count`
+
+ Returns the count of unique words in the string.
+
+ ```ruby
+ counter.unique_word_count
+ #=> 13
+ ```
+
  ## Filtering
 
- You can pass in a space-delimited word list to filter words that you don't want to count. Filter words should be *lowercase*. The filter will remove both uppercase and lowercase variants of the word.
+ You can pass in a *space-delimited* word list to filter words that you don't want to count. The filter will remove both uppercase and lowercase variants of the word.
 
  ```ruby
  WordsCounted::Counter.new(
@@ -179,7 +212,9 @@ Defining words is tricky business. Out of the box, the default regexp accounts f
  /[\p{Alpha}\-']+/
  ```
 
- If you prefer, you can pass in your own criteria in the form of a Ruby regexp to split your string as desired. For example, if you wanted to count numbers as words, you could pass the following regex instead of the default one.
+ But maybe you don't want to count words? Well, count anything you want. What you count is only limited by your knowledge of regular expressions. Pass in your own criteria in the form of a Ruby regexp to split your string as desired.
+
+ For example, if you wanted to count numbers as words, you could pass the following regex instead of the default one.
 
  ```ruby
  counter = WordsCounted::Counter.new("I am 007.", regex: /[\p{Alnum}\-']+/)
@@ -189,7 +224,7 @@ counter.words
 
  ## Gotchas
 
- A hyphen use in leu of an *em* or *en* dash will form part of the word and throw off the `word_occurences` algorithm.
+ A hyphen used in leu of an *em* or *en* dash will form part of the word and throw off the `word_occurences` algorithm.
 
  ```ruby
  counter = WordsCounted::Counter.new("How do you do?-you are well, I see.")
@@ -213,18 +248,12 @@ counter.word_occurrences
 
  In this example, `-you` and `you` are counted as separate words. Writers should use the correct dash element, but this is not always the case.
 
- Another gotcha is that the default criteria does not count numbers as words.
-
- Remember that you can pass in your own regexp if the default solution does not fit your needs.
+ Another gotcha is that the default criteria does not count numbers as words. Remember that you can pass in your own regexp if the default solution does not fit your needs.
 
- ## To do
+ ## Road Map
 
- 1. Add paragraph counter.
- 2. Add ability to open files or URLs.
- 3. A character counter, with spaces, and without spaces.
- 4. A sentence counter.
- 5. Average words in a sentence.
- 6. Average sentence chars.
+ 1. Add ability to open files or URLs.
+ 2. Add paragraph, sentence, average words per sentence, and average sentence chars counters.
 
  #### Ability to open files or urls
 
@@ -1,131 +1,53 @@
  module WordsCounted
-
- # Represents a Counter object.
- #
  class Counter
- # @!words [Array] an array of words resulting from the string passed to the initializer.
- # @!word_occurrences [Hash] an hash of words as keys and their occurrences as values.
- # @!word_lengths [Hash] an hash of words as keys and their lengths as values.
- attr_reader :words, :word_occurrences, :word_lengths
+ attr_reader :words, :word_occurrences, :word_lengths, :char_count
 
- # This is the criteria for defining words.
- #
- # Words are alpha characters and can include hyphens and apostrophes.
- #
  WORD_REGEX = /[\p{Alpha}\-']+/
 
- # Initializes an instance of Counter and splits a given string into an array of words.
- #
- # ## @words
- # This is the array of words that results from the string passed in. For example:
- #
- # Counter.new("Bad, bad, piggy!")
- # => #<WordsCounted::Counter:0x007fd49429bfb0 @words=["Bad", "bad", "piggy"]>
- #
- # @param string [String] the string to act on.
- # @param options [Hash] a hash of options that includes `filter` and `regex`
- #
- # ## `filter`
- # This a list of words to filter from the string. Useful if you want to remove *a*, **you**, and other common words.
- # Any words included in the filter must be **lowercase**.
- # defaults to an empty string
- #
- # ## `regex`
- # The criteria used to split a string. It defaults to `/[^\p{Alpha}\-']+/`.
- #
- #
- # @word_occurrences
- # This is a hash of words and their occurrences. Occurrences count is not case sensitive.
- #
- # ## Example
- #
- # "Hello hello" #=> { "hello" => 2 }
- #
- # @return [Hash] a hash map of words as keys and their occurrences as values.
- #
- #
- # ## @word_lengths
- # This is a hash of words and their lengths.
- #
- # ## Example
- #
- # "Hello sir" #=> { "hello" => 5, "sir" => 3 }
- #
- # @return [Hash] a hash map of words as keys and their lengths as values.
- #
  def initialize(string, options = {})
  @options = options
-
- @words = string.scan(regex).reject { |word| filter.split.include? word.downcase }
-
- @word_occurrences = words.each_with_object(Hash.new(0)) do |word, result|
- result[word.downcase] += 1
- end
-
- @word_lengths = words.each_with_object({}) do |word, result|
- result[word] ||= word.length
+ @char_count = string.length
+ @words = string.scan(regex).reject { |word| filter.include? word.downcase }
+ @word_occurrences = words.each_with_object(Hash.new(0)) do |word, hash|
+ hash[word.downcase] += 1
  end
+ @word_lengths = words.each_with_object({}) { |word, hash| hash[word] ||= word.length }
  end
 
- # Returns the total word count.
- #
- # @return [Integer] total word count from `words` array size.
- #
  def word_count
  words.size
  end
 
- # Returns a two dimensional array of the most occuring word(s)
- # and its number of occurrences.
- #
- # In the event of a tie, all tied words are returned.
- #
- # @return [Array] see {#highest_ranking}
- #
+ def unique_word_count
+ words.uniq.size
+ end
+
+ def average_chars_per_word
+ (char_count / word_count).round(2)
+ end
+
  def most_occurring_words
  highest_ranking word_occurrences
  end
 
- # Returns a two dimensional array of the longest word(s) and
- # its length. In the event of a tie, all tied words are returned.
- #
- # @return [Array] see {#highest_ranking}
- #
  def longest_words
  highest_ranking word_lengths
  end
 
- # Returns a hash of word and their word density in percent.
- #
- # @returns [Hash] a hash map of words as keys and their density as values in percent.
- #
  def word_density
  word_occurrences.each_with_object({}) do |(word, occ), hash|
- hash[word] = percent_of_n(occ)
- end.sort_by { |_, v| v }.reverse
+ hash[word] = percent_of(occ)
+ end.sort_by { |_, value| value }.reverse
  end
 
  private
 
- # Takes a hashmap of the form {"foo" => 1, "bar" => 2} and returns an array
- # containing the entries (as an array) with the highest number as a value.
- #
- # @param entries [Hash] a hash of entries to analyse
- # @return [Array] a two dimentional array where each consists of a word its rank
- #
- # {http://codereview.stackexchange.com/a/47515/1563 See here}.
- #
  def highest_ranking(entries)
- entries.group_by { |word, occurrence| occurrence }.sort.last.last
+ entries.group_by { |word, value| value }.sort.last.last
  end
 
- # Calculates the percentege of a word.
- #
- # @param n [Integer] the divisor.
- # @returns [Float] a percentege of n based on {#word_count} rounded to two decimal places.
- #
- def percent_of_n(n)
- ((n.to_f / word_count.to_f) * 100.0).round(2)
+ def percent_of(n)
+ (n.to_f / word_count.to_f * 100.0).round(2)
  end
 
  def regex
@@ -133,7 +55,11 @@ module WordsCounted
  end
 
  def filter
- @options[:filter] || String.new
+ if filters = @options[:filter]
+ filters.split.collect { |word| word.downcase }
+ else
+ []
+ end
  end
  end
  end
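
A note on `average_chars_per_word` as added in the hunk above: `char_count` and `word_count` are both integers, so `/` performs integer division and the trailing `.round(2)` has no fraction left to round. That is consistent with the README and the spec both showing `4`. If a fractional average is wanted, a minimal sketch might look like the following. The helper is hypothetical and stand-alone, the numbers are illustrative, and none of it is the gem's code.

```ruby
# Hypothetical stand-alone helper, not part of WordsCounted.
def average_chars_per_word(char_count, word_count)
  return 0.0 if word_count.zero?            # guard against division by zero
  (char_count.to_f / word_count).round(2)   # float division keeps the fraction
end

# Illustrative numbers only.
integer_average = (66 / 15).round(2)              # integer division, as in the diff
float_average   = average_chars_per_word(66, 15)  # float division

puts integer_average #=> 4
puts float_average   #=> 4.4
```

The zero guard is a design choice of this sketch; with plain integer division, a word count of zero raises `ZeroDivisionError`.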
@@ -1,3 +1,3 @@
  module WordsCounted
- VERSION = "0.0.7"
+ VERSION = "0.0.8"
  end
@@ -9,6 +9,10 @@ module WordsCounted
  expect(counter.instance_variables).to include(:@options)
  end
 
+ it "sets @char_count" do
+ expect(counter.instance_variables).to include(:@char_count)
+ end
+
  it "sets @words" do
  expect(counter.instance_variables).to include(:@words)
  end
@@ -46,11 +50,21 @@ module WordsCounted
  expect(counter.words).to eq(%w[Bust 'em Them be Jim's bastards'])
  end
 
+ it "does not split on unicode chars" do
+ counter = Counter.new("São Paulo")
+ expect(counter.words).to eq(%w[São Paulo])
+ end
+
  it "filters words" do
  counter = Counter.new("That was magnificent, Trevor.", filter: "magnificent")
  expect(counter.words).to eq(%w[That was Trevor])
  end
 
+ it "filters words when passed in in uppercase" do
+ counter = Counter.new("That was magnificent, Trevor.", filter: "Magnificent")
+ expect(counter.words).to eq(%w[That was Trevor])
+ end
+
  it "splits words based on regex" do
  counter = Counter.new("I am 007.", regex: /[\p{Alnum}\-']+/)
  expect(counter.words).to eq(["I", "am", "007"])
@@ -117,5 +131,23 @@ module WordsCounted
  expect(counter.word_density).to eq([["major", 50.0], ["mean", 10.0], ["i", 10.0], ["was", 10.0], ["name", 10.0], ["his", 10.0]])
  end
  end
+
+ describe ".char_count" do
+ it "returns the number of chars in the passed in string" do
+ expect(counter.char_count).to eq(66)
+ end
+ end
+
+ describe ".average_chars_per_word" do
+ it "returns the average number of chars per word" do
+ expect(counter.average_chars_per_word).to eq(4)
+ end
+ end
+
+ describe ".unique_word_count" do
+ it "returns the number of unique words" do
+ expect(counter.unique_word_count).to eq(13)
+ end
+ end
  end
  end
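
The filtering and regex behaviour exercised by the specs above can be reproduced without the gem, using only `String#scan`. The sketch below is a rough stand-alone approximation: `tokens` and `DEFAULT_REGEX` are hypothetical names chosen for illustration, not the gem's API. It shows the filter being downcased before comparison, a custom regexp admitting digits, and the stray-hyphen gotcha from the README.

```ruby
# Stand-alone sketch; `tokens` and DEFAULT_REGEX are hypothetical names.
DEFAULT_REGEX = /[\p{Alpha}\-']+/

def tokens(string, filter: "", regex: DEFAULT_REGEX)
  excluded = filter.split.map(&:downcase)   # normalise the filter words once
  string.scan(regex).reject { |word| excluded.include?(word.downcase) }
end

# An uppercase filter word still removes the lowercase match.
filtered = tokens("That was magnificent, Trevor.", filter: "Magnificent")

# A custom regexp that also accepts digits.
with_numbers = tokens("I am 007.", regex: /[\p{Alnum}\-']+/)

# A hyphen standing in for a dash sticks to the following word.
hyphenated = tokens("How do you do?-you are well, I see.")

p filtered     #=> ["That", "was", "Trevor"]
p with_numbers #=> ["I", "am", "007"]
p hyphenated   #=> ["How", "do", "you", "do", "-you", "are", "well", "I", "see"]
```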
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: words_counted
  version: !ruby/object:Gem::Version
- version: 0.0.7
+ version: 0.0.8
  platform: ruby
  authors:
  - Mohamad El-Husseini
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-05-01 00:00:00.000000000 Z
+ date: 2014-05-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler