words_counted 0.0.4 → 0.0.5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: a74d7a0ee0210b034e5babbe9f90f11cd0c5db8b
4
- data.tar.gz: f1852e2f1f728b7a22c519d43e670078edc61509
3
+ metadata.gz: 9847725d713bc20dd1b66d86321c8089e33276bf
4
+ data.tar.gz: f847a70bf6e008527606f944833979bf952a330e
5
5
  SHA512:
6
- metadata.gz: b8edcbac512ac3edaf9dfc1a28a66e55e34e7ca33c9db57688c8c5658c0f5b514c1585316d7657389ce3ed40c1b91c0dd58aefca24c2c0b1d7ab91736cb30f80
7
- data.tar.gz: 5019a1db7d42ef06068e24e666efe90f54bc7ac3821bbb9b2232506644712fa985592dfcdcf6a3310945ec940002a6b037665685aebcb645052262accb2ccd86
6
+ metadata.gz: d5922f2b471ea4bc60650a77972c289e516dae19b5ba4e59ceacf669df85442b775361a43ea9c8765270b005e2a8dfd1db39dbd4b85e29db142a3c064cabaff7
7
+ data.tar.gz: 2ab1369a5dc2b063c749242339e93d7a5f5424a9bae489fd6b649406834e7e05f18af7a598c797928f256d73eba532db3e1ebde95cbf5798b58b9e32599a6f8f
data/.yardopts CHANGED
@@ -1,2 +1,3 @@
1
1
  --title 'Word Counter for Ruby'
2
- --private
2
+ --private
3
+ --markup markdown
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Words Counted
2
2
 
3
- This Ruby gem is a word counter that includes some handy utility methods. It lets you send in a string of text and count the number of words, get the words sorted by number occurrences, get the highest occurring words, and few more things.
3
+ Words Counted is a Ruby word counter and string analyser. It includes some handy utility methods that go beyond word counting. You can use this gem to get word density, words and their number of occurrences, the highest occurring words, and a few more things. You can also pass in your own criteria for splitting strings in the form of a custom regexp.
4
4
 
5
5
  ### Features
6
6
 
@@ -11,6 +11,8 @@ This Ruby gem is a word counter that includes some handy utility methods. It let
11
11
  5. Get the longest word(s) and its length.
12
12
  6. Ability to filter out words from the count. Useful if you don't want to count `a`, `the`, etc...
13
13
  7. Filters special characters but respects hyphens and apostrophes.
14
+ 8. Plays nicely with diacritics (utf and unicode characters): "São Paulo" is treated as `["São", "Paulo"]` and not `["S", "", "o", "Paulo"]`
15
+ 9. Customisable criteria. Pass in your own regexp rules to split strings if you prefer.
14
16
 
15
17
  See usage instructions for details on each feature.
16
18
 
@@ -132,13 +134,55 @@ counter.words
132
134
  #=> ["We", "are", "all", "in", "the", "gutter", "but", "some", "of", "us", "are", "looking", "at", "the", "stars"]
133
135
  ```
134
136
 
137
+ #### `.word_density`
138
+
139
+ Returns a two-dimensional array of words and their density.
140
+
141
+ ```ruby
142
+ counter.word_density
143
+ #
144
+ # [
145
+ # ["are", 13.33],
146
+ # ["the", 13.33],
147
+ # ["but", 6.67],
148
+ # ["us", 6.67],
149
+ # ["of", 6.67],
150
+ # ["some", 6.67],
151
+ # ["looking", 6.67],
152
+ # ["gutter", 6.67],
153
+ # ["at", 6.67],
154
+ # ["in", 6.67],
155
+ # ["all", 6.67],
156
+ # ["stars", 6.67],
157
+ # ["we", 6.67]
158
+ # ]
159
+ #
160
+ ```
161
+
135
162
  ## Filtering
136
163
 
137
164
  You can pass in a space-delimited word list to filter words that you don't want to count. Filter words should be *lowercase*. The filter will remove both uppercase and lowercase variants of the word.
138
165
 
139
166
  ```ruby
140
- WordsCounted::Counter.new("Magnificent! That was magnificent, Trevor.", "was magnificent")
141
- #<WordsCounted::Counter:0x007fd4949f99d8 @words=["That", "Trevor"]>
167
+ counter = WordsCounted::Counter.new("Magnificent! That was magnificent, Trevor.", filter: "was magnificent")
168
+ counter.words
169
+ #=> ["That", "Trevor"]
170
+ ```
171
+
172
+ ## Passing in a Custom Regexp
173
+
174
+ Defining words is tricky business. Out of the box, the default regexp accounts for letters, hyphenated words, and apostrophes. This means `twenty-one` is treated as one word. So is `Mohamad's`.
175
+
176
+ ```ruby
177
+ /[^\p{Alpha}\-']+/
178
+ ```
179
+
180
+ If you prefer, you can pass in your own criteria in the form of a Ruby regexp to split your string as desired. For example, if you wanted to count numbers as words, you could pass the following regex instead of the default one.
181
+
182
+ ```ruby
183
+ counter = WordsCounted::Counter.new("I am 007.", regex: /[^\p{Alnum}\-']+/)
184
+ counter.words
185
+ #=> ["I", "am", "007"]
142
186
  ```
143
187
 
144
188
  ## Gotchas
@@ -167,6 +211,37 @@ counter.word_occurrences
167
211
 
168
212
  In this example, `-you` and `you` are counted as separate words. Writers should use the correct dash element, but this is not always the case.
169
213
 
214
+ The default criteria does not count numbers as words.
215
+
216
+ ## To do
217
+
218
+ 1. Add paragraph counter.
219
+ 2. Add ability to open files or URLs.
220
+ 3. A character counter, with spaces, and without spaces.
221
+ 4. A sentence counter.
222
+ 5. Average words in a sentence.
223
+ 6. Average sentence chars.
224
+
225
+ #### Ability to open files or urls
226
+
227
+ Maybe I can add some class methods to open the file and init the counter class.
228
+
229
+ ```ruby
230
+ def self.count_from_url
231
+ new # open url and send string here after removing html
232
+ end
233
+
234
+ def self.from_file
235
+ new # open file and send string here.
236
+ end
237
+ ```
238
+
239
+ ## But wait... wait a minute...
240
+
241
+ #### Isn't it better to write this in JavaScript?
242
+
243
+ ![Picard face palm](http://stream1.gifsoup.com/view3/1290449/picard-facepalm-o.gif)
244
+
170
245
  ## About
171
246
 
172
247
  Originally I wrote this program for a code challenge. My initial implementation was decent, but it could have been better. Thanks to [Dave Yarwood](http://codereview.stackexchange.com/a/47515/1563) for helping me improve my code. Some of this code is based on his recommendations. You can find the original implementation as well as the code review on [Code Review](http://codereview.stackexchange.com/questions/46105/a-ruby-string-analyser).
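
The splitting rules described in the README changes above can be tried in isolation. The following is a minimal sketch, not the gem's code: `split_words` is a hypothetical helper, and the only pieces taken from the diff are the two regexps (`/[^\p{Alpha}\-']+/` as the default and `/[^\p{Alnum}\-']+/` as the custom example).

```ruby
# Hypothetical helper for illustration; the gem itself does this inside
# WordsCounted::Counter#initialize.
#
# The regexp matches runs of characters that are NOT part of a word, so
# String#split uses those runs as separators.
def split_words(string, regex = /[^\p{Alpha}\-']+/)
  # A leading separator produces an empty first element, so drop empties.
  string.split(regex).reject(&:empty?)
end

split_words("São Paulo")                        #=> ["São", "Paulo"]
split_words("twenty-one Mohamad's")             #=> ["twenty-one", "Mohamad's"]
split_words("I am 007.")                        #=> ["I", "am"]
split_words("I am 007.", /[^\p{Alnum}\-']+/)    #=> ["I", "am", "007"]
```

Because `\p{Alpha}` is Unicode-aware on UTF-8 strings, letters with diacritics stay inside a word, while digits are treated as separators unless the `Alnum` variant is passed.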
@@ -1,46 +1,78 @@
1
1
  module WordsCounted
2
+
3
+ # Represents a Counter object.
4
+ #
2
5
  class Counter
3
6
  # @!words [Array] an array of words resulting from the string passed to the initializer.
4
- attr_reader :words
7
+ # @!word_occurrences [Hash] a hash of words as keys and their occurrences as values.
8
+ # @!word_lengths [Hash] a hash of words as keys and their lengths as values.
9
+ attr_reader :words, :word_occurrences, :word_lengths
5
10
 
6
11
  # This is the criteria for defining words.
7
12
  #
8
13
  # Words are alpha characters and can include hyphens and apostrophes.
14
+ #
9
15
  WORD_REGEX = /[^\p{Alpha}\-']+/
10
16
 
11
17
  # Initializes an instance of Counter and splits a given string into an array of words.
12
18
  #
13
- # Counter.new("Bad, bad, piggy!")
14
- # => #<WordsCounted::Counter:0x007fd49429bfb0 @words=["Bad", "bad", "piggy"]>
19
+ # ## @words
20
+ # This is the array of words that results from the string passed in. For example:
21
+ #
22
+ # Counter.new("Bad, bad, piggy!")
23
+ # => #<WordsCounted::Counter:0x007fd49429bfb0 @words=["Bad", "bad", "piggy"]>
15
24
  #
16
25
  # @param string [String] the string to act on.
17
- # @param filter [String] a string of words to filter from the string to act on.
26
+ # @param options [Hash] a hash of options that includes `filter` and `regex`
18
27
  #
19
- def initialize(string, filter = "")
20
- @words = string.split(WORD_REGEX).reject { |word| filter.split.include? word.downcase }
21
- end
22
-
23
- # Returns the total word count.
28
+ # ## `filter`
29
+ # This is a list of words to filter from the string. Useful if you want to remove *a*, **you**, and other common words.
30
+ # Any words included in the filter must be **lowercase**.
31
+ # defaults to an empty string
32
+ #
33
+ # ## `regex`
34
+ # The criteria used to split a string. It defaults to `/[^\p{Alpha}\-']+/`.
24
35
  #
25
- def word_count
26
- words.size
27
- end
28
-
29
- # Returns a hash of words and their occurrences.
30
- # Occurrences count is not case sensitive:
31
36
  #
32
- # `"Hello hello" #=> { "hello" => 2 }`
37
+ # @word_occurrences
38
+ # This is a hash of words and their occurrences. Occurrences count is not case sensitive.
33
39
  #
34
- # @return [Hash] the resulting hash of words (keys) and their occurrences (values).
40
+ # ## Example
35
41
  #
36
- def word_occurrences
37
- @occurrences ||= words.each_with_object(Hash.new(0)) { |word, result| result[word.downcase] += 1 }
42
+ # "Hello hello" #=> { "hello" => 2 }
43
+ #
44
+ # @return [Hash] a hash map of words as keys and their occurrences as values.
45
+ #
46
+ #
47
+ # ## @word_lengths
48
+ # This is a hash of words and their lengths.
49
+ #
50
+ # ## Example
51
+ #
52
+ # "Hello sir" #=> { "hello" => 5, "sir" => 3 }
53
+ #
54
+ # @return [Hash] a hash map of words as keys and their lengths as values.
55
+ #
56
+ def initialize(string, options = {})
57
+ @options = options
58
+
59
+ @words = string.split(regex).reject { |word| filter.split.include? word.downcase }
60
+
61
+ @word_occurrences = words.each_with_object(Hash.new(0)) do |word, result|
62
+ result[word.downcase] += 1
63
+ end
64
+
65
+ @word_lengths = words.each_with_object({}) do |word, result|
66
+ result[word] ||= word.length
67
+ end
38
68
  end
39
69
 
40
- # Returns a hash of words and their lengths.
70
+ # Returns the total word count.
41
71
  #
42
- def word_lengths
43
- @lengths ||= words.each_with_object({}) { |word, result| result[word] ||= word.length }
72
+ # @return [Integer] total word count from `words` array size.
73
+ #
74
+ def word_count
75
+ words.size
44
76
  end
45
77
 
46
78
  # Returns a two dimensional array of the most occurring word(s)
@@ -48,30 +80,58 @@ module WordsCounted
48
80
  #
49
81
  # In the event of a tie, all tied words are returned.
50
82
  #
83
+ # @return [Array] see {#highest_ranking}
84
+ #
51
85
  def most_occurring_words
52
86
  highest_ranking word_occurrences
53
87
  end
54
88
 
55
89
  # Returns a two dimensional array of the longest word(s) and
56
- # its length.
90
+ # its length. In the event of a tie, all tied words are returned.
57
91
  #
58
- # In the event of a tie, all tied words are returned.
92
+ # @return [Array] see {#highest_ranking}
59
93
  #
60
94
  def longest_words
61
95
  highest_ranking word_lengths
62
96
  end
63
97
 
98
+ # Returns a two dimensional array of words and their density in percent, sorted from highest to lowest.
99
+ #
100
+ # @return [Array] a two dimensional array of words and their density in percent.
101
+ #
102
+ def word_density
103
+ word_occurrences.each_with_object({}) { |(word, occ), hash| hash[word] = percent_of_n(occ) }.sort_by { |_, v| v }.reverse
104
+ end
105
+
64
106
  private
65
107
 
66
108
  # Takes a hashmap of the form {"foo" => 1, "bar" => 2} and returns an array
67
109
  # containing the entries (as an array) with the highest number as a value.
68
110
  #
69
- # {http://codereview.stackexchange.com/a/47515/1563 See here}.
70
- #
71
111
  # @param entries [Hash] a hash of entries to analyse
112
+ # @return [Array] a two dimensional array where each entry consists of a word and its rank
113
+ #
114
+ # {http://codereview.stackexchange.com/a/47515/1563 See here}.
72
115
  #
73
116
  def highest_ranking(entries)
74
117
  entries.group_by { |word, occurrence| occurrence }.sort.last.last
75
118
  end
119
+
120
+ # Calculates the percentage of a word.
121
+ #
122
+ # @param n [Integer] the dividend, i.e. the number of occurrences of a word.
123
+ # @return [Float] a percentage of n based on {#word_count} rounded to two decimal places.
124
+ #
125
+ def percent_of_n(n)
126
+ ((n.to_f / word_count.to_f) * 100.0).round(2)
127
+ end
128
+
129
+ def regex
130
+ @options[:regex] || WORD_REGEX
131
+ end
132
+
133
+ def filter
134
+ @options[:filter] || String.new
135
+ end
76
136
  end
77
137
  end
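
For readers who want to check the density arithmetic outside the gem, here is a minimal standalone sketch. It assumes the same definition as the diff above (occurrences divided by total word count, times 100, rounded to two decimal places); the top-level `word_density` function is a hypothetical helper and not part of the gem's API.

```ruby
# Hypothetical standalone helper; mirrors the arithmetic of percent_of_n
# from the diff above, but takes a plain array of words.
def word_density(words)
  total = words.size.to_f

  # Count occurrences case-insensitively, as word_occurrences does.
  counts = words.each_with_object(Hash.new(0)) { |word, result| result[word.downcase] += 1 }

  # Convert each count to a percentage and sort from highest to lowest.
  counts
    .map { |word, n| [word, (n / total * 100).round(2)] }
    .sort_by { |_, density| -density }
end

word_density(%w[Bad bad piggy]) #=> [["bad", 66.67], ["piggy", 33.33]]
```

This sketch sorts by negated density instead of calling `sort_by` and then `reverse`, so the relative order of words with equal density may differ from the gem's output.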
@@ -1,3 +1,3 @@
1
1
  module WordsCounted
2
- VERSION = "0.0.4"
2
+ VERSION = "0.0.5"
3
3
  end
@@ -2,10 +2,23 @@ require "spec_helper"
2
2
 
3
3
  module WordsCounted
4
4
  describe Counter do
5
+ let(:counter) { Counter.new("We are all in the gutter, but some of us are looking at the stars.") }
5
6
 
6
- describe ".words" do
7
- let(:counter) { Counter.new("We are all in the gutter, but some of us are looking at the stars.") }
7
+ describe "#initialize" do
8
+ it "sets @words" do
9
+ expect(counter.instance_variables).to include(:@words)
10
+ end
11
+
12
+ it "sets @word_occurrences" do
13
+ expect(counter.instance_variables).to include(:@word_occurrences)
14
+ end
8
15
 
16
+ it "sets @word_lengths" do
17
+ expect(counter.instance_variables).to include(:@word_lengths)
18
+ end
19
+ end
20
+
21
+ describe ".words" do
9
22
  it "returns an array" do
10
23
  expect(counter.words).to be_a(Array)
11
24
  end
@@ -30,65 +43,75 @@ module WordsCounted
30
43
  end
31
44
 
32
45
  it "filters words" do
33
- counter = Counter.new("That was magnificent, Trevor.", "magnificent")
46
+ counter = Counter.new("That was magnificent, Trevor.", filter: "magnificent")
34
47
  expect(counter.words).to eq(%w[That was Trevor])
35
48
  end
49
+
50
+ it "splits words based on regex" do
51
+ counter = Counter.new("I am 007.", regex: /[^\p{Alnum}\-']+/)
52
+ expect(counter.words).to eq(["I", "am", "007"])
53
+ end
36
54
  end
37
55
 
38
56
  describe ".word_count" do
39
- let(:counter) { Counter.new("In that case I'll take measures to secure you, woman!") }
40
-
41
57
  it "returns the correct word count" do
42
- expect(counter.word_count).to eq(10)
58
+ expect(counter.word_count).to eq(15)
43
59
  end
44
60
  end
45
61
 
46
62
  describe ".word_occurrences" do
47
- let(:counter) { Counter.new("Bad, bad, piggy!") }
48
-
49
63
  it "returns a hash" do
50
64
  expect(counter.word_occurrences).to be_a(Hash)
51
65
  end
52
66
 
53
67
  it "treats capitalized words as the same word" do
68
+ counter = Counter.new("Bad, bad, piggy!")
54
69
  expect(counter.word_occurrences).to eq({ "bad" => 2, "piggy" => 1 })
55
70
  end
56
71
  end
57
72
 
58
73
  describe ".most_occurring_words" do
59
- let(:counter) { Counter.new("One should always be in love. That is the reason one should never marry.") }
60
-
61
74
  it "returns an array" do
62
75
  expect(counter.most_occurring_words).to be_a(Array)
63
76
  end
64
77
 
65
78
  it "returns highest occurring words" do
66
- expect(counter.most_occurring_words).to eq([["one", 2],["should", 2]])
79
+ counter = Counter.new("Orange orange Apple apple banana")
80
+ expect(counter.most_occurring_words).to eq([["orange", 2],["apple", 2]])
67
81
  end
68
82
  end
69
83
 
70
84
  describe '.word_lengths' do
71
- let(:counter) { Counter.new("One two three.") }
72
-
73
85
  it "returns a hash" do
74
86
  expect(counter.word_lengths).to be_a(Hash)
75
87
  end
76
88
 
77
89
  it "returns a hash of word lengths" do
90
+ counter = Counter.new("One two three.")
78
91
  expect(counter.word_lengths).to eq({ "One" => 3, "two" => 3, "three" => 5 })
79
92
  end
80
93
  end
81
94
 
82
95
  describe ".longest_words" do
83
- let(:counter) { Counter.new("Those whom the gods love grow young.") }
84
-
85
96
  it "returns an array" do
86
97
  expect(counter.longest_words).to be_a(Array)
87
98
  end
88
99
 
89
100
  it "returns the longest words" do
101
+ counter = Counter.new("Those whom the gods love grow young.")
90
102
  expect(counter.longest_words).to eq([["Those", 5],["young", 5]])
91
103
  end
92
104
  end
105
+
106
+ describe ".word_density" do
107
+ it "returns an array" do
108
+ expect(counter.word_density).to be_a(Array)
109
+ end
110
+
111
+ it "returns words and their density in percent" do
112
+ counter = Counter.new("His name was major, I mean, Major Major Major Major.")
113
+ expect(counter.word_density).to eq([["major", 50.0], ["mean", 10.0], ["i", 10.0], ["was", 10.0], ["name", 10.0], ["his", 10.0]])
114
+ end
115
+ end
93
116
  end
94
117
  end
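
The tie behaviour exercised by the `most_occurring_words` and `longest_words` specs can also be illustrated with a different technique from the one in the gem. The sketch below is an assumption-labelled alternative: it finds the maximum value and selects every entry that matches it, where the gem's private `highest_ranking` uses `group_by`, `sort`, and `last`. The top-level function here is hypothetical and only shares the name for readability.

```ruby
# Hypothetical alternative to the gem's private highest_ranking method.
# Takes a hash such as {"foo" => 1, "bar" => 2} and returns every
# [key, value] pair whose value equals the maximum.
def highest_ranking(entries)
  return [] if entries.empty?

  top = entries.values.max
  # Hash#select keeps insertion order, so tied entries come back in the
  # order they were first seen.
  entries.select { |_, value| value == top }.to_a
end

highest_ranking({ "orange" => 2, "apple" => 2, "banana" => 1 })
#=> [["orange", 2], ["apple", 2]]
```

Unlike the `group_by` version, this one returns an empty array for an empty hash instead of raising on `nil`.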
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: words_counted
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.4
4
+ version: 0.0.5
5
5
  platform: ruby
6
6
  authors:
7
7
  - Mohamad El-Husseini
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-04-30 00:00:00.000000000 Z
11
+ date: 2014-05-01 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler