words_counted 0.1.5 → 1.0.3
- checksums.yaml +5 -5
- data/.gitignore +1 -0
- data/.hound.yml +2 -0
- data/.ruby-style.yml +2 -0
- data/.ruby-version +1 -0
- data/.travis.yml +9 -0
- data/.yardopts +3 -2
- data/CHANGELOG.md +29 -0
- data/README.md +146 -189
- data/lib/refinements/hash_refinements.rb +14 -0
- data/lib/words_counted/counter.rb +113 -72
- data/lib/words_counted/deprecated.rb +78 -0
- data/lib/words_counted/tokeniser.rb +163 -0
- data/lib/words_counted/version.rb +1 -1
- data/lib/words_counted.rb +31 -4
- data/spec/words_counted/counter_spec.rb +49 -204
- data/spec/words_counted/deprecated_spec.rb +99 -0
- data/spec/words_counted/tokeniser_spec.rb +133 -0
- data/spec/words_counted_spec.rb +34 -0
- data/words_counted.gemspec +2 -2
- metadata +25 -12
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-SHA1:
-  metadata.gz:
-  data.tar.gz:
+SHA256:
+  metadata.gz: a248654f9f76e28bde0f54993a5c5c87504acffed42b1531acc9de7f385f0696
+  data.tar.gz: c057a7ecb20d7989651b6667f39d16820734e63dd751a0182406f268ecf0f347
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2c4a5028624393434586c7570e8a6c98785c6cedfc3a6f5c07b7fa9b8aba2880ddf847be8779f623df8e36becb8e148aeaabfae822dcc4f0c9b1db414f8c7916
+  data.tar.gz: e115d757c34480e9e7425db94f6c78a035b4464c69946aa31cbb45ea28f963dc1088a1617269b506001669753b2725abf4f0b708303ced59aa5c59cb1658096c
data/.gitignore
CHANGED
data/.hound.yml
ADDED
data/.ruby-style.yml
ADDED
data/.ruby-version
ADDED
@@ -0,0 +1 @@
+3.0.1
data/.travis.yml
ADDED
data/.yardopts
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,32 @@
+## Version 1.0.3
+
+1. Adds support for Ruby 3.0.0.
+2. Improves documentation and adds newer configs for Travis CI and Hound.
+
+## Version 1.0
+
+This version brings lots of improvements to code organisation. The tokeniser has been extracted into its own class. All methods in `Counter` have been either renamed or deprecated. Deprecated methods and their tests have been moved into their own modules. Using them will trigger warnings with the upgrade instructions outlined below.
+
+1. Extracted tokenisation behaviour from `Counter` into a `Tokeniser` class.
+2. Deprecated all methods that have `word` in their name. Most are renamed such that `word` becomes `token`. They will be removed in version 1.1.
+  - Deprecated `word_count` in favour of `token_count`
+  - Deprecated `unique_word_count` in favour of `unique_token_count`
+  - Deprecated `word_occurrences` and `sorted_word_occurrences` in favour of `token_frequency`
+  - Deprecated `word_lengths` and `sorted_word_lengths` in favour of `token_lengths`
+  - Deprecated `word_density` in favour of `token_density`
+  - Deprecated `most_occurring_words` in favour of `most_frequent_tokens`
+  - Deprecated `longest_words` in favour of `longest_tokens`
+  - Deprecated `average_chars_per_word` in favour of `average_chars_per_token`
+  - Deprecated `count`. Use `Array#count` instead.
+3. `token_lengths`, which replaces `word_lengths`, returns a sorted two-dimensional array instead of a hash. It behaves exactly like `sorted_word_lengths`, which has been deprecated. Use `token_lengths.to_h` for the old behaviour.
+4. `token_frequency`, which replaces `word_occurrences`, returns a sorted two-dimensional array instead of a hash. It behaves like `sorted_word_occurrences`, which has been deprecated. Use `token_frequency.to_h` for the old behaviour.
+5. `token_density`, which replaces `word_density`, returns a decimal with a precision of 2, not a percent. Use `token_density * 100` for the old behaviour.
+6. Added a refinement to Hash under `lib/refinements/hash_refinements.rb` to quickly sort by descending value.
+7. Extracted all deprecated methods to their own module, and their tests to their own spec file.
+8. Added a base `words_counted_spec.rb` and moved the `.from_file` test to the new file.
+9. Added Travis continuous integration.
+10. Added documentation to the code.
+
 ## Version 0.1.5
 
 1. Removed `to_f` from the dividend in `average_chars_per_word` and `word_densities`. The divisor is a float, and dividing by a float returns a float.
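For instance, a minimal sketch of the upgrade path described above (method outputs assumed from the README examples that follow):

```ruby
require "words_counted"

counter = WordsCounted.count("We are all in the gutter")

# Pre-1.0 names still work but emit deprecation warnings,
# and will be removed in 1.1:
counter.word_count            # warns; use `token_count` instead

# 1.0 replacements:
counter.token_count           #=> 6
counter.token_lengths.to_h    # hash, like the deprecated `word_lengths`
counter.token_frequency.to_h  # hash, like the deprecated `word_occurrences`

# `token_density` now returns decimals; scale by 100 for the old percentages.
counter.token_density.map { |token, density| [token, density * 100] }
```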
data/README.md
CHANGED
@@ -1,36 +1,35 @@
 # WordsCounted
 
-
+> We are all in the gutter, but some of us are looking at the stars.
+>
+> -- Oscar Wilde
+
+WordsCounted is a Ruby NLP (natural language processor). WordsCounted lets you implement powerful tokenisation strategies with a very flexible tokeniser class.
+
+**Are you using WordsCounted to do something interesting?** Please [tell me about it][8].
 
 <a href="http://badge.fury.io/rb/words_counted">
 <img src="https://badge.fury.io/rb/words_counted@2x.png" alt="Gem Version" height="18">
 </a>
 
+[RubyDoc documentation][7].
+
 ### Demo
 
-Visit [
+Visit [this website][4] for one example of what you can do with WordsCounted.
 
 ### Features
 
-*
-*
-*
-*
-*
-*
-
-
-* The longest word(s) and its length
-* The most occurring word(s) and its number of occurrences.
-* Count invividual strings for occurrences.
-* A flexible way to exclude words (or anything) from the count. You can pass a **string**, a **regexp**, an **array**, or a **lambda**.
-* Customisable criteria. Pass your own regexp rules to split strings if you prefer. The default regexp has two features:
-  * Filters special characters but respects hyphens and apostrophes.
-  * Plays nicely with diacritics (UTF and unicode characters): "São Paulo" is treated as `["São", "Paulo"]` and not `["S", "", "o", "Paulo"]`.
+* Out of the box, get the following data from any string, readable file, or URL:
+  * Token count and unique token count
+  * Token densities, frequencies, and lengths
+  * Char count and average chars per token
+  * The longest tokens and their lengths
+  * The most frequent tokens and their frequencies
+* A flexible way to exclude tokens from the tokeniser. You can pass a **string**, **regexp**, **symbol**, **lambda**, or an **array** of any combination of those types for powerful tokenisation strategies.
+* Pass your own regexp rules to the tokeniser if you prefer. The default regexp filters special characters but keeps hyphens and apostrophes. It also plays nicely with diacritics (UTF and unicode characters): *Bayrūt* is treated as `["Bayrūt"]` and not `["Bayr", "ū", "t"]`, for example.
 * Opens and reads files. Pass in a file path or a URL instead of a string.
 
-See usage instructions for more details.
-
 ## Installation
 
 Add this line to your application's Gemfile:
@@ -58,62 +57,70 @@ counter = WordsCounted.count(
 counter = WordsCounted.from_file("path/or/url/to/my/file.txt")
 ```
 
+`.count` and `.from_file` are convenience methods that take an input, tokenise it, and return an instance of `WordsCounted::Counter` initialized with the tokens. The `WordsCounted::Tokeniser` and `WordsCounted::Counter` classes can be used alone, however.
+
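Since the two classes stand alone, a sketch like the following should also work (constructor arguments inferred from the API notes below; output values assumed):

```ruby
require "words_counted"

# Tokenise by hand, then build a Counter from the resulting tokens.
tokens  = WordsCounted::Tokeniser.new("We are all in the gutter").tokenise
counter = WordsCounted::Counter.new(tokens)

tokens              #=> ["we", "are", "all", "in", "the", "gutter"]
counter.token_count #=> 6
```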
 ## API
 
-###
+### WordsCounted
 
-
+**`WordsCounted.count(input, options = {})`**
 
-
+Tokenises input and initializes a `WordsCounted::Counter` object with the resulting tokens.
 
 ```ruby
 counter = WordsCounted.count("Hello Beirut!")
 ```
 
-Accepts two options: `exclude` and `regexp`. See [Excluding
+Accepts two options: `exclude` and `regexp`. See [Excluding tokens from the analyser][5] and [Passing in a custom regexp][6] respectively.
 
-
+**`WordsCounted.from_file(path, options = {})`**
 
-
+Reads and tokenises a file, and initializes a `WordsCounted::Counter` object with the resulting tokens.
 
 ```ruby
-counter = WordsCounted.
+counter = WordsCounted.from_file("hello_beirut.txt")
 ```
 
-Accepts the same options as
+Accepts the same options as `.count`.
+
+### Tokeniser
 
-
+The tokeniser allows you to tokenise text in a variety of ways. You can pass in your own rules for tokenisation, and apply a powerful filter with any combination of rules as long as they can boil down to a lambda.
 
-
+Out of the box the tokeniser includes only alpha chars. Hyphenated tokens and tokens with apostrophes are considered a single token.
 
-
+**`#tokenise([pattern: TOKEN_REGEXP, exclude: nil])`**
 
 ```ruby
-
+tokeniser = WordsCounted::Tokeniser.new("Hello Beirut!").tokenise
+
+# With `exclude`
+tokeniser = WordsCounted::Tokeniser.new("Hello Beirut!").tokenise(exclude: "hello")
+
+# With `pattern`
+tokeniser = WordsCounted::Tokeniser.new("I <3 Beirut!").tokenise(pattern: /[a-z]/i)
 ```
 
-
+See [Excluding tokens from the analyser][5] and [Passing in a custom regexp][6] for more information.
 
-
+### Counter
 
-
-counter.word_occurrences
+The `WordsCounted::Counter` class allows you to collect various statistics from an array of tokens.
 
-
-
-
-
-
-
-}
+**`#token_count`**
+
+Returns the token count of a given string.
+
+```ruby
+counter.token_count #=> 15
 ```
 
-
+**`#token_frequency`**
 
-Returns a two
+Returns a sorted (unstable) two-dimensional array where each element is a token and its frequency. The array is sorted by frequency in descending order.
 
 ```ruby
-counter.
+counter.token_frequency
 
 [
   ["the", 2],
@@ -124,38 +131,22 @@ counter.sorted_word_occurrences
 ]
 ```
 
-
-
-Returns a two dimensional array of the most occurring word and its number of occurrences. In case there is a tie all tied words are returned.
-
-```ruby
-counter.most_occurring_words
-
-[ ["are", 2], ["the", 2] ]
-```
-
-#### `.word_lengths`
+**`#most_frequent_tokens`**
 
-Returns
+Returns a hash where each key-value pair is a token and its frequency.
 
 ```ruby
-counter.
+counter.most_frequent_tokens
 
-{
-  "We" => 2,
-  "are" => 3,
-  "all" => 3,
-  # ...
-  "stars" => 5
-}
+{ "are" => 2, "the" => 2 }
 ```
 
-
+**`#token_lengths`**
 
-Returns a two
+Returns a sorted (unstable) two-dimensional array where each element contains a token and its length. The array is sorted by length in descending order.
 
 ```ruby
-counter.
+counter.token_lengths
 
 [
   ["looking", 7],
@@ -166,133 +157,121 @@ counter.sorted_word_lengths
 ]
 ```
 
-
-
-Returns a two dimensional array of the longest word and its length. In case there is a tie all tied words are returned.
+**`#longest_tokens`**
 
-
-counter.longest_words
-
-[ ["looking", 7] ]
-```
-
-#### `.words`
+Returns a hash where each key-value pair is a token and its length.
 
-Returns an array of words resulting from the string passed into the initialize method.
 
 ```ruby
-counter.
-
+counter.longest_tokens
+
+{ "looking" => 7 }
 ```
 
-
+**`#token_density([ precision: 2 ])`**
 
-Returns a two-
+Returns a sorted (unstable) two-dimensional array where each element contains a token and its density as a float, rounded to a precision of two. The array is sorted by density in descending order. It accepts a `precision` argument, which must be a float.
 
 ```ruby
-counter.
+counter.token_density
 
 [
-  ["are", 13
-  ["the", 13
-  ["but",
+  ["are", 0.13],
+  ["the", 0.13],
+  ["but", 0.07],
   # ...
-  ["we",
+  ["we", 0.07]
 ]
 ```
 
|
-
|
187
|
+
**`#char_count`**
|
205
188
|
|
206
|
-
Returns the
|
189
|
+
Returns the char count of tokens.
|
207
190
|
|
208
191
|
```ruby
|
209
|
-
counter.char_count
|
192
|
+
counter.char_count #=> 76
|
210
193
|
```
|
211
194
|
|
212
|
-
|
195
|
+
**`#average_chars_per_token([ precision: 2 ])`**
|
213
196
|
|
214
|
-
Returns the average
|
197
|
+
Returns the average char count per token rounded to two decimal places. Accepts a precision argument which defaults to two. Precision must be a float.
|
215
198
|
|
216
199
|
```ruby
|
217
|
-
counter.
|
200
|
+
counter.average_chars_per_token #=> 4
|
218
201
|
```
|
219
202
|
|
220
|
-
|
203
|
+
**`#uniq_token_count`**
|
221
204
|
|
222
|
-
Returns the
|
205
|
+
Returns the number of unique tokens.
|
223
206
|
|
224
207
|
```ruby
|
225
|
-
counter.
|
208
|
+
counter.uniq_token_count #=> 13
|
226
209
|
```
|
227
210
|
|
228
|
-
|
211
|
+
## Excluding tokens from the tokeniser
|
229
212
|
|
230
|
-
|
213
|
+
You can exclude anything you want from the input by passing the `exclude` option. The exclude option accepts a variety of filters and is extremely flexible.
|
231
214
|
|
232
|
-
|
233
|
-
|
234
|
-
|
215
|
+
1. A *space-delimited* string. The filter will normalise the string.
|
216
|
+
2. A regular expression.
|
217
|
+
3. A lambda.
|
218
|
+
4. A symbol that names a predicate method. For example `:odd?`.
|
219
|
+
5. An array of any combination of the above.
|
235
220
|
|
236
|
-
## Excluding words from the analyser
|
237
|
-
|
238
|
-
You can exclude anything you want from the string you want to analyse by passing in the `exclude` option. The exclude option accepts a variety of filters.
|
239
|
-
|
240
|
-
1. A *space-delimited* list of candidates. The filter will remove both uppercase and lowercase variants of the candidate when applicable. Useful for excluding *the*, *a*, and so on.
|
241
|
-
2. An array of string candidates. For example: `['a', 'the']`.
|
242
|
-
3. A regular expression.
|
243
|
-
4. A lambda.
|
244
|
-
|
245
|
-
#### Using a string
|
246
221
|
```ruby
|
247
|
-
|
248
|
-
|
222
|
+
tokeniser =
|
223
|
+
WordsCounted::Tokeniser.new(
|
224
|
+
"Magnificent! That was magnificent, Trevor."
|
225
|
+
)
|
226
|
+
|
227
|
+
# Using a string
|
228
|
+
tokeniser.tokenise(exclude: "was magnificent")
|
229
|
+
# => ["that", "trevor"]
|
230
|
+
|
231
|
+
# Using a regular expression
|
232
|
+
tokeniser.tokenise(exclude: /trevor/)
|
233
|
+
# => ["magnificent", "that", "was", "magnificent"]
|
234
|
+
|
235
|
+
# Using a lambda
|
236
|
+
tokeniser.tokenise(exclude: ->(t) { t.length < 4 })
|
237
|
+
# => ["magnificent", "that", "magnificent", "trevor"]
|
238
|
+
|
239
|
+
# Using symbol
|
240
|
+
tokeniser = WordsCounted::Tokeniser.new("Hello! محمد")
|
241
|
+
tokeniser.tokenise(exclude: :ascii_only?)
|
242
|
+
# => ["محمد"]
|
243
|
+
|
244
|
+
# Using an array
|
245
|
+
tokeniser = WordsCounted::Tokeniser.new(
|
246
|
+
"Hello! اسماءنا هي محمد، كارولينا، سامي، وداني"
|
249
247
|
)
|
250
|
-
|
251
|
-
|
252
|
-
|
253
|
-
|
254
|
-
#### Using an array
|
255
|
-
```ruby
|
256
|
-
WordsCounted.count("1 2 3 4 5 6", regexp: /[0-9]/, exclude: ['1', '2', '3'])
|
257
|
-
counter.words
|
258
|
-
#=> ["4", "5", "6"]
|
259
|
-
```
|
260
|
-
|
261
|
-
#### Using a regular expression
|
262
|
-
```ruby
|
263
|
-
WordsCounted.count("Hello Beirut", exclude: /Beirut/)
|
264
|
-
counter.words
|
265
|
-
#=> ["Hello"]
|
266
|
-
```
|
267
|
-
|
268
|
-
#### Using a lambda
|
269
|
-
```ruby
|
270
|
-
WordsCounted.count("1 2 3 4 5 6", regexp: /[0-9]/, exclude: ->(w) { w.to_i.even? })
|
271
|
-
counter.words
|
272
|
-
#=> ["1", "3", "5"]
|
248
|
+
tokeniser.tokenise(
|
249
|
+
exclude: [:ascii_only?, /محمد/, ->(t) { t.length > 6}, "و"]
|
250
|
+
)
|
251
|
+
# => ["هي", "سامي", "وداني"]
|
273
252
|
```
|
274
253
|
|
275
|
-
## Passing in a
|
254
|
+
## Passing in a custom regexp
|
276
255
|
|
277
|
-
|
256
|
+
The default regexp accounts for letters, hyphenated tokens, and apostrophes. This means *twenty-one* is treated as one token. So is *Mohamad's*.
|
278
257
|
|
279
258
|
```ruby
|
280
259
|
/[\p{Alpha}\-']+/
|
281
260
|
```
|
282
261
|
|
283
|
-
|
262
|
+
You can pass your own criteria as a Ruby regular expression to split your string as desired.
|
284
263
|
|
285
|
-
For example, if you wanted to include numbers
|
264
|
+
For example, if you wanted to include numbers, you can override the regular expression:
|
286
265
|
|
287
266
|
```ruby
|
288
|
-
counter = WordsCounted.count("Numbers 1, 2, and 3",
|
289
|
-
counter.
|
290
|
-
#=> ["
|
267
|
+
counter = WordsCounted.count("Numbers 1, 2, and 3", pattern: /[\p{Alnum}\-']+/)
|
268
|
+
counter.tokens
|
269
|
+
#=> ["numbers", "1", "2", "and", "3"]
|
291
270
|
```
|
292
271
|
|
293
|
-
## Opening and
|
272
|
+
## Opening and reading files
|
294
273
|
|
295
|
-
Use the `from_file` method to open files. `from_file` accepts the same options as
|
274
|
+
Use the `from_file` method to open files. `from_file` accepts the same options as `.count`. The file path can be a URL.
|
296
275
|
|
297
276
|
```ruby
|
298
277
|
counter = WordsCounted.from_file("url/or/path/to/file.text")
|
@@ -300,41 +279,31 @@ counter = WordsCounted.from_file("url/or/path/to/file.text")
 
 ## Gotchas
 
-A hyphen used in leu of an *em* or *en* dash will form part of the
+A hyphen used in lieu of an *em* or *en* dash will form part of the token. This affects the tokeniser algorithm.
 
 ```ruby
 counter = WordsCounted.count("How do you do?-you are well, I see.")
-counter.
-
-{
-  "how" => 1,
-  "do" => 2,
-  "you" => 1,
-  "-you" => 1, # WTF, mate!
-  "are" => 1,
-  "very" => 1,
-  "well" => 1,
-  "i" => 1,
-  "see" => 1
-}
-```
+counter.token_frequency
 
-
+[
+  ["do", 2],
+  ["how", 1],
+  ["you", 1],
+  ["-you", 1], # WTF, mate!
+  ["are", 1],
+  # ...
+]
+```
 
-
+In this example `-you` and `you` are separate tokens. Also, the tokeniser does not include numbers by default. Remember that you can pass your own regular expression if the default behaviour does not fit your needs.
 
 ### A note on case sensitivity
 
-The program will downcase all incoming strings for consistency.
+The program normalises (downcases) all incoming strings for consistency and so that filters match.
 
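For example, a small sketch of that normalisation (output assumed from the downcasing rule):

```ruby
WordsCounted.count("Hello HELLO hello").token_frequency
#=> [["hello", 3]]
```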
-##
+## Roadmap
 
-
-2. Add paragraph, sentence, average words per sentence, and average sentence chars counters.
-
-#### Ability to read URLs
-
-Something like...
+### Ability to open URLs
 
 ```ruby
 def self.from_url
@@ -342,21 +311,9 @@ def self.from_url
 end
 ```
 
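The stub above is only a roadmap note. One hypothetical way to flesh it out, assuming `open-uri` and mirroring `.count` (none of this is the gem's actual code):

```ruby
require "open-uri"

module WordsCounted
  # Hypothetical: fetch a remote document and tokenise its body,
  # returning a Counter like `.count` and `.from_file` do.
  def self.from_url(url, options = {})
    count(URI.parse(url).read, options)
  end
end
```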
-## But wait... wait a minute...
-
-#### Isn't it better to write this in JavaScript?
-
-![Picard face-palm](http://stream1.gifsoup.com/view3/1290449/picard-facepalm-o.gif "Picard face-palm")
-
-## About
-
-Originally I wrote this program for a code challenge on Treehouse. You can find the original implementation on [Code Review][1].
-
 ## Contributors
 
-
-
-Thanks to [Wayne Conrad][2] for providing [an excellent code review][3], and improving the filter feature to well beyond what I can come up with.
+See [contributors][3]. Not listed there is [Dave Yarwood][1].
 
 ## Contributing
 
@@ -366,10 +323,10 @@ Thanks to [Wayne Conrad][2] for providing [an excellent code review][3], and imp
 4. Push to the branch (`git push origin my-new-feature`)
 5. Create new Pull Request
 
-
-[
-[2]: https://github.com/wconrad
-[3]: http://codereview.stackexchange.com/a/49476/1563
+[2]: http://www.rubydoc.info/gems/words_counted
+[3]: https://github.com/abitdodgy/words_counted/graphs/contributors
 [4]: http://rubywordcount.com
-[5]: https://github.com/abitdodgy/words_counted#excluding-
+[5]: https://github.com/abitdodgy/words_counted#excluding-tokens-from-the-analyser
 [6]: https://github.com/abitdodgy/words_counted#passing-in-a-custom-regexp
+[7]: http://www.rubydoc.info/gems/words_counted/
+[8]: https://github.com/abitdodgy/words_counted/issues/new
data/lib/refinements/hash_refinements.rb
ADDED
@@ -0,0 +1,14 @@
+# -*- encoding : utf-8 -*-
+module Refinements
+  module HashRefinements
+    refine Hash do
+      # A convenience method to sort a hash into an
+      # array of tuples by descending value.
+      #
+      # @return [Array<Array>] a sorted (unstable) array of candidates
+      def sort_by_value_desc
+        sort_by(&:last).reverse
+      end
+    end
+  end
+end