words_counted 1.0.2 → 1.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA1:
- metadata.gz: 9596a5852bcc4c417a62f4ec4aa2153eed7d4297
- data.tar.gz: fad6c9456601aff8c709b1a385d9638e68140ac5
+ SHA256:
+ metadata.gz: a248654f9f76e28bde0f54993a5c5c87504acffed42b1531acc9de7f385f0696
+ data.tar.gz: c057a7ecb20d7989651b6667f39d16820734e63dd751a0182406f268ecf0f347
  SHA512:
- metadata.gz: f0891409f8b63f89527fd68e8fed3ef2f917b9b342a0aabd6a00479627bea4ef2e9552810a25a4fd103529bc8cda846b9b50624c91437c0cf6a94b7e4e316c6c
- data.tar.gz: 36c3c925b20388d86a5ef582e9ad09ca74b1f9fda8a9a66113ed5783d5123b2d99db342450a54e76278b41990c1301c6e31750ee7c5072aacf21e544147cd82c
+ metadata.gz: 2c4a5028624393434586c7570e8a6c98785c6cedfc3a6f5c07b7fa9b8aba2880ddf847be8779f623df8e36becb8e148aeaabfae822dcc4f0c9b1db414f8c7916
+ data.tar.gz: e115d757c34480e9e7425db94f6c78a035b4464c69946aa31cbb45ea28f963dc1088a1617269b506001669753b2725abf4f0b708303ced59aa5c59cb1658096c
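The release checksums move from SHA-1 to SHA-256. The recorded digests cover the two archives packed inside the `.gem` file; they can be reproduced with Ruby's standard digest library (a sketch, assuming the archives have been extracted from the gem):

```ruby
require "digest"

# checksums.yaml records one digest per archive inside the .gem package.
Digest::SHA256.file("metadata.gz").hexdigest   # compare against the value above
Digest::SHA256.file("data.tar.gz").hexdigest
```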
data/.ruby-style.yml CHANGED
@@ -1,2 +1,2 @@
- Style/IfUnlessModifier:
- MaxLineLength: 120
+ Metrics/LineLength:
+ Max: 120
data/.ruby-version ADDED
@@ -0,0 +1 @@
+ 3.0.1
data/.travis.yml CHANGED
@@ -1,8 +1,8 @@
  language: ruby

  rvm:
- - 2.1
- - 2.2
+ - 3.0.0
+ - 3.0.1
  - ruby-head

  gemfile:
data/CHANGELOG.md CHANGED
@@ -1,3 +1,8 @@
+ ## Version 1.0.3
+
+ 1. Adds support for Ruby 3.0.0.
+ 2. Improves documentation and adds newer configs to Travis CI and Hound.
+
  ## Version 1.0

  This version brings lots of improvements to code organisation. The tokeniser has been extracted into its own class. All methods in `Counter` have either been renamed or deprecated. Deprecated methods and their tests have moved into their own modules. Using them will trigger warnings with upgrade instructions outlined below.
data/README.md CHANGED
@@ -1,14 +1,22 @@
  # WordsCounted

- WordsCounted is a Ruby NLP (natural language processor). WordsCounted lets you implement powerful tokensation strategies with a very flexible tokeniser class. [Consult the documentation][2] for more information.
+ > We are all in the gutter, but some of us are looking at the stars.
+ >
+ > -- Oscar Wilde
+
+ WordsCounted is a Ruby NLP (natural language processor). WordsCounted lets you implement powerful tokenisation strategies with a very flexible tokeniser class.
+
+ **Are you using WordsCounted to do something interesting?** Please [tell me about it][8].

  <a href="http://badge.fury.io/rb/words_counted">
  <img src="https://badge.fury.io/rb/words_counted@2x.png" alt="Gem Version" height="18">
  </a>

+ [RubyDoc documentation][7].
+
  ### Demo

- Visit [this website][4] for an example of what the gem can do.
+ Visit [this website][4] for one example of what you can do with WordsCounted.

  ### Features

@@ -22,8 +30,6 @@ Visit [this website][4] for an example of what the gem can do.
  * Pass your own regexp rules to the tokeniser if you prefer. The default regexp filters special characters but keeps hyphens and apostrophes. It also plays nicely with diacritics (UTF and unicode characters): *Bayrūt* is treated as `["Bayrūt"]` and not `["Bayr", "ū", "t"]`, for example.
  * Opens and reads files. Pass in a file path or a url instead of a string.

- See usage instructions for more details.
-
  ## Installation

  Add this line to your application's Gemfile:
@@ -51,13 +57,15 @@ counter = WordsCounted.count(
  counter = WordsCounted.from_file("path/or/url/to/my/file.txt")
  ```

- `.count` and `.from_file` are convenience methods that take an input, tokenise it, and return an instance of `Counter` initialized with the tokens. The `Tokeniser` and `Counter` classes can be used alone, however.
+ `.count` and `.from_file` are convenience methods that take an input, tokenise it, and return an instance of `WordsCounted::Counter` initialized with the tokens. The `WordsCounted::Tokeniser` and `WordsCounted::Counter` classes can be used alone, however.

  ## API

+ ### WordsCounted
+
  **`WordsCounted.count(input, options = {})`**

- Tokenises input and initializes a `Counter` object with the resulting tokens.
+ Tokenises input and initializes a `WordsCounted::Counter` object with the resulting tokens.

  ```ruby
  counter = WordsCounted.count("Hello Beirut!")
@@ -67,10 +75,10 @@ Accepts two options: `exclude` and `regexp`. See [Excluding tokens from the anal
  **`WordsCounted.from_file(path, options = {})`**

- Reads and tokenises a file, and initializes a `Counter` object with the resulting tokens.
+ Reads and tokenises a file, and initializes a `WordsCounted::Counter` object with the resulting tokens.

  ```ruby
- counter = WordsCounted.count("hello_beirut.txt")
+ counter = WordsCounted.from_file("hello_beirut.txt")

  ```
  Accepts the same options as `.count`.
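Both convenience methods wrap the same two-step flow. As a minimal sketch of using the `WordsCounted::Tokeniser` and `WordsCounted::Counter` classes directly, per the note above that they can be used alone:

```ruby
require "words_counted"

# Tokenise first, then hand the resulting tokens to Counter: the same
# two steps that `.count` performs internally.
tokens  = WordsCounted::Tokeniser.new("Hello Beirut!").tokenise
counter = WordsCounted::Counter.new(tokens)
counter.token_count #=> 2
```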
@@ -84,20 +92,20 @@ Out of the box the tokeniser includes only alpha chars. Hyphenated tokens and to
  **`#tokenise([pattern: TOKEN_REGEXP, exclude: nil])`**

  ```ruby
- tokeniser = Tokeniser.new("Hello Beirut!").tokenise
+ tokeniser = WordsCounted::Tokeniser.new("Hello Beirut!").tokenise

  # With `exclude`
- tokeniser = Tokeniser.new("Hello Beirut!").tokenise(exclude: "hello")
+ tokeniser = WordsCounted::Tokeniser.new("Hello Beirut!").tokenise(exclude: "hello")

  # With `pattern`
- tokeniser = Tokeniser.new("I <3 Beirut!").tokenise(pattern: /[a-z]/i)
+ tokeniser = WordsCounted::Tokeniser.new("I <3 Beirut!").tokenise(pattern: /[a-z]/i)
  ```

  See [Excluding tokens from the analyser][5] and [Passing in a custom regexp][6] for more information.

  ### Counter

- The `Counter` class allows you to collect various statistics from an array of tokens.
+ The `WordsCounted::Counter` class allows you to collect various statistics from an array of tokens.

  **`#token_count`**

@@ -111,7 +119,7 @@ counter.token_count #=> 15

  Returns a sorted (unstable) two-dimensional array where each element is a token and its frequency. The array is sorted by frequency in descending order.

- ```
+ ```ruby
  counter.token_frequency

  [
@@ -192,12 +200,12 @@ Returns the average char count per token rounded to two decimal places. Accepts
  counter.average_chars_per_token #=> 4
  ```

- **`#unique_token_count`**
+ **`#uniq_token_count`**

- Returns the number unique tokens.
+ Returns the number of unique tokens.

  ```ruby
- counter.unique_token_count #=> 13
+ counter.uniq_token_count #=> 13
  ```

  ## Excluding tokens from the tokeniser
@@ -207,33 +215,30 @@ You can exclude anything you want from the input by passing the `exclude` option
  1. A *space-delimited* string. The filter will normalise the string.
  2. A regular expression.
  3. A lambda.
- 4. A symbol that is convertible to a proc. For example `:odd?`.
+ 4. A symbol that names a predicate method. For example `:odd?`.
  5. An array of any combination of the above.

  ```ruby
  tokeniser =
  WordsCounted::Tokeniser.new(
- "Magnificent! That was magnificent, Trevor.", exclude: "was magnificent"
+ "Magnificent! That was magnificent, Trevor."
  )

  # Using a string
  tokeniser.tokenise(exclude: "was magnificent")
- tokeniser.tokens
  # => ["that", "trevor"]

  # Using a regular expression
- tokeniser.tokenise(exclude: /Trevor/)
- counter.tokens
- # => ["that", "was", "magnificent"]
+ tokeniser.tokenise(exclude: /trevor/)
+ # => ["magnificent", "that", "was", "magnificent"]

  # Using a lambda
  tokeniser.tokenise(exclude: ->(t) { t.length < 4 })
- counter.tokens
- # => ["magnificent", "trevor"]
+ # => ["magnificent", "that", "magnificent", "trevor"]

  # Using symbol
  tokeniser = WordsCounted::Tokeniser.new("Hello! محمد")
- t.tokenise(exclude: :ascii_only?)
+ tokeniser.tokenise(exclude: :ascii_only?)
  # => ["محمد"]

  # Using an array
@@ -243,10 +248,10 @@ tokeniser = WordsCounted::Tokeniser.new(
  tokeniser.tokenise(
  exclude: [:ascii_only?, /محمد/, ->(t) { t.length > 6}, "و"]
  )
- # => ["هي", "سامي", "ودان"]
+ # => ["هي", "سامي", "وداني"]
  ```

- ## Passing in a Custom Regexp
+ ## Passing in a custom regexp

  The default regexp accounts for letters, hyphenated tokens, and apostrophes. This means *twenty-one* is treated as one token. So is *Mohamad's*.

@@ -259,12 +264,12 @@ You can pass your own criteria as a Ruby regular expression to split your string
  For example, if you wanted to include numbers, you can override the regular expression:

  ```ruby
- counter = WordsCounted.count("Numbers 1, 2, and 3", regexp: /[\p{Alnum}\-']+/)
+ counter = WordsCounted.count("Numbers 1, 2, and 3", pattern: /[\p{Alnum}\-']+/)
  counter.tokens
- #=> ["Numbers", "1", "2", "and", "3"]
+ #=> ["numbers", "1", "2", "and", "3"]
  ```

- ## Opening and Reading Files
+ ## Opening and reading files

  Use the `from_file` method to open files. `from_file` accepts the same options as `.count`. The file path can be a URL.

@@ -296,14 +301,9 @@ In this example `-you` and `you` are separate tokens. Also, the tokeniser does n

  The program will normalise (downcase) all incoming strings for consistency and filters.

- ## Road Map
-
- 1. Add ability to open URLs.
- 2. Add Ngram support.
-
- #### Ability to read URLs
+ ## Roadmap

- Something like...
+ ### Ability to open URLs

  ```ruby
  def self.from_url
@@ -311,10 +311,6 @@ def self.from_url
  end
  ```

- ## About
-
- Originally I wrote this program for a code challenge on Treehouse. You can find the original implementation on [Code Review][1].
-
  ## Contributors

  See [contributors][3]. Not listed there is [Dave Yarwood][1].
@@ -327,10 +323,10 @@ See [contributors][3]. Not listed there is [Dave Yarwood][1].
  4. Push to the branch (`git push origin my-new-feature`)
  5. Create new Pull Request

-
- [1]: http://codereview.stackexchange.com/questions/46105/a-ruby-string-analyser
  [2]: http://www.rubydoc.info/gems/words_counted
  [3]: https://github.com/abitdodgy/words_counted/graphs/contributors
  [4]: http://rubywordcount.com
  [5]: https://github.com/abitdodgy/words_counted#excluding-tokens-from-the-analyser
  [6]: https://github.com/abitdodgy/words_counted#passing-in-a-custom-regexp
+ [7]: http://www.rubydoc.info/gems/words_counted/
+ [8]: https://github.com/abitdodgy/words_counted/issues/new
data/lib/refinements/hash_refinements.rb CHANGED
@@ -4,6 +4,8 @@ module Refinements
  refine Hash do
  # This is a convenience method to sort hashes into an
  # array of tuples by descending value.
+ #
+ # @return [Array<Array>] A sorted (unstable) array of tuples
  def sort_by_value_desc
  sort_by(&:last).reverse
  end
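A quick trace of what the refinement returns (a sketch; the refinement is only active in files that opt in with `using`):

```ruby
using Refinements::HashRefinements

# Sorts a hash's [key, value] pairs by value, highest value first.
{ "one" => 1, "three" => 3, "two" => 2 }.sort_by_value_desc
# => [["three", 3], ["two", 2], ["one", 1]]
```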
data/lib/words_counted/counter.rb CHANGED
@@ -3,10 +3,21 @@ module WordsCounted
  using Refinements::HashRefinements

  class Counter
+ # This module contains several methods to extract useful statistics
+ # from any array of tokens, such as density, frequency, and more.
+ #
+ # @example
+ # WordsCounted::Counter.new(["hello", "world"]).token_count
+ # # => 2
+
  include Deprecated

+ # @return [Array<String>] an array of tokens.
  attr_reader :tokens

+ # Initializes state with an array of tokens.
+ #
+ # @param [Array] tokens An array of tokens to perform operations on
  def initialize(tokens)
  @tokens = tokens
  end
@@ -17,7 +28,7 @@ module WordsCounted
  # Counter.new(%w[one two two three three three]).token_count
  # # => 6
  #
- # @return [Integer] The number of tokens.
+ # @return [Integer] The number of tokens
  def token_count
  tokens.size
  end
@@ -28,7 +39,7 @@ module WordsCounted
  # Counter.new(%w[one two two three three three]).uniq_token_count
  # # => 3
  #
- # @return [Integer] The number of unique tokens.
+ # @return [Integer] The number of unique tokens
  def uniq_token_count
  tokens.uniq.size
  end
@@ -39,7 +50,7 @@ module WordsCounted
  # Counter.new(%w[one two]).char_count
  # # => 6
  #
- # @return [Integer] The total char count of tokens.
+ # @return [Integer] The total char count of tokens
  def char_count
  tokens.join.size
  end
@@ -51,7 +62,7 @@ module WordsCounted
  # Counter.new(%w[one two two three three three]).token_frequency
  # # => [ ['three', 3], ['two', 2], ['one', 1] ]
  #
- # @return [Array<Array<String, Integer>>]
+ # @return [Array<Array<String, Integer>>] An array of tokens and their frequencies
  def token_frequency
  tokens.each_with_object(Hash.new(0)) { |token, hash| hash[token] += 1 }.sort_by_value_desc
  end
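The counting idiom in `token_frequency` is a default-zero hash: each token increments its own bucket, and the `sort_by_value_desc` refinement then orders the result. Traced with sample data:

```ruby
%w[one two two].each_with_object(Hash.new(0)) { |token, hash| hash[token] += 1 }
# => { "one" => 1, "two" => 2 }
# sort_by_value_desc then yields [["two", 2], ["one", 1]]
```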
@@ -63,7 +74,7 @@ module WordsCounted
  # Counter.new(%w[one two three four five]).token_lengths
  # # => [ ['three', 5], ['four', 4], ['five', 4], ['one', 3], ['two', 3] ]
  #
- # @return [Array<Array<String, Integer>>]
+ # @return [Array<Array<String, Integer>>] An array of tokens and their lengths
  def token_lengths
  tokens.uniq.each_with_object({}) { |token, hash| hash[token] = token.length }.sort_by_value_desc
  end
@@ -80,8 +91,8 @@ module WordsCounted
  # Counter.new(%w[Maj. Major Major Major]).token_density(precision: 4)
  # # => [ ['major', .7500], ['maj', .2500] ]
  #
- # @param [Integer] precision The number of decimal places to round density to.
- # @return [Array<Array<String, Float>>]
+ # @param [Integer] precision The number of decimal places to round density to
+ # @return [Array<Array<String, Float>>] An array of tokens and their densities
  def token_density(precision: 2)
  token_frequency.each_with_object({}) { |(token, freq), hash|
  hash[token] = (freq / token_count.to_f).round(precision)
@@ -94,18 +105,18 @@ module WordsCounted
  # Counter.new(%w[one once two two twice twice]).most_frequent_tokens
  # # => { 'two' => 2, 'twice' => 2 }
  #
- # @return [Hash<String, Integer>]
+ # @return [Hash{String => Integer}] A hash of tokens and their frequencies
  def most_frequent_tokens
  token_frequency.group_by(&:last).max.last.to_h
  end

- # Returns a hash of tokens and their lengths for tokens with the highest length.
+ # Returns a hash of tokens and their lengths for tokens with the highest length
  #
  # @example
  # Counter.new(%w[one three five seven]).longest_tokens
  # # => { 'three' => 5, 'seven' => 5 }
  #
- # @return [Hash<String, Integer>]
+ # @return [Hash{String => Integer}] A hash of tokens and their lengths
  def longest_tokens
  token_lengths.group_by(&:last).max.last.to_h
  end
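`most_frequent_tokens` and `longest_tokens` share one idiom: group the `[token, value]` pairs by value, take the group with the largest value, and rebuild a hash. Step by step, with illustrative data:

```ruby
pairs = [["two", 2], ["twice", 2], ["one", 1], ["once", 1]]

grouped = pairs.group_by(&:last)
# => { 2 => [["two", 2], ["twice", 2]], 1 => [["one", 1], ["once", 1]] }
grouped.max
# => [2, [["two", 2], ["twice", 2]]] (entries compare by key, so the largest value wins)
grouped.max.last.to_h
# => { "two" => 2, "twice" => 2 }
```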
@@ -117,7 +128,8 @@ module WordsCounted
  # Counter.new(%w[one three five seven]).average_chars_per_token
  # # => 4.25
  #
- # @return [Float] The average char count per token.
+ # @param [Integer] precision The number of decimal places to round average char count to
+ # @return [Float] The average char count per token
  def average_chars_per_token(precision: 2)
  (char_count / token_count.to_f).round(precision)
  end
data/lib/words_counted/deprecated.rb CHANGED
@@ -1,6 +1,8 @@
  # -*- encoding : utf-8 -*-
  module WordsCounted
  module Deprecated
+ # The following methods are deprecated and will be removed in version 1.1.0.
+
  # @deprecated use `Counter#token_count`
  def word_count
  warn "`Counter#word_count` is deprecated, please use `Counter#token_count`"
data/lib/words_counted/tokeniser.rb CHANGED
@@ -5,28 +5,32 @@ module WordsCounted
  # Using `pattern` and `exclude` allows for powerful tokenisation strategies.
  #
  # @example
- # tokeniser = WordsCounted::Tokeniser.new("We are all in the gutter, but some of us are looking at the stars.")
+ # tokeniser
+ # = WordsCounted::Tokeniser.new(
+ # "We are all in the gutter, but some of us are looking at the stars."
+ # )
  # tokeniser.tokenise(exclude: "We are all in the gutter")
  # # => ['but', 'some', 'of', 'us', 'are', 'looking', 'at', 'the', 'stars']

  # Default tokenisation strategy
  TOKEN_REGEXP = /[\p{Alpha}\-']+/

- # Initialises state with a string that will be tokenised.
+ # Initialises state with the string to be tokenised.
  #
- # @param [String] input The string to tokenise.
- # @return [Tokeniser]
+ # @param [String] input The string to tokenise
  def initialize(input)
  @input = input
  end

  # Converts a string into an array of tokens using a regular expression.
- # If a regexp is not provided a default one is used. See {Tokenizer.TOKEN_REGEXP}.
+ # If a regexp is not provided a default one is used. See `Tokeniser.TOKEN_REGEXP`.
  #
  # Use `exclude` to remove tokens from the final list. `exclude` can be a string,
  # a regular expression, a lambda, a symbol, or an array of one or more of those types.
  # This allows for powerful and flexible tokenisation strategies.
  #
+ # If a symbol is passed, it must name a predicate method.
+ #
  # @example
  # WordsCounted::Tokeniser.new("Hello World").tokenise
  # # => ['hello', 'world']
@@ -44,7 +48,9 @@ module WordsCounted
  # # => ['dani']
  #
  # @example With `exclude` as a lambda
- # WordsCounted::Tokeniser.new("Goodbye Sami").tokenise(exclude: ->(token) { token.length > 6 })
+ # WordsCounted::Tokeniser.new("Goodbye Sami").tokenise(
+ # exclude: ->(token) { token.length > 6 }
+ # )
  # # => ['sami']
  #
  # @example With `exclude` as a symbol
@@ -52,26 +58,42 @@ module WordsCounted
  # # => ['محمد']
  #
  # @example With `exclude` as an array of strings
- # WordsCounted::Tokeniser.new("Goodbye Sami and hello Dani").tokenise(exclude: ["goodbye hello"])
+ # WordsCounted::Tokeniser.new("Goodbye Sami and hello Dani").tokenise(
+ # exclude: ["goodbye hello"]
+ # )
  # # => ['sami', 'and', 'dani']
  #
  # @example With `exclude` as an array of regular expressions
- # WordsCounted::Tokeniser.new("Goodbye and hello Dani").tokenise(exclude: [/goodbye/i, /and/i])
+ # WordsCounted::Tokeniser.new("Goodbye and hello Dani").tokenise(
+ # exclude: [/goodbye/i, /and/i]
+ # )
  # # => ['hello', 'dani']
  #
  # @example With `exclude` as an array of lambdas
  # t = WordsCounted::Tokeniser.new("Special Agent 007")
- # t.tokenise(exclude: [->(t) { t.to_i.odd? }, ->(t) { t.length > 5}])
+ # t.tokenise(
+ # exclude: [
+ # ->(t) { t.to_i.odd? },
+ # ->(t) { t.length > 5}
+ # ]
+ # )
  # # => ['agent']
  #
  # @example With `exclude` as a mixed array
  # t = WordsCounted::Tokeniser.new("Hello! اسماءنا هي محمد، كارولينا، سامي، وداني")
- # t.tokenise(exclude: [:ascii_only?, /محمد/, ->(t) { t.length > 6}, "و"])
- # # => ["هي", "سامي", "ودان"]
- #
- # @param [Regexp] pattern The string to tokenise.
- # @param [Array<String, Regexp, Lambda, Symbol>, String, Regexp, Lambda, Symbol, nil] exclude The filter to apply.
- # @return [Array] the array of filtered tokens.
+ # t.tokenise(
+ # exclude: [
+ # :ascii_only?,
+ # /محمد/,
+ # ->(t) { t.length > 6},
+ # "و"
+ # ]
+ # )
+ # # => ["هي", "سامي", "وداني"]
+ #
+ # @param [Regexp] pattern The regular expression used to split the input into tokens
+ # @param [Array<String, Regexp, Lambda, Symbol>, String, Regexp, Lambda, Symbol, nil] exclude The filter to apply
+ # @return [Array] The array of filtered tokens
  def tokenise(pattern: TOKEN_REGEXP, exclude: nil)
  filter_proc = filter_to_proc(exclude)
  @input.scan(pattern).map(&:downcase).reject { |token| filter_proc.call(token) }
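The body of `tokenise` is a three-stage pipeline: scan the input with the pattern, downcase every match, then reject whatever the filter proc flags. Traced outside the class with a stand-in filter (`filter_proc` here is hypothetical; the real one comes from the private `filter_to_proc`):

```ruby
input = "Magnificent! That was magnificent, Trevor."
filter_proc = ->(token) { token.length < 4 } # stand-in for filter_to_proc's result

input.scan(/[\p{Alpha}\-']+/)    # ["Magnificent", "That", "was", "magnificent", "Trevor"]
     .map(&:downcase)            # ["magnificent", "that", "was", "magnificent", "trevor"]
     .reject { |token| filter_proc.call(token) }
# => ["magnificent", "that", "magnificent", "trevor"]
```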
@@ -79,22 +101,31 @@ module WordsCounted

  private

- # This method converts any arguments into a callable object. The return value of this
- # is then used to determine whether a token should be excluded from the final list or not.
+ # The following methods convert any arguments into a callable object. The return value of this
+ # lambda is then used to determine whether a token should be excluded from the final list.
  #
  # `filter` can be a string, a regular expression, a lambda, a symbol, or an array
  # of any combination of those types.
  #
- # If `filter` is a string, see {Tokeniser#filter_proc_from_string}.
- # If `filter` is a an array, see {Tokeniser#filter_procs_from_array}.
+ # If `filter` is a string, it converts the string into an array, and returns a lambda
+ # that returns true if the token is included in the resulting array.
+ #
+ # @see {Tokeniser#filter_proc_from_string}.
+ #
+ # If `filter` is an array, it creates a new array where each element of the original is
+ # converted to a lambda, and returns a lambda that calls each lambda in the resulting array.
+ # If any lambda returns true the token is excluded from the final list.
+ #
+ # @see {Tokeniser#filter_procs_from_array}.
  #
  # If `filter` is a proc, then the proc is simply called. If `filter` is a regexp, a `lambda`
- # is returned that checks the token for a match. If a symbol is passed, it is converted to
- # a proc.
+ # is returned that checks the token for a match.
+ #
+ # If a symbol is passed, it is converted to a proc. The symbol must name a predicate method.
  #
  # This method depends on `nil` responding to `to_a` with an empty array, which
  # avoids having to check if `exclude` was passed.
- #
+
  # @api private
  def filter_to_proc(filter)
  if filter.respond_to?(:to_a)
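In effect, every accepted filter form collapses to one question per token: should it be dropped? The equivalences described in the comment above, sketched (illustrative, not the private implementation itself):

```ruby
# Filter form             # Lambda it effectively becomes
"was magnificent"         # ->(t) { ["was", "magnificent"].include?(t) }
/trevor/                  # ->(t) { t =~ /trevor/ }
:ascii_only?              # :ascii_only?.to_proc, i.e. ->(t) { t.ascii_only? }
->(t) { t.length < 4 }    # used as-is
["و", /محمد/]             # one lambda that is true if any element's lambda is true
```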
@@ -113,10 +144,6 @@ module WordsCounted
  end
  end

- # Converts an array of `filters` to an array of lambdas, and returns a lambda that calls
- # each lambda in the resulting array. If any lambda returns true the token is excluded
- # from the final list.
- #
  # @api private
  def filter_procs_from_array(filter)
  filter_procs = Array(filter).map &method(:filter_to_proc)
@@ -125,9 +152,6 @@ module WordsCounted
  }
  end

- # Converts a string `filter` to an array, and returns a lambda
- # that returns true if the token is included in the array.
- #
  # @api private
  def filter_proc_from_string(filter)
  normalized_exclusion_list = filter.split.map(&:downcase)
data/lib/words_counted/version.rb CHANGED
@@ -1,4 +1,4 @@
  # -*- encoding : utf-8 -*-
  module WordsCounted
- VERSION = "1.0.2"
+ VERSION = "1.0.3"
  end
data/lib/words_counted.rb CHANGED
@@ -19,10 +19,11 @@ module WordsCounted
  # @see Tokeniser.tokenise
  # @see Counter.initialize
  #
- # @param [String] input The input to be tokenised.
- # @param [Hash] options The options to pass onto `Counter`.
+ # @param [String] input The input to be tokenised
+ # @param [Hash] options The options to pass onto `Counter`
+ # @return [WordsCounted::Counter] An instance of Counter
  def self.count(input, options = {})
- tokens = Tokeniser.new(input).tokenise(options)
+ tokens = Tokeniser.new(input).tokenise(**options)
  Counter.new(tokens)
  end

@@ -32,11 +33,12 @@ module WordsCounted
  # @see Tokeniser.tokenise
  # @see Counter.initialize
  #
- # @param [String] path The file to be read and tokenised.
- # @param [Hash] options The options to pass onto `Counter`.
+ # @param [String] path The file to be read and tokenised
+ # @param [Hash] options The options to pass onto `Counter`
+ # @return [WordsCounted::Counter] An instance of Counter
  def self.from_file(path, options = {})
  tokens = File.open(path) do |file|
- Tokeniser.new(file.read).tokenise(options)
+ Tokeniser.new(file.read).tokenise(**options)
  end
  Counter.new(tokens)
  end
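The switch to `**options` is the substance of the Ruby 3 support noted in the changelog: Ruby 3 separates positional and keyword arguments, so a hash passed positionally no longer auto-converts into keywords. A condensed illustration (assuming Ruby 3.x):

```ruby
def tokenise(pattern: /[\p{Alpha}\-']+/, exclude: nil)
  [pattern, exclude]
end

options = { exclude: "hello" }

tokenise(**options) # keywords passed explicitly: works on Ruby 2 and 3
tokenise(options)   # ArgumentError on Ruby 3: the hash is a positional argument
```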
data/words_counted.gemspec CHANGED
@@ -19,7 +19,7 @@ Gem::Specification.new do |spec|
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
  spec.require_paths = ["lib"]

- spec.add_development_dependency "bundler", "~> 1.3"
+ spec.add_development_dependency "bundler"
  spec.add_development_dependency "rake"
  spec.add_development_dependency "rspec"
  spec.add_development_dependency "pry"
metadata CHANGED
@@ -1,29 +1,29 @@
  --- !ruby/object:Gem::Specification
  name: words_counted
  version: !ruby/object:Gem::Version
- version: 1.0.2
+ version: 1.0.3
  platform: ruby
  authors:
  - Mohamad El-Husseini
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-10-25 00:00:00.000000000 Z
+ date: 2021-10-14 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
- version: '1.3'
+ version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
- version: '1.3'
+ version: '0'
  - !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
@@ -78,6 +78,7 @@ files:
  - ".hound.yml"
  - ".rspec"
  - ".ruby-style.yml"
+ - ".ruby-version"
  - ".travis.yml"
  - ".yardopts"
  - CHANGELOG.md
@@ -102,7 +103,7 @@ homepage: https://github.com/abitdodgy/words_counted
  licenses:
  - MIT
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -117,9 +118,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.4.5
- signing_key:
+ rubygems_version: 3.2.15
+ signing_key:
  specification_version: 4
  summary: See README.
  test_files: