nlp-pure 0.0.5 → 0.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 8ae3951baabcafe913e157a575e3dc718a646f16
4
- data.tar.gz: 14a6567449629a482bdc8863ffbfd04ae72af61b
3
+ metadata.gz: c5bbc92e65c96837a6e53f28248e15d48a35abe1
4
+ data.tar.gz: 79f767942ba8723a3f5f6eb04ea0ec4498e02591
5
5
  SHA512:
6
- metadata.gz: f1766d42dd2916bdb0491448a9db0122b86f31325e3f12ce94c0d6b403cf5ecf50e4e95139f018f76896c1ef432e71a11dc36d8c2c597dc0870f400fb56bfeae
7
- data.tar.gz: b3baa2f16339813070ffa978e03e8972046baebba815259d695da517baedc2830a55d0cfa75dcd234e307d2f09684194c02aa83a3b919f3db08be1426eb71537
6
+ metadata.gz: 9e00458afc1dadd851ea8ccd4e312ec19c6b775b455ebf6d5e599480dde8f333704a8c9601f62970bdefe26ffea7bd509bb3cd52314775d642865587d94e7214
7
+ data.tar.gz: 72abbd773eb915a11f76526b9bdfb37cbcd05c258aab45fd3c7e18c9fc1591c84c97cc3f99641ecee20ad27ea47d10daf9e35128d572d95aeb17aeda809e8a93
@@ -1,14 +1,14 @@
1
1
  language: ruby
2
2
  sudo: false
3
3
  cache: bundler
4
+ # NOTE: these run in order
4
5
  rvm:
5
- - 2.2
6
- - 2.1
7
- - 2.0.0
8
6
  - jruby
9
7
  - rbx-2
8
+ - 2.0.0
9
+ - 2.1
10
+ - 2.2
10
11
  matrix:
11
12
  allow_failures:
12
13
  - rvm: rbx-2
13
14
  - rvm: jruby
14
- bundler_args: --without development
@@ -1,3 +1,9 @@
1
+ # 0.1.0
2
+
3
+ Officially leaving a non-semantic versioning scheme.
4
+
5
+ Added benchmarking test.
6
+
1
7
  # 0.0.5
2
8
 
3
9
  Fixed bug in `NlpPure::Segmenting::DefaultWord` where leading ellipses could produce extra segmented words.
@@ -1,3 +1,5 @@
1
+ # Contributing
2
+
1
3
  Pull requests are welcomed! Here’s a quick guide:
2
4
 
3
5
  1. Fork the repo.
@@ -13,11 +15,23 @@ a test!
13
15
 
14
16
  5. Push to your fork and submit a pull request.
15
17
 
16
- Syntax:
18
+
19
+ ## Project Goals
20
+
21
+ * Accuracy over speed
22
+ * One installation step (through `gem` or `bundle`)
23
+ * Minimal runtime dependencies (beyond the standard libraries)
24
+ * Effective collaboration (and minimized interpersonal conflict)
25
+ * Sustainability and maintainability (this isn’t a full-time project)
26
+
27
+
28
+ ## Style Guide
29
+
30
+ See also: `rake rubocop`
17
31
 
18
32
  * Two spaces, no tabs.
19
33
  * No trailing whitespace. Blank lines should not have any space.
20
- * Prefer &&/|| over and/or.
21
- * MyClass.my_method(my_arg) not my_method( my_arg ) or my_method my_arg.
22
- * a = b and not a=b.
34
+ * Prefer `&& ||` over `and or`.
35
+ * Use `MyClass.my_method(my_arg)` not `my_method( my_arg )` or `my_method my_arg`.
36
+ * Prefer `a = b` to `a=b`.
23
37
  * Follow the conventions you see used in the source already.
data/README.md CHANGED
@@ -10,11 +10,16 @@ NOTE: this is not affiliated with, endorsed by, or in any way connected with [Pu
10
10
 
11
11
  This project aims to provide functionality similar to [Treat](https://github.com/louismullie/treat), [open-nlp](https://github.com/louismullie/open-nlp), and [stanford-core-nlp](https://rubygems.org/gems/stanford-core-nlp) but with fewer dependencies. The code is tested against the English language, but the algorithm implementations aim to be flexible for other languages.
12
12
 
13
+ ## Table of Contents
13
14
 
14
- ## Requirements
15
-
16
- TODO
17
-
15
+ * [Installation](#installation)
16
+ * [Usage](#usage)
17
+ ** [Word Segmentation](#word-segmentation)
18
+ * [Supported Ruby Versions](#supported-ruby-versions)
19
+ * [Versioning](#versioning)
20
+ * [Contributing](CONTRIBUTING.md)
21
+ * [License](LICENSE)
22
+ * [See Also](#see-also)
18
23
 
19
24
  ## Installation
20
25
 
@@ -89,3 +94,36 @@ Constraint](http://docs.rubygems.org/read/chapter/16#page74) with two digits of
89
94
  ```ruby
90
95
  spec.add_dependency 'nlp-pure', '~> 0.1'
91
96
  ```
97
+
98
+
99
+ ## See Also
100
+
101
+ [Search “nlp” at ruby-toolbox.com](https://www.ruby-toolbox.com/search?q=nlp)
102
+
103
+ * APIs
104
+ ** [alchemy_api](https://github.com/dbalatero/alchemy_api)
105
+ ** [napi-ruby](https://github.com/Maluuba/napi-ruby)
106
+ ** [poliqarpr](https://github.com/apohllo/poliqarpr)
107
+ ** [wlapi](https://github.com/arbox/wlapi)
108
+ * Bindings and Toolkits
109
+ ** [open-nlp](https://github.com/louismullie/open-nlp)
110
+ ** [stanford-core-nlp](https://github.com/louismullie/stanford-core-nlp)
111
+ ** [treat](https://github.com/louismullie/treat)
112
+ * Classification
113
+ ** [linnaeus](https://github.com/djcp/linnaeus)
114
+ ** [maxent_string_classifier](https://github.com/mccraigmccraig/maxent_string_classifier)
115
+ * N-Grams
116
+ ** [ruby-ngram](https://github.com/tkellen/ruby-ngram)
117
+ * Specific Languages
118
+ ** Polish
119
+ *** [nlp](https://github.com/knife/nlp)
120
+ * Stopwords
121
+ ** [clarifier](https://github.com/meducation/clarifier)
122
+ ** [stopwords](https://github.com/brez/stopwords)
123
+ ** [stopwords-filter](https://github.com/brenes/stopwords-filter)
124
+ * Tokenization
125
+ ** [rseg](https://rubygems.org/gems/rseg)
126
+ ** [Tokenizer](https://github.com/arbox/tokenizer)
127
+ * Word Counters
128
+ ** [words_counted](https://github.com/abitdodgy/words_counted)
129
+
@@ -16,13 +16,15 @@ module NlpPure
16
16
  ]
17
17
  }.freeze
18
18
 
19
- def self.parse(*args)
19
+ module_function
20
+
21
+ def parse(*args)
20
22
  unless args.nil? || args.empty?
21
- clean_input(args[0]).split(options[:split])
23
+ clean_input(args[0]).split(options.fetch(:split, nil))
22
24
  end
23
25
  end
24
26
 
25
- def self.clean_input(text = nil)
27
+ def clean_input(text = nil)
26
28
  input = text.to_s
27
29
  # perform replacements to work around the limitations of the splitting regexp
28
30
  options.fetch(:gsub, []).each do |gsub_pair|
@@ -33,7 +35,7 @@ module NlpPure
33
35
  end
34
36
 
35
37
  # NOTE: exposed as a method for easy mock/stub
36
- def self.options
38
+ def options
37
39
  DEFAULT_OPTIONS
38
40
  end
39
41
  end
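
The hunk above replaces the `def self.` definitions with `module_function` and looks up the split pattern with `options.fetch(:split, nil)`. The following is a minimal sketch of how that shape behaves, not the gem's actual source: `SketchSegmenter` and its whitespace pattern are hypothetical stand-ins, and the real `DEFAULT_OPTIONS` contents are not shown in this diff.

```ruby
# Hypothetical module that only mirrors the shape of the change above.
# It is NOT part of nlp-pure; the :split pattern is an assumption.
module SketchSegmenter
  DEFAULT_OPTIONS = { split: /\s+/ }.freeze

  # Every method defined after this call is also callable on the module itself.
  module_function

  def parse(*args)
    return nil if args.empty?
    # fetch with an explicit default avoids a KeyError when :split is absent;
    # String#split(nil) falls back to splitting on whitespace.
    args[0].to_s.strip.split(options.fetch(:split, nil))
  end

  # Exposed as a method so that specs can stub it.
  def options
    DEFAULT_OPTIONS
  end
end

words = SketchSegmenter.parse('Mary had a little lamb')
blank_split = 'Mary had a little lamb'.split({}.fetch(:split, nil))
```

With an empty options hash, `fetch(:split, nil)` returns `nil`, so the call still yields an Array; this appears to be what the "(with blank options)" spec in this release relies on.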
@@ -1,5 +1,5 @@
1
1
  # encoding: utf-8
2
2
  #
3
3
  module NlpPure
4
- VERSION = '0.0.5'
4
+ VERSION = '0.1.0'
5
5
  end
@@ -7,6 +7,12 @@ describe NlpPure::Segmenting::DefaultWord do
7
7
  it 'is defined' do
8
8
  expect(defined?(NlpPure::Segmenting::DefaultWord)).to be_truthy
9
9
  end
10
+
11
+ describe '::DEFAULT_OPTIONS' do
12
+ it 'is Hash' do
13
+ expect(NlpPure::Segmenting::DefaultWord::DEFAULT_OPTIONS).to be_a Hash
14
+ end
15
+ end
10
16
  end
11
17
 
12
18
  describe '.parse' do
@@ -27,6 +33,26 @@ describe NlpPure::Segmenting::DefaultWord do
27
33
  let(:english_simple_paragraph) { 'Mary had a little lamb. The lamb’s fleece was white as snow. Everywhere that Mary went, the lamb was sure to go.' }
28
34
  let(:english_simple_line_breaks) { "Mary had a little lamb,\nHis fleece was white as snow,\nAnd everywhere that Mary went,\nThe lamb was sure to go." }
29
35
 
36
+ context '(with nil options)' do
37
+ before do
38
+ expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return(nil)
39
+ end
40
+
41
+ it 'raises NoMethodError' do
42
+ expect { NlpPure::Segmenting::DefaultWord.parse(english_simple_sentence) }.to raise_error(NoMethodError)
43
+ end
44
+ end
45
+
46
+ context '(with blank options)' do
47
+ before do
48
+ expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return({})
49
+ end
50
+
51
+ it 'returns Array' do
52
+ expect(NlpPure::Segmenting::DefaultWord.parse(english_simple_sentence)).to be_an Array
53
+ end
54
+ end
55
+
30
56
  context '(with default options)' do
31
57
  context 'with `nil` argument' do
32
58
  it 'does not raise error' do
@@ -107,6 +133,74 @@ describe NlpPure::Segmenting::DefaultWord do
107
133
  it 'correctly counts with line breaks' do
108
134
  expect(NlpPure::Segmenting::DefaultWord.parse(english_simple_line_breaks).length).to eq(22)
109
135
  end
136
+
137
+ context 'benchmarking' do
138
+ before do
139
+ require 'benchmark'
140
+ end
141
+
142
+ it 'takes time', benchmarking: true do
143
+ expect(
144
+ Benchmark.realtime do
145
+ 1000.times do
146
+ NlpPure::Segmenting::DefaultWord.parse(english_simple_line_breaks)
147
+ end
148
+ end
149
+ ).to be < 0.1
150
+ end
151
+ end
152
+ end
153
+ end
154
+ end
155
+
156
+ describe '.clean_input' do
157
+ context 'English' do
158
+ let(:english_leading_ellipsis_sentence) { ' … the quick brown fox jumps over the lazy dog.' }
159
+
160
+ context '(with nil options)' do
161
+ before do
162
+ expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return(nil)
163
+ end
164
+
165
+ it 'raises NoMethodError' do
166
+ expect { NlpPure::Segmenting::DefaultWord.clean_input(english_leading_ellipsis_sentence) }.to raise_error(NoMethodError)
167
+ end
168
+ end
169
+
170
+ context '(with blank options)' do
171
+ before do
172
+ expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return({})
173
+ end
174
+
175
+ it 'only strips whitespace' do
176
+ expect(NlpPure::Segmenting::DefaultWord.clean_input(english_leading_ellipsis_sentence)).to eq english_leading_ellipsis_sentence.strip
177
+ end
178
+ end
179
+
180
+ context '(with default options)' do
181
+ context 'with `nil` argument' do
182
+ it 'does not raise error' do
183
+ expect { NlpPure::Segmenting::DefaultWord.clean_input(nil) }.to_not raise_error
184
+ end
185
+
186
+ it 'returns empty String' do
187
+ expect(NlpPure::Segmenting::DefaultWord.clean_input(nil)).to eq ''
188
+ end
189
+ end
190
+
191
+ context 'without arguments' do
192
+ it 'does not raise error' do
193
+ expect { NlpPure::Segmenting::DefaultWord.clean_input }.to_not raise_error
194
+ end
195
+
196
+ it 'returns empty String' do
197
+ expect(NlpPure::Segmenting::DefaultWord.clean_input).to eq ''
198
+ end
199
+ end
200
+
201
+ it 'modifies the input' do
202
+ expect(NlpPure::Segmenting::DefaultWord.clean_input(english_leading_ellipsis_sentence)).to_not eq english_leading_ellipsis_sentence
203
+ end
110
204
  end
111
205
  end
112
206
  end
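
The benchmarking spec in the hunk above wraps 1000 parse calls in `Benchmark.realtime` and asserts on the elapsed time. Below is a rough standalone sketch of the same measurement using only the standard library. The plain `String#split` call is an assumed stand-in for `NlpPure::Segmenting::DefaultWord.parse`, so the timing says nothing about the gem itself.

```ruby
require 'benchmark'

# Same sample text as the spec's english_simple_line_breaks fixture.
text = "Mary had a little lamb,\nHis fleece was white as snow,\n" \
       "And everywhere that Mary went,\nThe lamb was sure to go."

# Stand-in for the real parser (assumption): split on runs of whitespace.
tokens = text.split(/\s+/)

# Benchmark.realtime returns the wall-clock seconds spent in the block as a Float.
elapsed = Benchmark.realtime do
  1000.times { text.split(/\s+/) }
end

puts format('1000 runs took %.4f seconds', elapsed)
```

A fixed threshold such as `be < 0.1` depends on the machine running the suite, which may be why the spec is tagged `benchmarking: true` so that it can be filtered out.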
@@ -2,7 +2,12 @@
2
2
  require 'rspec'
3
3
  require 'coveralls'
4
4
 
5
- Coveralls.wear!
5
+ Coveralls.wear! do
6
+ add_filter '/vendor/'
7
+ add_filter '/test/'
8
+ add_filter '/tmp/'
9
+ add_filter '/spec/'
10
+ end
6
11
 
7
12
  RSpec.configure do |config|
8
13
  config.expect_with :rspec do |c|
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: nlp-pure
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.5
4
+ version: 0.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Reid Parham
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-02-15 00:00:00.000000000 Z
11
+ date: 2015-02-16 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rake