nlp-pure 0.0.5 → 0.1.0
- checksums.yaml +4 -4
- data/.travis.yml +4 -4
- data/CHANGELOG.md +6 -0
- data/CONTRIBUTING.md +18 -4
- data/README.md +42 -4
- data/lib/nlp_pure/segmenting/default_word.rb +6 -4
- data/lib/nlp_pure/version.rb +1 -1
- data/spec/lib/segmenting/default_word_spec.rb +94 -0
- data/spec/spec_helper.rb +6 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c5bbc92e65c96837a6e53f28248e15d48a35abe1
+  data.tar.gz: 79f767942ba8723a3f5f6eb04ea0ec4498e02591
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9e00458afc1dadd851ea8ccd4e312ec19c6b775b455ebf6d5e599480dde8f333704a8c9601f62970bdefe26ffea7bd509bb3cd52314775d642865587d94e7214
+  data.tar.gz: 72abbd773eb915a11f76526b9bdfb37cbcd05c258aab45fd3c7e18c9fc1591c84c97cc3f99641ecee20ad27ea47d10daf9e35128d572d95aeb17aeda809e8a93
data/.travis.yml
CHANGED
@@ -1,14 +1,14 @@
 language: ruby
 sudo: false
 cache: bundler
+# NOTE: these run in order
 rvm:
-  - 2.2
-  - 2.1
-  - 2.0.0
   - jruby
   - rbx-2
+  - 2.0.0
+  - 2.1
+  - 2.2
 matrix:
   allow_failures:
     - rvm: rbx-2
     - rvm: jruby
-bundler_args: --without development
data/CHANGELOG.md
CHANGED
data/CONTRIBUTING.md
CHANGED
@@ -1,3 +1,5 @@
+# Contributing
+
 Pull requests are welcomed! Here’s a quick guide:
 
 1. Fork the repo.
@@ -13,11 +15,23 @@ a test!
 
 5. Push to your fork and submit a pull request.
 
-
+
+## Project Goals
+
+* Accuracy over speed
+* One installation step (through `gem` or `bundle`)
+* Minimal runtime dependencies (beyond the standard libraries)
+* Effective collaboration (and minimized interpersonal conflict)
+* Sustainability and maintainability (this isn’t a full-time project)
+
+
+## Style Guide
+
+See also: `rake rubocop`
 
 * Two spaces, no tabs.
 * No trailing whitespace. Blank lines should not have any space.
-* Prefer
-* MyClass.my_method(my_arg) not my_method( my_arg ) or my_method my_arg
-* a = b
+* Prefer `&& ||` over `and or`.
+* Use `MyClass.my_method(my_arg)` not `my_method( my_arg )` or `my_method my_arg`.
+* Prefer `a = b` to `a=b`.
 * Follow the conventions you see used in the source already.
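The style-guide entry preferring `&& ||` over `and or` is probably motivated by operator precedence, which the guide itself does not spell out. A minimal sketch of the difference follows; the `fallback` helper and the variable names are hypothetical and do not come from the gem.

```ruby
# Hypothetical helper, used only to have a non-literal right-hand operand.
def fallback
  'fallback'
end

# `||` binds tighter than `=`, so the whole expression is assigned.
symbolic = nil || fallback   # parsed as: symbolic = (nil || fallback)

# `or` binds looser than `=`, so only the left operand is assigned.
worded = nil or fallback     # parsed as: (worded = nil) or fallback

puts symbolic.inspect
puts worded.inspect
```

Here `symbolic` ends up holding the fallback string while `worded` stays `nil`, which is the kind of surprise the rule presumably aims to avoid.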
data/README.md
CHANGED
@@ -10,11 +10,16 @@ NOTE: this is not affiliated with, endorsed by, or in any way connected with [Pu
 
 This project aims to provide functionality similar to [Treat](https://github.com/louismullie/treat), [open-nlp](https://github.com/louismullie/open-nlp), and [stanford-core-nlp](https://rubygems.org/gems/stanford-core-nlp) but with fewer dependencies. The code is tested against English language but the algorithm implementations aim to be flexible for other languages.
 
+## Table of Contents
 
-
-
-
-
+* [Installation](#installation)
+* [Usage](#usage)
+** [Word Segmentation](#word-segmentation)
+* [Supported Ruby Versions](#supported-ruby-versions)
+* [Versioning](#versioning)
+* [Contributing](CONTRIBUTING.md)
+* [License](LICENSE)
+* [See Also](#see-also)
 
 ## Installation
 
@@ -89,3 +94,36 @@ Constraint](http://docs.rubygems.org/read/chapter/16#page74) with two digits of
 ```ruby
 spec.add_dependency 'nlp-pure', '~> 0.1'
 ```
+
+
+## See Also
+
+[Search “nlp” at ruby-toolbox.com](https://www.ruby-toolbox.com/search?q=nlp)
+
+* APIs
+** [alchemy_api](https://github.com/dbalatero/alchemy_api)
+** [napi-ruby](https://github.com/Maluuba/napi-ruby)
+** [poliqarpr](https://github.com/apohllo/poliqarpr)
+** [wlapi](https://github.com/arbox/wlapi)
+* Bindings and Toolkits
+** [open-nlp](https://github.com/louismullie/open-nlp)
+** [stanford-core-nlp](https://github.com/louismullie/stanford-core-nlp)
+** [treat](https://github.com/louismullie/treat)
+* Classification
+** [linnaeus](https://github.com/djcp/linnaeus)
+** [maxent_string_classifier](https://github.com/mccraigmccraig/maxent_string_classifier)
+* N-Grams
+** [ruby-ngram](https://github.com/tkellen/ruby-ngram)
+* Specific Languages
+** Polish
+*** [nlp](https://github.com/knife/nlp)
+* Stopwords
+** [clarifier](https://github.com/meducation/clarifier)
+** [stopwords](https://github.com/brez/stopwords)
+** [stopwords-filter](https://github.com/brenes/stopwords-filter)
+* Tokenization
+** [rseg](https://rubygems.org/gems/rseg)
+** [Tokenizer](https://github.com/arbox/tokenizer)
+* Word Counters
+** [words_counted](https://github.com/abitdodgy/words_counted)
+
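The README context in the hunk above recommends the pessimistic constraint `'~> 0.1'`. As a rough illustration of which releases that constraint admits, the sketch below checks a few version strings with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The candidate list is made up for the example, apart from 0.0.5 and 0.1.0, which are the two versions compared in this diff.

```ruby
require 'rubygems'

# The constraint string quoted in the README.
requirement = Gem::Requirement.new('~> 0.1')

# Illustrative candidates; 0.9.3 and 1.0.0 are hypothetical releases.
candidates = %w[0.0.5 0.1.0 0.9.3 1.0.0]

results = candidates.each_with_object({}) do |candidate, memo|
  memo[candidate] = requirement.satisfied_by?(Gem::Version.new(candidate))
end

results.each { |version, ok| puts "#{version}: #{ok}" }
```

With two digits of precision, `~> 0.1` behaves like `>= 0.1, < 1.0`, so the older 0.0.5 release and any 1.x release fall outside it.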
data/lib/nlp_pure/segmenting/default_word.rb
CHANGED
@@ -16,13 +16,15 @@ module NlpPure
         ]
       }.freeze
 
-
+      module_function
+
+      def parse(*args)
         unless args.nil? || args.empty?
-          clean_input(args[0]).split(options
+          clean_input(args[0]).split(options.fetch(:split, nil))
         end
       end
 
-      def
+      def clean_input(text = nil)
         input = text.to_s
         # perform replacements to work around the limitations of the splitting regexp
         options.fetch(:gsub, []).each do |gsub_pair|
@@ -33,7 +35,7 @@ module NlpPure
       end
 
       # NOTE: exposed as a method for easy mock/stub
-      def
+      def options
         DEFAULT_OPTIONS
       end
     end
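The hunks above move `parse`, `clean_input`, and `options` under `module_function`. The following is a minimal, self-contained sketch of that pattern, not the gem's actual code: the module name `WordSegmenterSketch`, the contents of `DEFAULT_OPTIONS`, and the body of the `gsub` loop are assumptions, since the diff does not show them.

```ruby
# Hypothetical stand-in for NlpPure::Segmenting::DefaultWord.
module WordSegmenterSketch
  # Assumed values; the real DEFAULT_OPTIONS are not visible in this diff.
  DEFAULT_OPTIONS = {
    split: /\s+/,          # assumed splitting pattern
    gsub: [['…', ' ']]     # assumed [pattern, replacement] pairs
  }.freeze

  # Every method defined below becomes callable on the module itself.
  module_function

  def parse(*args)
    return if args.nil? || args.empty?

    clean_input(args[0]).split(options.fetch(:split, nil))
  end

  def clean_input(text = nil)
    input = text.to_s
    # Assumed loop body: apply each replacement pair in order.
    options.fetch(:gsub, []).each do |gsub_pair|
      input = input.gsub(gsub_pair[0], gsub_pair[1])
    end
    input.strip
  end

  # Kept as a method so tests can stub it, as the original comment notes.
  def options
    DEFAULT_OPTIONS
  end
end

puts WordSegmenterSketch.parse('Mary had a little lamb.').inspect
puts WordSegmenterSketch.clean_input(' … the quick brown fox').inspect
```

Because `options` is an ordinary module-level method rather than a direct constant reference, a spec can replace its return value (for example with `{}` or `nil`) without touching the constant.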
data/lib/nlp_pure/version.rb
CHANGED
data/spec/lib/segmenting/default_word_spec.rb
CHANGED
@@ -7,6 +7,12 @@ describe NlpPure::Segmenting::DefaultWord do
     it 'is defined' do
       expect(defined?(NlpPure::Segmenting::DefaultWord)).to be_truthy
     end
+
+    describe '::DEFAULT_OPTIONS' do
+      it 'is Hash' do
+        expect(NlpPure::Segmenting::DefaultWord::DEFAULT_OPTIONS).to be_a Hash
+      end
+    end
   end
 
   describe '.parse' do
@@ -27,6 +33,26 @@ describe NlpPure::Segmenting::DefaultWord do
       let(:english_simple_paragraph) { 'Mary had a little lamb. The lamb’s fleece was white as snow. Everywhere that Mary went, the lamb was sure to go.' }
       let(:english_simple_line_breaks) { "Mary had a little lamb,\nHis fleece was white as snow,\nAnd everywhere that Mary went,\nThe lamb was sure to go." }
 
+      context '(with nil options)' do
+        before do
+          expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return(nil)
+        end
+
+        it 'raises NoMethodError' do
+          expect { NlpPure::Segmenting::DefaultWord.parse(english_simple_sentence) }.to raise_error
+        end
+      end
+
+      context '(with blank options)' do
+        before do
+          expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return({})
+        end
+
+        it 'returns Array' do
+          expect(NlpPure::Segmenting::DefaultWord.parse(english_simple_sentence)).to be_an Array
+        end
+      end
+
       context '(with default options)' do
         context 'with `nil` argument' do
           it 'does not raise error' do
@@ -107,6 +133,74 @@ describe NlpPure::Segmenting::DefaultWord do
         it 'correctly counts with line breaks' do
           expect(NlpPure::Segmenting::DefaultWord.parse(english_simple_line_breaks).length).to eq(22)
         end
+
+        context 'benchmarking' do
+          before do
+            require 'benchmark'
+          end
+
+          it 'takes time', benchmarking: true do
+            expect(
+              Benchmark.realtime do
+                1000.times do
+                  NlpPure::Segmenting::DefaultWord.parse(english_simple_line_breaks)
+                end
+              end
+            ).to be < 0.1
+          end
+        end
+      end
+    end
+  end
+
+  describe '.clean_input' do
+    context 'English' do
+      let(:english_leading_ellipsis_sentence) { ' … the quick brown fox jumps over the lazy dog.' }
+
+      context '(with nil options)' do
+        before do
+          expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return(nil)
+        end
+
+        it 'raises NoMethodError' do
+          expect { NlpPure::Segmenting::DefaultWord.clean_input(english_leading_ellipsis_sentence) }.to raise_error
+        end
+      end
+
+      context '(with blank options)' do
+        before do
+          expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return({})
+        end
+
+        it 'only strips whitespace' do
+          expect(NlpPure::Segmenting::DefaultWord.clean_input(english_leading_ellipsis_sentence)).to eq english_leading_ellipsis_sentence.strip
+        end
+      end
+
+      context '(with default options)' do
+        context 'with `nil` argument' do
+          it 'does not raise error' do
+            expect { NlpPure::Segmenting::DefaultWord.clean_input(nil) }.to_not raise_error
+          end
+
+          it 'returns empty String' do
+            expect(NlpPure::Segmenting::DefaultWord.clean_input(nil)).to eq ''
+          end
+        end
+
+        context 'without arguments' do
+          it 'does not raise error' do
+            expect { NlpPure::Segmenting::DefaultWord.clean_input }.to_not raise_error
+          end
+
+          it 'returns nil' do
+            expect(NlpPure::Segmenting::DefaultWord.clean_input).to eq ''
+          end
+        end
+
+        it 'modifies the input' do
+          expect(NlpPure::Segmenting::DefaultWord.clean_input(english_leading_ellipsis_sentence)).to_not eq english_leading_ellipsis_sentence
+        end
       end
     end
   end
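The benchmarking example added in the spec hunk wraps 1000 `parse` calls in `Benchmark.realtime` and expects the total to stay under 0.1 seconds. The same measurement can be sketched without RSpec or the gem, as below; plain `String#split` is used here as an assumed stand-in for `NlpPure::Segmenting::DefaultWord.parse`, so the timing says nothing about the gem itself.

```ruby
require 'benchmark'

# Same fixture text as `english_simple_line_breaks` in the spec.
text = "Mary had a little lamb,\nHis fleece was white as snow,\n" \
       "And everywhere that Mary went,\nThe lamb was sure to go."

# Stand-in for the gem's parse method (assumption: whitespace splitting only).
segment = ->(input) { input.split }

# Benchmark.realtime returns the elapsed wall-clock time in seconds as a Float.
elapsed = Benchmark.realtime do
  1000.times { segment.call(text) }
end

word_count = segment.call(text).length

puts format('1000 runs took %.4f seconds', elapsed)
puts "words: #{word_count}"
```

A fixed wall-clock threshold such as `< 0.1` depends on the machine running the suite, which may be why the spec tags the example with `benchmarking: true` so it can be filtered separately.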
data/spec/spec_helper.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: nlp-pure
 version: !ruby/object:Gem::Version
-  version: 0.0
+  version: 0.1.0
 platform: ruby
 authors:
 - Reid Parham
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-02-
+date: 2015-02-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake