
nlp-pure 0.0.5 → 0.1.0


Files changed (10)

checksums.yaml +4 -4
data/.travis.yml +4 -4
data/CHANGELOG.md +6 -0
data/CONTRIBUTING.md +18 -4
data/README.md +42 -4
data/lib/nlp_pure/segmenting/default_word.rb +6 -4
data/lib/nlp_pure/version.rb +1 -1
data/spec/lib/segmenting/default_word_spec.rb +94 -0
data/spec/spec_helper.rb +6 -1
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 8ae3951baabcafe913e157a575e3dc718a646f16
-  data.tar.gz: 14a6567449629a482bdc8863ffbfd04ae72af61b
+  metadata.gz: c5bbc92e65c96837a6e53f28248e15d48a35abe1
+  data.tar.gz: 79f767942ba8723a3f5f6eb04ea0ec4498e02591
 SHA512:
-  metadata.gz: f1766d42dd2916bdb0491448a9db0122b86f31325e3f12ce94c0d6b403cf5ecf50e4e95139f018f76896c1ef432e71a11dc36d8c2c597dc0870f400fb56bfeae
-  data.tar.gz: b3baa2f16339813070ffa978e03e8972046baebba815259d695da517baedc2830a55d0cfa75dcd234e307d2f09684194c02aa83a3b919f3db08be1426eb71537
+  metadata.gz: 9e00458afc1dadd851ea8ccd4e312ec19c6b775b455ebf6d5e599480dde8f333704a8c9601f62970bdefe26ffea7bd509bb3cd52314775d642865587d94e7214
+  data.tar.gz: 72abbd773eb915a11f76526b9bdfb37cbcd05c258aab45fd3c7e18c9fc1591c84c97cc3f99641ecee20ad27ea47d10daf9e35128d572d95aeb17aeda809e8a93

data/.travis.yml CHANGED

@@ -1,14 +1,14 @@
 language: ruby
 sudo: false
 cache: bundler
+# NOTE: these run in order
 rvm:
-  - 2.2
-  - 2.1
-  - 2.0.0
   - jruby
   - rbx-2
+  - 2.0.0
+  - 2.1
+  - 2.2
 matrix:
   allow_failures:
     - rvm: rbx-2
     - rvm: jruby
-bundler_args: --without development

data/CHANGELOG.md CHANGED

@@ -1,3 +1,9 @@
+# 0.1.0
+Officially leaving a non-semantic versioning scheme.
+Added benchmarking test.
 # 0.0.5
 Fixed bug in `NlpPure::Segmenting::DefaultWord` where leading ellipses could produce extra segmented words.

data/CONTRIBUTING.md CHANGED

@@ -1,3 +1,5 @@
+# Contributing
 Pull requests are welcomed! Here’s a quick guide:
 1. Fork the repo.
@@ -13,11 +15,23 @@ a test!
 5. Push to your fork and submit a pull request.
-Syntax:
+## Project Goals
+* Accuracy over speed
+* One installation step (through `gem` or `bundle`)
+* Minimal runtime dependencies (beyond the standard libraries)
+* Effective collaboration (and minimized interpersonal conflict)
+* Sustainability and maintainability (this isn’t a full-time project)
+## Style Guide
+See also: `rake rubocop`
 * Two spaces, no tabs.
 * No trailing whitespace. Blank lines should not have any space.
-* Prefer &&/|| over and/or.
-* MyClass.my_method(my_arg) not my_method( my_arg ) or my_method my_arg.
-* a = b and not a=b.
+* Prefer `&& ||` over `and or`.
+* Use `MyClass.my_method(my_arg)` not `my_method( my_arg )` or `my_method my_arg`.
+* Prefer `a = b` to `a=b`.
 * Follow the conventions you see used in the source already.

data/README.md CHANGED

@@ -10,11 +10,16 @@ NOTE: this is not affiliated with, endorsed by, or in any way connected with [Pu
 This project aims to provide functionality similar to [Treat](https://github.com/louismullie/treat), [open-nlp](https://github.com/louismullie/open-nlp), and [stanford-core-nlp](https://rubygems.org/gems/stanford-core-nlp) but with fewer dependencies. The code is tested against English language but the algorithm implementations aim to be flexible for other languages.
+## Table of Contents
-## Requirements
-TODO
+* [Installation](#installation)
+* [Usage](#usage)
+** [Word Segmentation](#word-segmentation)
+* [Supported Ruby Versions](#supported-ruby-versions)
+* [Versioning](#versioning)
+* [Contributing](CONTRIBUTING.md)
+* [License](LICENSE)
+* [See Also](#see-also)
 ## Installation
@@ -89,3 +94,36 @@ Constraint](http://docs.rubygems.org/read/chapter/16#page74) with two digits of
 ```ruby
 spec.add_dependency 'nlp-pure', '~> 0.1'
 ```
+## See Also
+[Search “nlp” at ruby-toolbox.com](https://www.ruby-toolbox.com/search?q=nlp)
+* APIs
+** [alchemy_api](https://github.com/dbalatero/alchemy_api)
+** [napi-ruby](https://github.com/Maluuba/napi-ruby)
+** [poliqarpr](https://github.com/apohllo/poliqarpr)
+** [wlapi](https://github.com/arbox/wlapi)
+* Bindings and Toolkits
+** [open-nlp](https://github.com/louismullie/open-nlp)
+** [stanford-core-nlp](https://github.com/louismullie/stanford-core-nlp)
+** [treat](https://github.com/louismullie/treat)
+* Classification
+** [linnaeus](https://github.com/djcp/linnaeus)
+** [maxent_string_classifier](https://github.com/mccraigmccraig/maxent_string_classifier)
+* N-Grams
+** [ruby-ngram](https://github.com/tkellen/ruby-ngram)
+* Specific Languages
+** Polish
+*** [nlp](https://github.com/knife/nlp)
+* Stopwords
+** [clarifier](https://github.com/meducation/clarifier)
+** [stopwords](https://github.com/brez/stopwords)
+** [stopwords-filter](https://github.com/brenes/stopwords-filter)
+* Tokenization
+** [rseg](https://rubygems.org/gems/rseg)
+** [Tokenizer](https://github.com/arbox/tokenizer)
+* Word Counters
+** [words_counted](https://github.com/abitdodgy/words_counted)
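
Note on the `~> 0.1` constraint in the README hunk above: the pessimistic operator with two digits allows any release from 0.1 up to, but not including, 1.0. The sketch below checks a few version strings with RubyGems' own `Gem::Requirement`; the `allowed?` helper and the sample versions other than 0.0.5 and 0.1.0 are illustrative assumptions, not part of the gem.

```ruby
require 'rubygems'

# '~> 0.1' is shorthand for '>= 0.1' together with '< 1.0'.
REQUIREMENT = Gem::Requirement.new('~> 0.1')

# Hypothetical helper: true when the version string satisfies the constraint.
def allowed?(version)
  REQUIREMENT.satisfied_by?(Gem::Version.new(version))
end

%w[0.0.5 0.1.0 0.9.3 1.0.0].each do |version|
  puts "#{version}: #{allowed?(version)}"
end
```

Under this constraint the previous release (0.0.5) is rejected and the new release (0.1.0) is accepted, which is consistent with the changelog's note about the change in versioning scheme.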

data/lib/nlp_pure/segmenting/default_word.rb CHANGED

@@ -16,13 +16,15 @@ module NlpPure
         ]
       }.freeze
-      def self.parse(*args)
+      module_function
+      def parse(*args)
         unless args.nil? || args.empty?
-          clean_input(args[0]).split(options[:split])
+          clean_input(args[0]).split(options.fetch(:split, nil))
         end
       end
-      def self.clean_input(text = nil)
+      def clean_input(text = nil)
         input = text.to_s
         # perform replacements to work around the limitations of the splitting regexp
         options.fetch(:gsub, []).each do |gsub_pair|
@@ -33,7 +35,7 @@ module NlpPure
       end
       # NOTE: exposed as a method for easy mock/stub
-      def self.options
+      def options
         DEFAULT_OPTIONS
       end
     end
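
Note on the hunk above: replacing the `def self.` definitions with a single `module_function` call keeps the methods callable directly on the module, and `options.fetch(:split, nil)` falls back to `nil` when no `:split` key is present, in which case `String#split` divides on whitespace. A minimal sketch of that behavior, assuming a stand-in module named `SketchWord` with a simplified `clean_input` and a made-up `:split` pattern (none of this is the gem's actual source):

```ruby
# Hypothetical stand-in for the segmenter; names and options are assumptions.
module SketchWord
  DEFAULT_OPTIONS = { split: /\s+/ }.freeze

  # Every method defined after this call is also available as SketchWord.name.
  module_function

  def parse(*args)
    return if args.empty?

    clean_input(args[0]).split(options.fetch(:split, nil))
  end

  def clean_input(text = nil)
    text.to_s.strip
  end

  # Kept as a method so that specs can stub it.
  def options
    DEFAULT_OPTIONS
  end
end

p SketchWord.parse('Mary had a little lamb')

# With an empty options hash, fetch returns nil and split uses whitespace.
p 'Mary had a little lamb'.split({}.fetch(:split, nil))
```

Because `options` remains a method on the module, a spec can stub it to return `{}` or `nil`, which is what the new "blank options" and "nil options" contexts in the spec file rely on.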

data/lib/nlp_pure/version.rb CHANGED

@@ -1,5 +1,5 @@
 # encoding: utf-8
 #
 module NlpPure
-  VERSION = '0.0.5'
+  VERSION = '0.1.0'
 end

data/spec/lib/segmenting/default_word_spec.rb CHANGED

@@ -7,6 +7,12 @@ describe NlpPure::Segmenting::DefaultWord do
     it 'is defined' do
       expect(defined?(NlpPure::Segmenting::DefaultWord)).to be_truthy
     end
+    describe '::DEFAULT_OPTIONS' do
+      it 'is Hash' do
+        expect(NlpPure::Segmenting::DefaultWord::DEFAULT_OPTIONS).to be_a Hash
+      end
+    end
   end
   describe '.parse' do
@@ -27,6 +33,26 @@ describe NlpPure::Segmenting::DefaultWord do
       let(:english_simple_paragraph) { 'Mary had a little lamb. The lamb’s fleece was white as snow. Everywhere that Mary went, the lamb was sure to go.' }
       let(:english_simple_line_breaks) { "Mary had a little lamb,\nHis fleece was white as snow,\nAnd everywhere that Mary went,\nThe lamb was sure to go." }
+      context '(with nil options)' do
+        before do
+          expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return(nil)
+        end
+        it 'raises NoMethodError' do
+          expect { NlpPure::Segmenting::DefaultWord.parse(english_simple_sentence) }.to raise_error
+        end
+      end
+      context '(with blank options)' do
+        before do
+          expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return({})
+        end
+        it 'returns Array' do
+          expect(NlpPure::Segmenting::DefaultWord.parse(english_simple_sentence)).to be_an Array
+        end
+      end
       context '(with default options)' do
         context 'with `nil` argument' do
           it 'does not raise error' do
@@ -107,6 +133,74 @@ describe NlpPure::Segmenting::DefaultWord do
         it 'correctly counts with line breaks' do
           expect(NlpPure::Segmenting::DefaultWord.parse(english_simple_line_breaks).length).to eq(22)
         end
+        context 'benchmarking' do
+          before do
+            require 'benchmark'
+          end
+          it 'takes time', benchmarking: true do
+            expect(
+              Benchmark.realtime do
+                1000.times do
+                  NlpPure::Segmenting::DefaultWord.parse(english_simple_line_breaks)
+                end
+              end
+            ).to be < 0.1
+          end
+        end
+      end
+    end
+  end
+  describe '.clean_input' do
+    context 'English' do
+      let(:english_leading_ellipsis_sentence) { ' … the quick brown fox jumps over the lazy dog.' }
+      context '(with nil options)' do
+        before do
+          expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return(nil)
+        end
+        it 'raises NoMethodError' do
+          expect { NlpPure::Segmenting::DefaultWord.clean_input(english_leading_ellipsis_sentence) }.to raise_error
+        end
+      end
+      context '(with blank options)' do
+        before do
+          expect(NlpPure::Segmenting::DefaultWord).to receive(:options).at_least(:once).and_return({})
+        end
+        it 'only strips whitespace' do
+          expect(NlpPure::Segmenting::DefaultWord.clean_input(english_leading_ellipsis_sentence)).to eq english_leading_ellipsis_sentence.strip
+        end
+      end
+      context '(with default options)' do
+        context 'with `nil` argument' do
+          it 'does not raise error' do
+            expect { NlpPure::Segmenting::DefaultWord.clean_input(nil) }.to_not raise_error
+          end
+          it 'returns empty String' do
+            expect(NlpPure::Segmenting::DefaultWord.clean_input(nil)).to eq ''
+          end
+        end
+        context 'without arguments' do
+          it 'does not raise error' do
+            expect { NlpPure::Segmenting::DefaultWord.clean_input }.to_not raise_error
+          end
+          it 'returns nil' do
+            expect(NlpPure::Segmenting::DefaultWord.clean_input).to eq ''
+          end
+        end
+        it 'modifies the input' do
+          expect(NlpPure::Segmenting::DefaultWord.clean_input(english_leading_ellipsis_sentence)).to_not eq english_leading_ellipsis_sentence
+        end
       end
     end
   end
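
Note on the benchmarking example added in this spec: it wraps 1000 calls in `Benchmark.realtime` and asserts that the elapsed time stays under 0.1 seconds, a threshold that depends on the machine running the suite. The sketch below shows the same timing pattern outside RSpec; `time_splits` is a hypothetical helper and the plain whitespace split merely stands in for the gem's segmenter.

```ruby
require 'benchmark'

TEXT = "Mary had a little lamb,\nHis fleece was white as snow,\n" \
       "And everywhere that Mary went,\nThe lamb was sure to go."

# Hypothetical helper: seconds (Float) spent splitting `text` `iterations` times.
def time_splits(text, iterations)
  Benchmark.realtime do
    iterations.times { text.split(/\s+/) }
  end
end

elapsed = time_splits(TEXT, 1000)
puts format('1000 runs took %.4f seconds', elapsed)
```

Since the example carries the `benchmarking: true` metadata, it can likely be skipped on slow or shared machines with RSpec's tag filter, for example `rspec --tag ~benchmarking`.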

data/spec/spec_helper.rb CHANGED

@@ -2,7 +2,12 @@
 require 'rspec'
 require 'coveralls'
-Coveralls.wear!
+Coveralls.wear! do
+  add_filter '/vendor/'
+  add_filter '/test/'
+  add_filter '/tmp/'
+  add_filter '/spec/'
+end
 RSpec.configure do |config|
   config.expect_with :rspec do |c|

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: nlp-pure
 version: !ruby/object:Gem::Version
-  version: 0.0.5
+  version: 0.1.0
 platform: ruby
 authors:
 - Reid Parham
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-02-15 00:00:00.000000000 Z
+date: 2015-02-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake