greeb 0.2.3 → 0.2.4 (RubyGems)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 07673b32254cd2b0ab0edf0664fa59e46231dbe3
-  data.tar.gz: f8eaac92c0fd4d7dda99c4441117b5e2b34c5caa
+  metadata.gz: 0b353d307d409d5f67e7a7932e4d9aba26fd9dc8
+  data.tar.gz: 134d541cd2ccf1f6d71d6faea69edca8b650f08a
 SHA512:
-  metadata.gz: ebfda44f713c3dcda9df0439f073a1c075360c14cc41e99c400db8eec25f06033ba2d788b5c4b7b8715eeb5cc085e6c704f6e9bad08bfd6af7b4bc4d051a8c32
-  data.tar.gz: cb60f13ddad1e17a7cdbbd19add866677f92590cb7072dd5dc3cf70b80c93a31e0e857366908ff214bb0cc5b2c585415abd78338bd8479ea3150ebf58f5d2117
+  metadata.gz: cdac0fb93910ef3a3c2e78a9d8dbeb2aeb9b906982c00ca51cfe385e40a75595cd26e3241a6b5ae6bf3b6e3e3090688ba2fd5d2890c38b9483fd09b405d3a2f4
+  data.tar.gz: ad5b4faf513f95359b864699af9f5f9e34d6b78d8681ba57d8eada5f2b4ec833d2532571660a9e36f1537792eebf79d3144236cec48e7b3be349281433db8b70
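
Note on this hunk: checksums.yaml lists the SHA1 and SHA512 digests of the two archives packed inside the .gem file, so both pairs change whenever the package contents change. A minimal sketch of recomputing such digests with Ruby's standard `digest` library is given below. The helper name `archive_checksums` and the idea of running it on an already unpacked `metadata.gz` or `data.tar.gz` are assumptions for illustration, not part of greeb or RubyGems.

```ruby
require 'digest'
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: compute the two digests that checksums.yaml
# records for one archive file (for example an unpacked data.tar.gz).
def archive_checksums(path)
  {
    'SHA1'   => Digest::SHA1.file(path).hexdigest,   # 40 hex characters
    'SHA512' => Digest::SHA512.file(path).hexdigest  # 128 hex characters
  }
end
```

Comparing the returned strings with the `+` lines above would confirm that a downloaded archive matches the 0.2.4 release.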

data/.travis.yml CHANGED

@@ -1,8 +1,9 @@
+sudo: false
 language: ruby
 rvm:
-  - 2.0.0
+  - ruby
+  - rbx
   - jruby-19mode
-  - rbx-19mode
 matrix:
   allow_failures:
-    - rvm: rbx-19mode
+    - rvm: rbx

data/LICENSE CHANGED

@@ -1,4 +1,4 @@
-Copyright (c) 2010-2014 Dmitry Ustalov
+Copyright (c) 2010-2015 Dmitry Ustalov
 Permission is hereby granted, free of charge, to any person obtaining
 a copy of this software and associated documentation files (the

data/README.md CHANGED

@@ -1,9 +1,14 @@
 # Greeb
 Greeb [grʲip] is a simple yet awesome and Unicode-aware text segmentator
-that is based on regular expressions. The API documentation is available
-at <http://rubydoc.info/github/dmchk/greeb/master/frames>.
+based on regular expressions. The API documentation is available on
+[RubyDoc.info]. The software demonstration is available on
+<https://greeb.herokuapp.com>.
+[RubyDoc.info]: http://www.rubydoc.info/github/dustalov/greeb/master
 ## Installation
 Add this line to your application's Gemfile:
 ```ruby
@@ -19,11 +24,15 @@ Or install it yourself as:
     $ gem install greeb
 ## Usage
-Greeb can help you solve simple text processing problems such as
-tokenization and segmentation.
-It is available as a command line application that reads the input
-text from STDIN and prints one token per line into STDOUT.
+Greeb can approach such essential text processing problems as
+tokenization and segmentation. There are two ways to use it:
+1) as a command-line application, 2) as a Ruby library.
+### Command-Line Interface
+The `greeb` application reads the input text from `STDIN` and
+writes one token per line to `STDOUT`.
 ```
 % echo 'Hello http://nlpub.ru guys, how are you?' | greeb
@@ -38,6 +47,7 @@ you
 ```
 ### Tokenization API
 Greeb has a very convinient API that makes you happy.
 ```ruby
@@ -48,7 +58,8 @@ pp Greeb::Tokenizer.tokenize('Hello!')
 =end
 ```
-It should be noted that it is possible to process much complex texts.
+It should be noted that it is also possible to process much
+complex texts than the present one.
 ```ruby
 text =<<-EOF
@@ -91,8 +102,8 @@ pp Greeb::Tokenizer.tokenize(text)
 ```
 ### Segmentation API
-Also it can be used to solve the text segmentation problems
-such as sentence detection tasks.
+The analyzer can also perform sentence detection.
 ```ruby
 text = 'Hello! How are you?'
@@ -104,8 +115,8 @@ pp Greeb::Segmentator.new(tokens).sentences
 =end
 ```
-It is possible to extract tokens that were processed by the text
-segmentator.
+Having obtained the sentence boundaries, it is possible to
+extract tokens covered by these sentences.
 ```ruby
 text = 'Hello! How are you?'
@@ -127,10 +138,12 @@ pp segmentator.extract(segmentator.sentences)
 ```
 ### Parsing API
-Texts are often include some special spans such as URLs and e-mail
-addresses. Greeb can help you in these strings retrieval.
-#### URL and E-mail retrieval
+It is often that a text includes such special entries as URLs
+and e-mail addresses. Greeb can assist you in extracting them.
+#### Extraction of URLs and e-mails
 ```ruby
 text = 'My website is http://nlpub.ru and e-mail is example@example.com.'
@@ -145,9 +158,10 @@ pp Greeb::Parser.emails(text).map { |e| [e, e.slice(text)] }
 =end
 ```
-Please don't use Greeb in spam lists development purposes.
+Please do not use Greeb for the development of spam lists. Spam sucks.
+#### Extraction of abbreviations
-#### Abbreviation retrieval
 ```ruby
 text = 'Hello, G.L.H.F. everyone!'
@@ -160,7 +174,8 @@ pp Greeb::Parser.abbrevs(text).map { |e| [e, e.slice(text)] }
 The algorithm is not so accurate, but still useful in many practical
 situations.
-#### Timestamps retrieval
+#### Extraction of time stamps
 ```ruby
 text = 'Our time is running out: 13:37 or 14:89.'
@@ -171,7 +186,8 @@ pp Greeb::Parser.time(text).map { |e| [e, e.slice(text)] }
 ```
 ## Spans
-Greeb operates with spans, tuples of *(from, to, kind)*, where
+Greeb operates with spans, which are tuples of *(from, to, kind)*, where
 *from* is a beginning of the span, *to* is an ending of the span,
 and *kind* is a type of the span.
@@ -180,20 +196,23 @@ There are several span types at the tokenization stage: `:letter`,
 (for in-sentence punctuation), `:space`, and `:break`.
 ## Contributing
 1. Fork it;
 2. Create your feature branch (`git checkout -b my-new-feature`);
 3. Commit your changes (`git commit -am 'Added some feature'`);
 4. Push to the branch (`git push origin my-new-feature`);
 5. Create new Pull Request.
-## Build Status [<img src="https://secure.travis-ci.org/dmchk/greeb.png"/>](http://travis-ci.org/dmchk/greeb)
+## Build Status [<img src="https://secure.travis-ci.org/dustalov/greeb.png"/>](http://travis-ci.org/dustalov/greeb)
 ## Dependency Status [<img src="https://gemnasium.com/dmchk/greeb.png"/>](https://gemnasium.com/dmchk/greeb)
 ## Code Climate [<img src="https://codeclimate.com/github/dmchk/greeb.png"/>](https://codeclimate.com/github/dmchk/greeb)
+## DOI [<img src="https://zenodo.org/badge/doi/10.5281/zenodo.10119.png"/>](http://dx.doi.org/10.5281/zenodo.10119)
 ## Copyright
-Copyright (c) 2010-2014 [Dmitry Ustalov]. See LICENSE for details.
+Copyright (c) 2010-2015 [Dmitry Ustalov]. See LICENSE for details.
-[Dmitry Ustalov]: http://ustalov.name/
+[Dmitry Ustalov]: https://ustalov.name/

data/greeb.gemspec CHANGED

@@ -8,13 +8,13 @@ Gem::Specification.new do |spec|
   spec.platform    = Gem::Platform::RUBY
   spec.authors     = ['Dmitry Ustalov']
   spec.email       = ['dmitry@eveel.ru']
-  spec.homepage    = 'https://github.com/dmchk/greeb'
+  spec.homepage    = 'https://github.com/dustalov/greeb'
   spec.summary     = 'Greeb is a simple Unicode-aware regexp-based tokenizer.'
   spec.description = 'Greeb is a simple yet awesome and Unicode-aware ' \
                      'regexp-based tokenizer, written in Ruby.'
   spec.license     = 'MIT'
-  spec.rubyforge_project = 'greeb'
+  spec.required_ruby_version = '>= 1.9.1'
   spec.files         = `git ls-files`.split("\n")
   spec.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
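
Note on this hunk: the new `required_ruby_version` line makes RubyGems refuse installation on interpreters older than 1.9.1, which matches the `version: 1.9.1` change in the metadata file further down. The sketch below shows how such a constraint is evaluated with `Gem::Requirement` and `Gem::Version`; the constant `REQUIREMENT` and the helper `ruby_allowed?` are hypothetical names used only for this illustration.

```ruby
require 'rubygems'

# The constraint string is taken from the gemspec line above.
REQUIREMENT = Gem::Requirement.new('>= 1.9.1')

# Hypothetical helper: does the given Ruby version satisfy the constraint?
def ruby_allowed?(version_string)
  REQUIREMENT.satisfied_by?(Gem::Version.new(version_string))
end

puts ruby_allowed?('1.8.7')  # an interpreter older than the constraint
puts ruby_allowed?('2.0.0')  # an interpreter newer than the constraint
```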

data/lib/greeb/parser.rb CHANGED

@@ -4,9 +4,7 @@
 # text. These entities are URLs, e-mail addresses, names, etc. This module
 # includes several helpers that could help to solve these problems.
 #
-module Greeb::Parser
-  extend self
+module Greeb::Parser extend self
   # An URL pattern. Not so precise, but IDN-compatible.
   #
   URL = %r{\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\p{L}\w\d]+\)|([^.\s]|/)))}i
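
Note on this hunk: as far as the diff shows, the only change is that `extend self` moves onto the `module` line, so it reads as a layout change rather than a behavioral one. For readers unfamiliar with the idiom, the sketch below shows what `extend self` does in general. The module `Demo` and its method `shout` are made up for this illustration and do not exist in greeb.

```ruby
# `extend self` adds the module's own instance methods to the module
# object itself, so they can be called as Demo.shout(...).
module Demo
  extend self

  def shout(word)
    word.upcase + '!'
  end
end

puts Demo.shout('greeb')
```

This is why the README can call helpers such as `Greeb::Parser.emails(text)` directly on the module.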

data/lib/greeb/tokenizer.rb CHANGED

@@ -5,10 +5,7 @@
 # Unicode character categories been obtained from
 # <http://www.fileformat.info/info/unicode/category/index.htm>.
 #
-module Greeb::Tokenizer
-  # http://www.youtube.com/watch?v=eF1lU-CrQfc
-  extend self
+module Greeb::Tokenizer extend self
   # English and Russian letters.
   #
   LETTERS = /[\p{L}]+/u
@@ -55,7 +52,15 @@ module Greeb::Tokenizer
     scanner = Greeb::StringScanner.new(text)
     tokens = []
     while !scanner.eos?
-      step scanner, tokens or
+      parse! scanner, tokens, LETTERS, :letter or
+      parse! scanner, tokens, FLOATS, :float or
+      parse! scanner, tokens, INTEGERS, :integer or
+      split_parse! scanner, tokens, SENTENCE_PUNCTUATIONS, :spunct or
+      split_parse! scanner, tokens, PUNCTUATIONS, :punct or
+      split_parse! scanner, tokens, SEPARATORS, :separ or
+      split_parse! scanner, tokens, SPACES, :space or
+      split_parse! scanner, tokens, BREAKS, :break or
+      parse! scanner, tokens, RESIDUALS, :residual or
       raise Greeb::UnknownSpan.new(text, scanner.char_pos)
     end
     tokens
@@ -78,25 +83,6 @@ module Greeb::Tokenizer
   end
   protected
-  # One iteration of the tokenization process.
-  #
-  # @param scanner [Greeb::StringScanner] string scanner.
-  # @param tokens [Array<Greeb::Span>] result array.
-  #
-  # @return [Array<Greeb::Span>] the modified set of extracted tokens.
-  #
-  def step scanner, tokens
-    parse! scanner, tokens, LETTERS, :letter or
-    parse! scanner, tokens, FLOATS, :float or
-    parse! scanner, tokens, INTEGERS, :integer or
-    split_parse! scanner, tokens, SENTENCE_PUNCTUATIONS, :spunct or
-    split_parse! scanner, tokens, PUNCTUATIONS, :punct or
-    split_parse! scanner, tokens, SEPARATORS, :separ or
-    split_parse! scanner, tokens, SPACES, :space or
-    split_parse! scanner, tokens, BREAKS, :break or
-    parse! scanner, tokens, RESIDUALS, :residual
-  end
   # Try to parse one small piece of text that is covered by pattern
   # of necessary type.
   #
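
Note on this hunk: the `step` helper is removed and its `or` chain is inlined into the `while` loop, so each iteration tries the patterns in a fixed order and raises `Greeb::UnknownSpan` only if none of them matches. The sketch below reproduces that control flow with the standard `StringScanner`. It assumes that `parse!` and `split_parse!` return a falsy value when their pattern does not match (their bodies are not part of this diff); `try_pattern` and `tiny_tokenize` are hypothetical names, and the three patterns are simplified stand-ins for greeb's constants.

```ruby
require 'strscan'

# Hypothetical stand-in for parse!: try one pattern at the current
# scanner position; append a token and return true on success,
# return nil when the pattern does not match.
def try_pattern(scanner, tokens, pattern, kind)
  match = scanner.scan(pattern)
  return unless match
  tokens << [kind, match]
  true
end

# Try the patterns in order; the first one that matches wins.
# If every attempt returns nil, the chain falls through to `raise`.
def tiny_tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    try_pattern(scanner, tokens, /\p{L}+/, :letter) or
      try_pattern(scanner, tokens, /\d+/, :integer) or
      try_pattern(scanner, tokens, /\s+/, :space) or
      raise ArgumentError, "unknown span at #{scanner.pos}"
  end
  tokens
end
```

The low-precedence `or` is what lets each call be written without extra grouping: the whole method call on the left is evaluated first, and the next alternative runs only when it returns `nil` or `false`.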

data/lib/greeb/version.rb CHANGED

@@ -5,5 +5,5 @@
 module Greeb
   # Version of Greeb.
   #
-  VERSION = '0.2.3'
+  VERSION = '0.2.4'
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: greeb
 version: !ruby/object:Gem::Version
-  version: 0.2.3
+  version: 0.2.4
 platform: ruby
 authors:
 - Dmitry Ustalov
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-05-25 00:00:00.000000000 Z
+date: 2015-01-14 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: minitest
@@ -58,7 +58,7 @@ files:
 - spec/spec_helper.rb
 - spec/support/invoker.rb
 - spec/tokenizer_spec.rb
-homepage: https://github.com/dmchk/greeb
+homepage: https://github.com/dustalov/greeb
 licenses:
 - MIT
 metadata: {}
@@ -70,14 +70,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '0'
+      version: 1.9.1
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubyforge_project: greeb
+rubyforge_project:
 rubygems_version: 2.2.2
 signing_key:
 specification_version: 4