ruby_ngrams 0.0.2 → 0.0.4

Files changed (3)
  1. data/README.rdoc +74 -8
  2. data/bin/ruby_ngrams +7 -8
  3. metadata +19 -8
data/README.rdoc CHANGED
@@ -1,18 +1,84 @@
  = ruby_ngrams
 
- == License
+ Author:: Martin Velez
+ Copyright:: Copyright (c) 2011 Martin Velez
+ License:: Distributed under the same terms as Ruby
 
- Copyright 2011 Martin Velez
+ = Description
 
- == Features
+ ruby_ngrams is an extension of Ruby's core String class. It provides a String
+ object with the capability to produce n-grams.
 
- * parses a string into a set of n-grams
+ From Wikipedia,
+ "In the fields of computational linguistics and probability, an n-gram is a
+ contiguous sequence of n items from a given sequence of text or speech. The
+ items in question can be phonemes, syllables, letters, words or base pairs
+ according to the application. n-grams are collected from a text or speech corpus.
 
- == Installation
+ An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram"
+ (or, less commonly, a "digram"); size 3 is a "trigram"; size 4 is a "four-gram"
+ and size 5 or more is simply called an "n-gram"."
 
- * gem install ruby_ngrams
+ = Design
 
- == Usage
+ Instead of creating another namespace, this task seemed simple enough to merit
+ extending the String class. A string is a sequence of characters.
+ It can be words, binary code, sentences, paragraphs, etc. In short,
+ anything that you can store in a Ruby String object can be parsed into
+ n-grams of length n.
 
- usage goes here
+ The main method being added to the String class is ngrams(). It produces an
+ array of n-grams from a Ruby String object.
 
+ For example, let s be a Ruby String object.
+ Then s.ngrams() returns an array of n-grams from s.
+
+ Tokenization of s is set to single characters by default.
+ For example, if s = "Hello World!",
+ then the tokens of s are ["H","e","l","l","o"," ","W","o","r","l","d","!"].
+ By specifying a regular expression, you can tokenize the string s in many
+ different and useful ways.
+
+ If you set n = 4, then
+ s.ngrams = [["H", "e", "l", "l"],
+ ["e", "l", "l", "o"],
+ ["l", "l", "o", " "],
+ ["l", "o", " ", "W"],
+ ["o", " ", "W", "o"],
+ [" ", "W", "o", "r"],
+ ["W", "o", "r", "l"],
+ ["o", "r", "l", "d"],
+ ["r", "l", "d", "!"]].
+ Each item in the s.ngrams array can be joined but doesn't need to be.
+ If you want to join them, normally you can do so easily if it is text.
+ Be careful if you are trying to join n-grams with non-printable characters.
+
+ You can google "n-grams" to get more information about how n-grams are useful.
+
+ = Installation
+
+ gem install ruby_ngrams
+
+ = Alternative Tools
+
+ This is another tool I found but which did too much. I only wanted
+ to produce n-grams from a string.
+ 1. raingrams[https://github.com/postmodern/raingrams]
+
+ = Usage
+
+ ./ruby_ngrams --
+
+ = Dependencies
+
+ * Ruby 1.9.1 or greater
+ * ruby_cli[https://github.com/martinvelez/ruby_cli] to run the gem executable
+
+ = TODO
+
+ * Test to determine limits of current approach which parses and stores n-grams
+ in memory.
+
+ = Source Code
+
+ https://github.com/martinvelez/ruby_ngrams
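
The worked example in the README above (tokenize the string, then take every run of n consecutive tokens) can be sketched in a few lines of plain Ruby. This is an illustration only and is not the gem's actual implementation: the helper name `ngrams_sketch` and its positional parameters are hypothetical, and it is written as a free-standing method rather than an extension of String.

```ruby
# Hypothetical helper for illustration; NOT the gem's String#ngrams.
# Splits +str+ on +regex+ (the empty regex // yields single characters)
# and returns every run of +n+ consecutive tokens as an array.
def ngrams_sketch(str, n = 2, regex = //)
  tokens = str.split(regex)
  tokens.each_cons(n).to_a
end

# Character-level 4-grams of the README's sample string.
char_grams = ngrams_sketch("Hello World!", 4)
puts char_grams.length         # 9, the same count as the list in the README
puts char_grams.first.inspect  # ["H", "e", "l", "l"]

# Splitting on whitespace instead gives word-level n-grams.
word_grams = ngrams_sketch("to be or not to be", 2, /\s+/)
puts word_grams.map { |gram| gram.join(" ") }.inspect
```

Because the sketch keeps all n-grams in one array, it shares the in-memory limitation that the README's TODO section mentions.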
data/bin/ruby_ngrams CHANGED
@@ -1,4 +1,4 @@
- #!/usr/bin/ruby -w
+ #!/usr/bin/env ruby
 
  require 'ruby_cli'
  require 'ruby_ngrams'
@@ -8,10 +8,11 @@ class App
 
  def define_command_options() @options = {:regex => //, :n => 2} end
 
- # Define an OptionParser to parse the command line
- def parse_options?
+ # Redefining the RubyCLI define_option_parser method
+ # Need to tell the OptionParser how to handle this command's specific options.
+ def define_option_parser
  #configure an OptionParser
- @opt_parser = OptionParser.new do |opts|
+ OptionParser.new do |opts|
  opts.banner = "Usage: #{__FILE__} [OPTIONS]... [FILE]..."
  opts.separator ""
  opts.separator "Specific options:"
@@ -25,12 +26,10 @@ class App
  opts.on('-n', '--n NUM', Integer, 'set length n for n-grams') do |n|
  @options[:n] = n
  end
- opts.on('-r', '--regex REGEX', Regexp, 'set regex to split string into tokens') do |r|
+ opts.on('-r', '--regex "REGEX"', Regexp, 'set regex to split string into tokens') do |r|
  @options[:regex] = r
  end
  end
- @opt_parser.parse!(@default_argv) rescue return false
- true
  end
 
  def command
@@ -52,7 +51,7 @@ end
 
 
  if __FILE__ == $0
- app = App.new(ARGV)
+ app = App.new(ARGV, __FILE__)
  app.run
  end
 
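
The two options touched in this file (`-n` with an Integer argument, `-r` with a Regexp argument) can be tried in isolation with Ruby's standard `optparse` library. The sketch below is a stand-alone approximation under stated assumptions: the method name `parse_ngram_options` and the file name `input.txt` are hypothetical, and it does not use ruby_cli, whose App class is not shown in this diff.

```ruby
require 'optparse'

# Hypothetical stand-alone parser for illustration. The real script builds
# its parser inside an App class provided by ruby_cli, not reproduced here.
def parse_ngram_options(argv)
  # Same defaults as define_command_options in the diff above.
  options = { :regex => //, :n => 2 }
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: ruby_ngrams [OPTIONS]... [FILE]..."
    # Passing Integer makes OptionParser convert the argument before the block runs.
    opts.on('-n', '--n NUM', Integer, 'set length n for n-grams') do |n|
      options[:n] = n
    end
    # Passing Regexp makes OptionParser turn the argument text into a Regexp.
    opts.on('-r', '--regex REGEX', Regexp, 'set regex to split string into tokens') do |r|
      options[:regex] = r
    end
  end
  # parse (without the bang) leaves argv untouched and returns the non-option arguments.
  rest = parser.parse(argv)
  [options, rest]
end

options, files = parse_ngram_options(['-n', '3', '-r', '\s+', 'input.txt'])
p options[:n]
p options[:regex]
p files
```

Returning the options and the remaining arguments, instead of storing them in instance variables, keeps the sketch easy to call from a test or an interactive session.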
metadata CHANGED
@@ -1,13 +1,12 @@
  --- !ruby/object:Gem::Specification
  name: ruby_ngrams
  version: !ruby/object:Gem::Version
- hash: 27
  prerelease: false
  segments:
  - 0
  - 0
- - 2
- version: 0.0.2
+ - 4
+ version: 0.0.4
  platform: ruby
  authors:
  - Martin Velez
@@ -15,10 +14,24 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2011-11-11 00:00:00 -08:00
+ date: 2011-11-29 00:00:00 -08:00
  default_executable:
- dependencies: []
-
+ dependencies:
+ - !ruby/object:Gem::Dependency
+ name: ruby_cli
+ prerelease: false
+ requirement: &id001 !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ segments:
+ - 0
+ - 1
+ - 0
+ version: 0.1.0
+ type: :runtime
+ version_requirements: *id001
  description: A simple extension of the Ruby core string class to parse a string into n-grams
  email: mvelez999@gmail.com
  executables:
@@ -46,7 +59,6 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- hash: 3
  segments:
  - 0
  version: "0"
@@ -55,7 +67,6 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- hash: 3
  segments:
  - 0
  version: "0"