RubyGems - raingrams - Versions diffs - 0.1.1 → 0.1.2 - Mend

raingrams 0.1.1 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

data/History.txt +22 -3
data/LICENSE.txt +1 -1
data/Manifest.txt +13 -6
data/README.txt +27 -25
data/Rakefile +2 -2
data/lib/raingrams/helpers.rb +5 -0
data/lib/raingrams/helpers/commonality.rb +67 -0
data/lib/raingrams/helpers/frequency.rb +43 -0
data/lib/raingrams/helpers/probability.rb +67 -0
data/lib/raingrams/helpers/random.rb +122 -0
data/lib/raingrams/helpers/similarity.rb +38 -0
data/lib/raingrams/model.rb +30 -304
data/lib/raingrams/probability_table.rb +9 -0
data/lib/raingrams/tokens/tokens.rb +35 -0
data/lib/raingrams/version.rb +1 -1
data/tasks/spec.rb +2 -0
metadata +20 -14

data/History.txt CHANGED

@@ -1,4 +1,23 @@
-== 0.1.1 / 2008-10-12
+=== 0.1.2 / 2009-04-23
+* Require nokogiri >= 1.2.0.
+* No longer require hpricot.
+* Added missing 'lib/raingrams/tokens/tokens.rb' file to the Manifest.
+* Added Raingrams::Helpers:
+  * Moved text commonality calculating methods into
+    Raingrams::Helpers::Commonality.
+  * Moved text frequency calculating methods into
+    Raingrams::Helpers::Frequency.
+  * Moved text probability calculating methods into
+    Raingrams::Helpers::Probability.
+  * Moved random text generating methods into
+    Raingrams::Helpers::Random.
+  * Moved text similarity calculating methods into
+    Raingrams::Helpers::Similarity.
+* Added Model#to_hash.
+* Capitalize randomly generated sentences if case is ignored.
+=== 0.1.1 / 2008-10-12
 * Improved the parsing abilities of Model#parse_sentence and
   Model#parse_text.
@@ -26,7 +45,7 @@
   * Model#frequencies_of_ngrams.
   * Model#save.
-== 0.1.0 / 2008-10-06
+=== 0.1.0 / 2008-10-06
 * Various bug fixes.
 * Added NgramSet and ProbabilityTable classes.
@@ -35,7 +54,7 @@
 * Added random_gram_sentence, random_sentence, random_paragraph and
   random_text methods to the Model class.
-== 0.0.9 / 2008-01-09
+=== 0.0.9 / 2008-01-09
 * Initial release.
 * Supports all non-zero ngram sizes.

data/LICENSE.txt CHANGED

@@ -1,6 +1,6 @@
 The MIT License
-Copyright (c) 2007-2008 Hal Brodigan
+Copyright (c) 2007-2009 Hal Brodigan
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

data/Manifest.txt CHANGED

@@ -5,27 +5,33 @@ README.txt
 TODO.txt
 Rakefile
 lib/raingrams.rb
-lib/raingrams/version.rb
-lib/raingrams/raingrams.rb
-lib/raingrams/exceptions/prefix_frequency_missing.rb
 lib/raingrams/exceptions.rb
+lib/raingrams/exceptions/prefix_frequency_missing.rb
+lib/raingrams/extensions.rb
 lib/raingrams/extensions/object.rb
 lib/raingrams/extensions/string.rb
-lib/raingrams/extensions.rb
+lib/raingrams/tokens.rb
 lib/raingrams/tokens/token.rb
 lib/raingrams/tokens/start_sentence.rb
 lib/raingrams/tokens/stop_sentence.rb
 lib/raingrams/tokens/unknown.rb
-lib/raingrams/tokens.rb
+lib/raingrams/tokens/tokens.rb
 lib/raingrams/ngram.rb
 lib/raingrams/ngram_set.rb
 lib/raingrams/probability_table.rb
+lib/raingrams/helpers.rb
+lib/raingrams/helpers/frequency.rb
+lib/raingrams/helpers/probability.rb
+lib/raingrams/helpers/similarity.rb
+lib/raingrams/helpers/commonality.rb
+lib/raingrams/helpers/random.rb
 lib/raingrams/model.rb
 lib/raingrams/bigram_model.rb
 lib/raingrams/trigram_model.rb
 lib/raingrams/quadgram_model.rb
 lib/raingrams/pentagram_model.rb
 lib/raingrams/hexagram_model.rb
+lib/raingrams/open_vocabulary.rb
 lib/raingrams/open_vocabulary/open_model.rb
 lib/raingrams/open_vocabulary/model.rb
 lib/raingrams/open_vocabulary/bigram_model.rb
@@ -33,7 +39,8 @@ lib/raingrams/open_vocabulary/trigram_model.rb
 lib/raingrams/open_vocabulary/quadgram_model.rb
 lib/raingrams/open_vocabulary/pentagram_model.rb
 lib/raingrams/open_vocabulary/hexagram_model.rb
-lib/raingrams/open_vocabulary.rb
+lib/raingrams/version.rb
+lib/raingrams/raingrams.rb
 tasks/spec.rb
 spec/training/snowcrash.txt
 spec/helpers/training.rb

data/README.txt CHANGED

@@ -1,7 +1,8 @@
 = Raingrams
 * http://raingrams.rubyforge.org/
-* Postmodern Modulus III (postmodern.mod3@gmail.com)
+* http://github.com/postmodern/raingrams/
+* Postmodern (postmodern.mod3 at gmail.com)
 == DESCRIPTION:
@@ -20,7 +21,7 @@ parsing styles and open/closed vocabulary models.
 == REQUIREMENTS:
-* Hpricot
+* {nokogiri}[http://nokogiri.rubyforge.org/] >= 1.2.0
 == INSTALL:
@@ -30,47 +31,48 @@ parsing styles and open/closed vocabulary models.
 * Train a model with ycombinator comments:
-  require 'raingrams'
-  require 'hpricot'
-  require 'open-uri'
-  include Raingrams
-  model = BigramModel.build do |model|
-    doc = Hpricot(open('http://news.ycombinator.org/newcomments'))
-    doc.search('span.comment') do |span|
-      model.train_with_text(span.inner_text)
+    require 'raingrams'
+    require 'nokogiri'
+    require 'open-uri'
+    include Raingrams
+    model = BigramModel.build do |model|
+      doc = Nokogiri::HTML(open('http://news.ycombinator.org/newcomments'))
+      doc.search('span.comment') do |span|
+        model.train_with_text(span.inner_text)
+      end
     end
-  end
 * Update a trained model:
-  model.train_with_text %{Interesting videos. Anders talks about functional
-    support on .net, concurrency, immutability. Guy Steele talks about
-    Fortress on JVM. Too bad they are afraid of macros (access to AST),
-    though Steele does say Fortress has some support.}
-  model.refresh
+    model.train_with_text %{Interesting videos. Anders talks about
+      functional support on .net, concurrency, immutability. Guy Steele
+      talks about Fortress on JVM. Too bad they are afraid of macros
+      (access to AST), though Steele does say Fortress has some support.}
+    model.refresh
 * Generate a random sentence:
-  model.random_sentence
-  # => "OTOOH if you use slicehost even offer to bash Apple makes it will
-  exit and its 38 month ago based configuration of little networks created."
+    model.random_sentence
+    # => "OTOOH if you use slicehost even offer to bash Apple makes it will
+    exit and its 38 month ago based configuration of little networks
+    created."
 * Dump a model to a file, to be marshaled later:
-  model.save('path/for/model')
+    model.save('path/for/model')
 * Load a model from a file:
-  Model.open('path/for/model')
+    Model.open('path/for/model')
 == LICENSE:
 The MIT License
-Copyright (c) 2007-2008 Hal Brodigan
+Copyright (c) 2007-2009 Hal Brodigan
 Permission is hereby granted, free of charge, to any person obtaining
 a copy of this software and associated documentation files (the

data/Rakefile CHANGED

@@ -7,9 +7,9 @@ require './lib/raingrams/version.rb'
 Hoe.new('raingrams', Raingrams::VERSION) do |p|
   p.rubyforge_name = 'raingrams'
-  p.developer('Postmodern Modulus III', 'postmodern.mod3@gmail.com')
+  p.developer('Postmodern', 'postmodern.mod3@gmail.com')
   p.remote_rdoc_dir = 'docs'
-  p.extra_deps = ['hpricot']
+  p.extra_deps = [['nokogiri', '>=1.2.0']]
 end
 # vim: syntax=Ruby

data/lib/raingrams/helpers.rb ADDED

@@ -0,0 +1,5 @@
+require 'raingrams/helpers/frequency'
+require 'raingrams/helpers/probability'
+require 'raingrams/helpers/similarity'
+require 'raingrams/helpers/commonality'
+require 'raingrams/helpers/random'

data/lib/raingrams/helpers/commonality.rb ADDED

@@ -0,0 +1,67 @@
+require 'raingrams/helpers/probability'
+module Raingrams
+  module Helpers
+    module Commonality
+      def self.included(base)
+        base.module_eval { include Raingrams::Helpers::Probability }
+      end
+      #
+      # Returns the ngrams which occur within the specified _words_ and
+      # within the model.
+      #
+      def common_ngrams_from_words(words)
+        ngrams_from_words(words).select { |ngram| has_ngram?(ngram) }
+      end
+      #
+      # Returns the ngrams which occur within the specified _fragment_ and
+      # within the model.
+      #
+      def common_ngrams_from_fragment(fragment)
+        ngrams_from_fragment(fragment).select { |ngram| has_ngram?(ngram) }
+      end
+      #
+      # Returns the ngrams which occur within the specified _sentence_ and
+      # within the model.
+      #
+      def common_ngrams_from_sentence(sentence)
+        ngrams_from_sentence(sentence).select { |ngram| has_ngram?(ngram) }
+      end
+      #
+      # Returns the ngrams which occur within the specified _text_ and
+      # within the model.
+      #
+      def common_ngrams_from_text(text)
+        ngrams_from_text(text).select { |ngram| has_ngram?(ngram) }
+      end
+      #
+      # Returns the joint probability of the common ngrams between the
+      # specified _fragment_ and the model.
+      #
+      def fragment_commonality(fragment)
+        probability_of_ngrams(common_ngrams_from_fragment(fragment))
+      end
+      #
+      # Returns the joint probability of the common ngrams between the
+      # specified _sentence_ and the model.
+      #
+      def sentence_commonality(sentence)
+        probability_of_ngrams(common_ngrams_from_sentence(sentence))
+      end
+      #
+      # Returns the joint probability of the common ngrams between the
+      # specified _sentence_ and the model.
+      #
+      def text_commonality(text)
+        probability_of_ngrams(common_ngrams_from_text(text))
+      end
+    end
+  end
+end
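
Note on the `self.included` hook in this file: mixing `Commonality` into a class also pulls `Probability` into that class, which is why `probability_of_ngrams` is available to the commonality methods. A minimal sketch of that pattern follows. The names `Scoring`, `Overlap` and `ToyModel` are hypothetical stand-ins, not part of the gem.

```ruby
# Hypothetical stand-in for Helpers::Probability.
module Scoring
  # Product of the given values, seeded with 1.0.
  def score(values)
    values.inject(1.0) { |joint, value| joint * value }
  end
end

# Hypothetical stand-in for Helpers::Commonality.
module Overlap
  # When Overlap is included into a class, include Scoring there as well.
  def self.included(base)
    base.module_eval { include Scoring }
  end

  # Score only the values that also appear in the known list.
  def overlap_score(values, known)
    score(values.select { |value| known.include?(value) })
  end
end

class ToyModel
  include Overlap
end
```

With this arrangement a class only needs `include Overlap`; the dependency on `Scoring` is declared once, inside the module that needs it.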

data/lib/raingrams/helpers/frequency.rb ADDED

@@ -0,0 +1,43 @@
+module Raingrams
+  module Helpers
+    module Frequency
+      #
+      # Returns the observed frequency of the specified _ngram_ within
+      # the training text.
+      #
+      def frequency_of_ngram(ngram)
+        prefix = ngram.prefix
+        if @prefixes.has_key?(prefix)
+          return @prefixes[prefix].frequency_of(ngram.last)
+        else
+          return 0
+        end
+      end
+      #
+      # Returns the observed frequency of the specified _ngrams_ occurring
+      # within the training text.
+      #
+      def frequencies_for(ngrams)
+        table = {}
+        ngrams.each do |ngram|
+          table[ngram] = frequency_of_ngram(ngram)
+        end
+        return table
+      end
+      #
+      # Returns the total observed frequency of the specified _ngrams_
+      # occurring within the training text.
+      #
+      def frequency_of_ngrams(ngrams)
+        frequencies_for(ngrams).values.inject do |total,freq|
+          total + freq
+        end
+      end
+    end
+  end
+end
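
Note on `frequency_of_ngrams`: `inject` without an initial value returns `nil` for an empty collection, so an empty list of ngrams would presumably produce `nil` rather than `0`. The standalone sketch below shows the difference with plain arrays. `total_frequency` is a hypothetical helper, not a method of the gem.

```ruby
# Summing with and without a seed value (plain Ruby, no raingrams needed).
frequencies = [3, 1, 4]

# Unseeded inject: fine for non-empty input, nil for empty input.
unseeded_total = frequencies.inject { |total, freq| total + freq }
unseeded_empty = [].inject { |total, freq| total + freq }

# Hypothetical variant: seeding with 0 keeps the result numeric.
def total_frequency(frequencies)
  frequencies.inject(0) { |total, freq| total + freq }
end
```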

data/lib/raingrams/helpers/probability.rb ADDED

@@ -0,0 +1,67 @@
+module Raingrams
+  module Helpers
+    module Probability
+      #
+      # Returns the probability of the specified _ngram_ occurring within
+      # arbitrary text.
+      #
+      def probability_of_ngram(ngram)
+        prefix = ngram.prefix
+        if @prefixes.has_key?(prefix)
+          return @prefixes[prefix].probability_of(ngram.last)
+        else
+          return 0.0
+        end
+      end
+      #
+      # Returns the probability of the specified _ngrams_ occurring within
+      # arbitrary text.
+      #
+      def probabilities_for(ngrams)
+        table = {}
+        ngrams.each do |ngram|
+          table[ngram] = probability_of_ngram(ngram)
+        end
+        return table
+      end
+      #
+      # Returns the joint probability of the specified _ngrams_ occurring
+      # within arbitrary text.
+      #
+      def probability_of_ngrams(ngrams)
+        probabilities_for(ngrams).values.inject do |joint,prob|
+          joint * prob
+        end
+      end
+      #
+      # Returns the probability of the specified _fragment_ occuring within
+      # arbitrary text.
+      #
+      def fragment_probability(fragment)
+        probability_of_ngrams(ngrams_from_fragment(fragment))
+      end
+      #
+      # Returns the probability of the specified _sentence_ occuring within
+      # arbitrary text.
+      #
+      def sentence_probability(sentence)
+        probability_of_ngrams(ngrams_from_sentence(sentence))
+      end
+      #
+      # Returns the probability of the specified _text_ occuring within
+      # arbitrary text.
+      #
+      def text_probability(text)
+        probability_of_ngrams(ngrams_from_text(text))
+      end
+    end
+  end
+end
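
To make the lookup in this file concrete, here is a toy sketch of the structure these methods appear to rely on: a table keyed by prefix, where each entry maps the final gram to a probability. Plain arrays and hashes stand in for the gem's Ngram and ProbabilityTable classes, and `PREFIXES` is an assumed example table. Unlike `probabilities_for`, which stores results in a hash keyed by ngram, this sketch multiplies every occurrence of a repeated ngram.

```ruby
# Toy stand-in for @prefixes: prefix (array of grams) => { last gram => probability }.
PREFIXES = {
  ['the'] => { 'cat' => 0.5, 'dog' => 0.5 },
  ['cat'] => { 'sat' => 0.25 }
}

# Probability of one ngram: look up its prefix, then its last gram.
# Unknown prefixes and unknown grams both yield 0.0.
def probability_of_ngram(ngram)
  prefix = ngram[0...-1]
  table = PREFIXES[prefix]
  table ? table.fetch(ngram.last, 0.0) : 0.0
end

# Joint probability: the product of the individual ngram probabilities.
def probability_of_ngrams(ngrams)
  ngrams.map { |ngram| probability_of_ngram(ngram) }
        .inject(1.0) { |joint, prob| joint * prob }
end
```

A single unseen ngram drives the joint probability to zero, which is the usual motivation for the open-vocabulary models the gem also ships.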

data/lib/raingrams/helpers/random.rb ADDED

@@ -0,0 +1,122 @@
+module Raingrams
+  module Helpers
+    module Random
+      #
+      # Returns a random gram from the model.
+      #
+      def random_gram
+        prefix = @prefixes.keys[rand(@prefixes.length)]
+        return prefix[rand(prefix.length)]
+      end
+      #
+      # Returns a random ngram from the model.
+      #
+      def random_ngram
+        prefix_index = rand(@prefixes.length)
+        prefix = @prefixes.keys[prefix_index]
+        table = @prefixes.values[prefix_index]
+        gram_index = rand(table.grams.length)
+        return (prefix + table.grams[gram_index])
+      end
+      #
+      # Returns a randomly generated sentence of grams using the given
+      # _options_.
+      #
+      def random_gram_sentence(options={})
+        grams = []
+        last_ngram = @starting_ngram
+        loop do
+          next_ngrams = ngrams_prefixed_by(last_ngram.postfix).to_a
+          last_ngram = next_ngrams[rand(next_ngrams.length)]
+          if last_ngram.nil?
+            return []
+          else
+            last_gram = last_ngram.last
+            break if last_gram == Tokens.stop
+            grams << last_gram
+          end
+        end
+        return grams
+      end
+      #
+      # Returns a randomly generated sentence of text using the given
+      # _options_.
+      #
+      def random_sentence(options={})
+        grams = random_gram_sentence(options)
+        sentence = grams.delete_if { |gram|
+          gram == Tokens.start || gram == Tokens.stop
+        }.join(' ')
+        if @ignore_case
+          sentence.capitalize!
+        end
+        if @ignore_punctuation
+          sentence << '.'
+        end
+        return sentence
+      end
+      #
+      # Returns a randomly generated paragraph of text using the given
+      # _options_.
+      #
+      # _options_ may contain the following keys:
+      # <tt>:min_sentences</tt>:: Minimum number of sentences in the
+      #                           paragraph. Defaults to 3.
+      # <tt>:max_sentences</tt>:: Maximum number of sentences in the
+      #                           paragraph. Defaults to 6.
+      #
+      def random_paragraph(options={})
+        min_sentences = (options[:min_sentences] || 3)
+        max_sentences = (options[:max_sentences] || 6)
+        sentences = []
+        (rand(max_sentences - min_sentences) + min_sentences).times do
+          sentences << random_sentence(options)
+        end
+        return sentences.join(' ')
+      end
+      #
+      # Returns randomly generated text using the given _options_.
+      #
+      # _options_ may contain the following keys:
+      # <tt>:min_sentences</tt>:: Minimum number of sentences in the
+      #                           paragraph. Defaults to 3.
+      # <tt>:max_sentences</tt>:: Maximum number of sentences in the
+      #                           paragraph. Defaults to 6.
+      # <tt>:min_paragraphs</tt>:: Minimum number of paragraphs in the text.
+      #                            Defaults to 3.
+      # <tt>:max_paragraphs</tt>:: Maximum number of paragraphs in the text.
+      #                            Defaults to 5.
+      #
+      def random_text(options={})
+        min_paragraphs = (options[:min_paragraphs] || 3)
+        max_paragraphs = (options[:max_paragraphs] || 6)
+        paragraphs = []
+        (rand(max_paragraphs - min_paragraphs) + min_paragraphs).times do
+          paragraphs << random_paragraph(options)
+        end
+        return paragraphs.join("\n\n")
+      end
+    end
+  end
+end
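
Note on the sentence and paragraph counts: `rand(max - min) + min` draws from `min` up to `max - 1`, so the documented maximum is never produced, and when the two bounds are equal `rand(0)` returns a Float, which has no `times` method. The comment for `:max_paragraphs` also says "Defaults to 5" while the code falls back to 6. Below is a hedged sketch of an inclusive draw. `count_between` and `exclusive_count` are hypothetical names, and `rand` with a Range assumes Ruby 1.9.3 or later.

```ruby
# Hypothetical helper: pick an integer count between min and max, inclusive.
def count_between(min, max)
  min, max = max, min if min > max # tolerate swapped bounds
  rand(min..max)
end

# The expression used in the diff, for comparison: yields min..(max - 1),
# or a Float when min == max because rand(0) returns a Float.
def exclusive_count(min, max)
  rand(max - min) + min
end
```

Whether the upper bound was meant to be inclusive is not stated in the diff, so treat the inclusive version as one possible reading of the option names.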

data/lib/raingrams/helpers/similarity.rb ADDED

@@ -0,0 +1,38 @@
+require 'raingrams/helpers/commonality'
+module Raingrams
+  module Helpers
+    module Similarity
+      def self.included(base)
+        base.module_eval { include Raingrams::Helpers::Commonality }
+      end
+      #
+      # Returns the conditional probability of the commonality of the
+      # specified _fragment_ against the _other_model_, given the
+      # commonality of the _fragment_ against the model.
+      #
+      def fragment_similarity(fragment,other_model)
+        other_model.fragment_commonality(fragment) / fragment_commonality(fragment)
+      end
+      #
+      # Returns the conditional probability of the commonality of the
+      # specified _sentence_ against the _other_model_, given the
+      # commonality of the _sentence_ against the model.
+      #
+      def sentence_similarity(sentence,other_model)
+        other_model.sentence_commonality(sentence) / sentence_commonality(sentence)
+      end
+      #
+      # Returns the conditional probability of the commonality of the
+      # specified _text_ against the _other_model_, given the commonality
+      # of the _text_ against the model.
+      #
+      def text_similarity(text,other_model)
+        other_model.text_commonality(text) / text_commonality(text)
+      end
+    end
+  end
+end
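
Note on the divisions in this file: each similarity is a ratio of two commonalities. If the receiver's commonality is 0.0, Float division yields NaN or Infinity, and if no ngrams are shared the unseeded `inject` in `probability_of_ngrams` would presumably return `nil`, so the division would raise. A minimal sketch of a guarded ratio follows. `safe_similarity` is a hypothetical helper, and returning 0.0 in the degenerate cases is an assumption, not the gem's behaviour.

```ruby
# Hypothetical guarded ratio of two commonality values.
def safe_similarity(other_commonality, own_commonality)
  # No shared ngrams on either side: treat as no similarity.
  return 0.0 if other_commonality.nil? || own_commonality.nil?
  # Avoid NaN / Infinity from dividing by zero.
  return 0.0 if own_commonality.zero?

  other_commonality / own_commonality.to_f
end
```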

data/lib/raingrams/model.rb CHANGED

@@ -1,15 +1,22 @@
 require 'raingrams/ngram'
 require 'raingrams/ngram_set'
-require 'raingrams/probability_table'
 require 'raingrams/tokens'
+require 'raingrams/probability_table'
+require 'raingrams/helpers'
 require 'set'
-require 'hpricot'
+require 'nokogiri'
 require 'open-uri'
 module Raingrams
   class Model
+    include Helpers::Frequency
+    include Helpers::Probability
+    include Helpers::Similarity
+    include Helpers::Commonality
+    include Helpers::Random
     # Size of ngrams to use
     attr_reader :ngram_size
@@ -161,8 +168,12 @@ module Raingrams
         sentence.gsub!(/[\.\?!]*$/,'')
       end
+      if @ignore_case
+        # downcase the sentence
+        sentence.downcase!
+      end
       if @ignore_urls
-        # remove URLs
         sentence.gsub!(/\s*\w+:\/\/[\w\/\+_\-,:%\d\.\-\?&=]*\s*/,' ')
       end
@@ -176,11 +187,6 @@ module Raingrams
         sentence.gsub!(/\s*[\(\{\[]\d+[\)\}\]]\s*/,' ')
       end
-      if @ignore_case
-        # downcase the sentence
-        sentence.downcase!
-      end
       if @ignore_punctuation
         # split and ignore punctuation characters
         return sentence.scan(/\w+[\-_\.:']\w+|\w+/)
@@ -194,7 +200,13 @@ module Raingrams
     # Parses the specified _text_ and returns an Array of sentences.
     #
     def parse_text(text)
-      text.to_s.scan(/[^\s\.\?!][^\.\?!]*[\.\?\!]/)
+      text = text.to_s
+      if @ignore_urls
+        text.gsub!(/\s*\w+:\/\/[\w\/\+_\-,:%\d\.\-\?&=]*\s*/,' ')
+      end
+      return text.scan(/[^\s\.\?!][^\.\?!]*[\.\?\!]/)
     end
     #
@@ -460,38 +472,6 @@ module Raingrams
       return gram_set
     end
-    #
-    # Returns the ngrams which occur within the specified _words_ and
-    # within the model.
-    #
-    def common_ngrams_from_words(words)
-      ngrams_from_words(words).select { |ngram| has_ngram?(ngram) }
-    end
-    #
-    # Returns the ngrams which occur within the specified _fragment_ and
-    # within the model.
-    #
-    def common_ngrams_from_fragment(fragment)
-      ngrams_from_fragment(fragment).select { |ngram| has_ngram?(ngram) }
-    end
-    #
-    # Returns the ngrams which occur within the specified _sentence_ and
-    # within the model.
-    #
-    def common_ngrams_from_sentence(sentence)
-      ngrams_from_sentence(sentence).select { |ngram| has_ngram?(ngram) }
-    end
-    #
-    # Returns the ngrams which occur within the specified _text_ and
-    # within the model.
-    #
-    def common_ngrams_from_text(text)
-      ngrams_from_text(text).select { |ngram| has_ngram?(ngram) }
-    end
     #
     # Sets the frequency of the specified _ngram_ to the specified _value_.
     #
@@ -524,7 +504,7 @@ module Raingrams
     # Train the model with the specified _paragraphs_.
     #
     def train_with_paragraph(paragraph)
-      train_with_ngrams(ngrams_from_paragraph(paragraphs))
+      train_with_ngrams(ngrams_from_paragraph(paragraph))
     end
     #
@@ -546,274 +526,13 @@ module Raingrams
     # specified _url_.
     #
     def train_with_url(url)
-      doc = Hpricot(open(url))
+      doc = Nokogiri::HTML(open(url))
       return doc.search('p').map do |p|
         train_with_paragraph(p.inner_text)
       end
     end
-    #
-    # Returns the observed frequency of the specified _ngram_ within
-    # the training text.
-    #
-    def frequency_of_ngram(ngram)
-      prefix = ngram.prefix
-      if @prefixes.has_key?(prefix)
-        return @prefixes[prefix].frequency_of(ngram.last)
-      else
-        return 0
-      end
-    end
-    #
-    # Returns the probability of the specified _ngram_ occurring within
-    # arbitrary text.
-    #
-    def probability_of_ngram(ngram)
-      prefix = ngram.prefix
-      if @prefixes.has_key?(prefix)
-        return @prefixes[prefix].probability_of(ngram.last)
-      else
-        return 0.0
-      end
-    end
-    #
-    # Returns the observed frequency of the specified _ngrams_ occurring
-    # within the training text.
-    #
-    def frequencies_for(ngrams)
-      table = {}
-      ngrams.each do |ngram|
-        table[ngram] = frequency_of_ngram(ngram)
-      end
-      return table
-    end
-    #
-    # Returns the probability of the specified _ngrams_ occurring within
-    # arbitrary text.
-    #
-    def probabilities_for(ngrams)
-      table = {}
-      ngrams.each do |ngram|
-        table[ngram] = probability_of_ngram(ngram)
-      end
-      return table
-    end
-    #
-    # Returns the total observed frequency of the specified _ngrams_
-    # occurring within the training text.
-    #
-    def frequency_of_ngrams(ngrams)
-      frequencies_for(ngrams).values.inject do |total,freq|
-        total + freq
-      end
-    end
-    #
-    # Returns the joint probability of the specified _ngrams_ occurring
-    # within arbitrary text.
-    #
-    def probability_of_ngrams(ngrams)
-      probabilities_for(ngrams).values.inject do |joint,prob|
-        joint * prob
-      end
-    end
-    #
-    # Returns the probability of the specified _fragment_ occuring within
-    # arbitrary text.
-    #
-    def fragment_probability(fragment)
-      probability_of_ngrams(ngrams_from_fragment(fragment))
-    end
-    #
-    # Returns the probability of the specified _sentence_ occuring within
-    # arbitrary text.
-    #
-    def sentence_probability(sentence)
-      probability_of_ngrams(ngrams_from_sentence(sentence))
-    end
-    #
-    # Returns the probability of the specified _text_ occuring within
-    # arbitrary text.
-    #
-    def text_probability(text)
-      probability_of_ngrams(ngrams_from_text(text))
-    end
-    #
-    # Returns the joint probability of the common ngrams between the
-    # specified _fragment_ and the model.
-    #
-    def fragment_commonality(fragment)
-      probability_of_ngrams(common_ngrams_from_fragment(fragment))
-    end
-    #
-    # Returns the joint probability of the common ngrams between the
-    # specified _sentence_ and the model.
-    #
-    def sentence_commonality(sentence)
-      probability_of_ngrams(common_ngrams_from_sentence(sentence))
-    end
-    #
-    # Returns the joint probability of the common ngrams between the
-    # specified _sentence_ and the model.
-    #
-    def text_commonality(text)
-      probability_of_ngrams(common_ngrams_from_text(text))
-    end
-    #
-    # Returns the conditional probability of the commonality of the
-    # specified _fragment_ against the _other_model_, given the commonality
-    # of the _fragment_ against the model.
-    #
-    def fragment_similarity(fragment,other_model)
-      other_model.fragment_commonality(fragment) / fragment_commonality(fragment)
-    end
-    #
-    # Returns the conditional probability of the commonality of the
-    # specified _sentence_ against the _other_model_, given the commonality
-    # of the _sentence_ against the model.
-    #
-    def sentence_similarity(sentence,other_model)
-      other_model.sentence_commonality(sentence) / sentence_commonality(sentence)
-    end
-    #
-    # Returns the conditional probability of the commonality of the
-    # specified _text_ against the _other_model_, given the commonality
-    # of the _text_ against the model.
-    #
-    def text_similarity(text,other_model)
-      other_model.text_commonality(text) / text_commonality(text)
-    end
-    #
-    # Returns a random gram from the model.
-    #
-    def random_gram
-      prefix = @prefixes.keys[rand(@prefixes.length)]
-      return prefix[rand(prefix.length)]
-    end
-    #
-    # Returns a random ngram from the model.
-    #
-    def random_ngram
-      prefix_index = rand(@prefixes.length)
-      prefix = @prefixes.keys[prefix_index]
-      table = @prefixes.values[prefix_index]
-      gram_index = rand(table.grams.length)
-      return (prefix + table.grams[gram_index])
-    end
-    #
-    # Returns a randomly generated sentence of grams using the given
-    # _options_.
-    #
-    def random_gram_sentence(options={})
-      grams = []
-      last_ngram = @starting_ngram
-      loop do
-        next_ngrams = ngrams_prefixed_by(last_ngram.postfix).to_a
-        last_ngram = next_ngrams[rand(next_ngrams.length)]
-        if last_ngram.nil?
-          return []
-        else
-          last_gram = last_ngram.last
-          break if last_gram == Tokens.stop
-          grams << last_gram
-        end
-      end
-      return grams
-    end
-    #
-    # Returns a randomly generated sentence of text using the given
-    # _options_.
-    #
-    def random_sentence(options={})
-      grams = random_gram_sentence(options)
-      sentence = grams.delete_if { |gram|
-        gram == Tokens.start || gram == Tokens.stop
-      }.join(' ')
-      sentence << '.' if @ignore_punctuation
-      return sentence
-    end
-    #
-    # Returns a randomly generated paragraph of text using the given
-    # _options_.
-    #
-    # _options_ may contain the following keys:
-    # <tt>:min_sentences</tt>:: Minimum number of sentences in the
-    #                           paragraph. Defaults to 3.
-    # <tt>:max_sentences</tt>:: Maximum number of sentences in the
-    #                           paragraph. Defaults to 6.
-    #
-    def random_paragraph(options={})
-      min_sentences = (options[:min_sentences] || 3)
-      max_sentences = (options[:max_sentences] || 6)
-      sentences = []
-      (rand(max_sentences - min_sentences) + min_sentences).times do
-        sentences << random_sentence(options)
-      end
-      return sentences.join(' ')
-    end
-    #
-    # Returns randomly generated text using the given _options_.
-    #
-    # _options_ may contain the following keys:
-    # <tt>:min_sentences</tt>:: Minimum number of sentences in the
-    #                           paragraph. Defaults to 3.
-    # <tt>:max_sentences</tt>:: Maximum number of sentences in the
-    #                           paragraph. Defaults to 6.
-    # <tt>:min_paragraphs</tt>:: Minimum number of paragraphs in the text.
-    #                            Defaults to 3.
-    # <tt>:max_paragraphs</tt>:: Maximum number of paragraphs in the text.
-    #                            Defaults to 5.
-    #
-    def random_text(options={})
-      min_paragraphs = (options[:min_paragraphs] || 3)
-      max_paragraphs = (options[:max_paragraphs] || 6)
-      paragraphs = []
-      (rand(max_paragraphs - min_paragraphs) + min_paragraphs).times do
-        paragraphs << random_paragraph(options)
-      end
-      return paragraphs.join("\n\n")
-    end
     #
     # Refreshes the probability tables of the model.
     #
@@ -854,6 +573,13 @@ module Raingrams
       return self
     end
+    #
+    # Returns a Hash representation of the model.
+    #
+    def to_hash
+      @prefixes
+    end
     protected
     #
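
Note on the new `parse_text`: URLs are now stripped before the text is split, which matters because the dots inside a URL would otherwise be read as sentence ends. One side effect to be aware of is that `String#to_s` returns the receiver itself, so the `gsub!` call modifies the caller's string (and would raise on a frozen one). The sketch below is a standalone variant that leaves its argument untouched. `split_sentences` is a hypothetical name and `URL_PATTERN` is adapted from the pattern in the diff, not copied from it.

```ruby
# URL pattern adapted from the one in the diff (character class tidied up).
URL_PATTERN = /\s*\w+:\/\/[\w\/+,:%.\-?&=]*\s*/

# Hypothetical standalone version of parse_text that does not mutate its input.
def split_sentences(text, ignore_urls = true)
  text = text.to_s
  # gsub (without the bang) returns a new String instead of editing in place.
  text = text.gsub(URL_PATTERN, ' ') if ignore_urls
  text.scan(/[^\s\.\?!][^\.\?!]*[\.\?\!]/)
end

input = 'See http://example.com/docs for details. It works!'
sentences = split_sentences(input)
```

Calling `split_sentences(input, false)` on the same string splits inside the URL, which illustrates why the stripping step was moved ahead of the scan.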

data/lib/raingrams/probability_table.rb CHANGED

@@ -141,6 +141,15 @@ module Raingrams
       return self
     end
+    #
+    # Returns a Hash representation of the probability table.
+    #
+    def to_hash
+      build
+      return @probabilities
+    end
     def inspect
       if @dirty
         "#<ProbabilityTable @total=#{@total} @frequencies=#{@frequencies.inspect}>"

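Note on `ProbabilityTable#to_hash`: it calls `build` before returning `@probabilities`, and `inspect` checks a `@dirty` flag, which suggests the probabilities are recomputed lazily after the frequencies change. The sketch below illustrates that dirty-flag pattern under that assumption. `LazyTable` and its `count` method are hypothetical and do not reproduce the gem's class.

```ruby
# Hypothetical lazy table: counts are cheap to update, probabilities are
# recomputed only when requested after a change.
class LazyTable
  def initialize
    @frequencies = Hash.new(0)
    @probabilities = {}
    @total = 0
    @dirty = false
  end

  # Record one observation of the gram and mark the table as stale.
  def count(gram)
    @frequencies[gram] += 1
    @total += 1
    @dirty = true
    self
  end

  # Recompute probabilities only if the counts changed since the last build.
  def build
    return self unless @dirty

    @probabilities = {}
    @frequencies.each { |gram, freq| @probabilities[gram] = freq.to_f / @total }
    @dirty = false
    self
  end

  # Hash view of the table, always built from the current counts.
  def to_hash
    build
    @probabilities
  end
end
```
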
data/lib/raingrams/tokens/tokens.rb ADDED

@@ -0,0 +1,35 @@
+require 'raingrams/tokens/start_sentence'
+require 'raingrams/tokens/stop_sentence'
+require 'raingrams/tokens/unknown'
+module Raingrams
+  module Tokens
+    #
+    # Returns all defined tokens.
+    #
+    def Tokens.all
+      @@raingram_tokens ||= {}
+    end
+    #
+    # Returns the start sentence token.
+    #
+    def Tokens.start
+      Tokens.all[:start] ||= StartSentence.new
+    end
+    #
+    # Returns the stop sentence token.
+    #
+    def Tokens.stop
+      Tokens.all[:stop] ||= StopSentence.new
+    end
+    #
+    # Returns the unknown word token.
+    #
+    def Tokens.unknown
+      Tokens.all[:unknown] ||= Unknown.new
+    end
+  end
+end
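
Note on the accessors in this file: each one memoizes its token with `||=`, so every call returns the same object. That is what lets code such as `random_gram_sentence` compare a gram against `Tokens.stop`. A minimal sketch of the same pattern follows. `Registry` and `Marker` are hypothetical names, and the sketch stores its hash in a module-level instance variable rather than the `@@raingram_tokens` class variable used in the diff.

```ruby
# Hypothetical registry mirroring the Tokens.all / Tokens.start pattern.
module Registry
  Marker = Struct.new(:name)

  # Lazily created storage shared by all lookups.
  def self.all
    @all ||= {}
  end

  # Create the marker on first use, then keep returning the same object.
  def self.start
    all[:start] ||= Marker.new('start')
  end
end
```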

data/lib/raingrams/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Raingrams
-  VERSION = '0.1.1'
+  VERSION = '0.1.2'
 end

data/tasks/spec.rb CHANGED

@@ -5,3 +5,5 @@ Spec::Rake::SpecTask.new(:spec) do |t|
   t.libs += ['lib', 'spec']
   t.spec_opts = ['--colour', '--format', 'specdoc']
 end
+task :default => :spec

metadata CHANGED

@@ -1,26 +1,26 @@
 --- !ruby/object:Gem::Specification
 name: raingrams
 version: !ruby/object:Gem::Version
-  version: 0.1.1
+  version: 0.1.2
 platform: ruby
 authors:
-- Postmodern Modulus III
+- Postmodern
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2008-10-12 00:00:00 -07:00
+date: 2009-04-23 00:00:00 -07:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
-  name: hpricot
+  name: nokogiri
   type: :runtime
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: "0"
+        version: 1.2.0
     version:
 - !ruby/object:Gem::Dependency
   name: hoe
@@ -30,7 +30,7 @@ dependencies:
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 1.8.0
+        version: 1.12.2
     version:
 description: Raingrams is a flexible and general-purpose ngrams library written in Ruby. Raingrams supports ngram sizes greater than 1, text/non-text grams, multiple parsing styles and open/closed vocabulary models.
 email:
@@ -45,7 +45,6 @@ extra_rdoc_files:
 - Manifest.txt
 - README.txt
 - TODO.txt
-- spec/training/snowcrash.txt
 files:
 - History.txt
 - LICENSE.txt
@@ -54,27 +53,33 @@ files:
 - TODO.txt
 - Rakefile
 - lib/raingrams.rb
-- lib/raingrams/version.rb
-- lib/raingrams/raingrams.rb
-- lib/raingrams/exceptions/prefix_frequency_missing.rb
 - lib/raingrams/exceptions.rb
+- lib/raingrams/exceptions/prefix_frequency_missing.rb
+- lib/raingrams/extensions.rb
 - lib/raingrams/extensions/object.rb
 - lib/raingrams/extensions/string.rb
-- lib/raingrams/extensions.rb
+- lib/raingrams/tokens.rb
 - lib/raingrams/tokens/token.rb
 - lib/raingrams/tokens/start_sentence.rb
 - lib/raingrams/tokens/stop_sentence.rb
 - lib/raingrams/tokens/unknown.rb
-- lib/raingrams/tokens.rb
+- lib/raingrams/tokens/tokens.rb
 - lib/raingrams/ngram.rb
 - lib/raingrams/ngram_set.rb
 - lib/raingrams/probability_table.rb
+- lib/raingrams/helpers.rb
+- lib/raingrams/helpers/frequency.rb
+- lib/raingrams/helpers/probability.rb
+- lib/raingrams/helpers/similarity.rb
+- lib/raingrams/helpers/commonality.rb
+- lib/raingrams/helpers/random.rb
 - lib/raingrams/model.rb
 - lib/raingrams/bigram_model.rb
 - lib/raingrams/trigram_model.rb
 - lib/raingrams/quadgram_model.rb
 - lib/raingrams/pentagram_model.rb
 - lib/raingrams/hexagram_model.rb
+- lib/raingrams/open_vocabulary.rb
 - lib/raingrams/open_vocabulary/open_model.rb
 - lib/raingrams/open_vocabulary/model.rb
 - lib/raingrams/open_vocabulary/bigram_model.rb
@@ -82,7 +87,8 @@ files:
 - lib/raingrams/open_vocabulary/quadgram_model.rb
 - lib/raingrams/open_vocabulary/pentagram_model.rb
 - lib/raingrams/open_vocabulary/hexagram_model.rb
-- lib/raingrams/open_vocabulary.rb
+- lib/raingrams/version.rb
+- lib/raingrams/raingrams.rb
 - tasks/spec.rb
 - spec/training/snowcrash.txt
 - spec/helpers/training.rb
@@ -121,7 +127,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements: []
 rubyforge_project: raingrams
-rubygems_version: 1.3.0
+rubygems_version: 1.3.1
 signing_key:
 specification_version: 2
 summary: Raingrams is a flexible and general-purpose ngrams library written in Ruby