
literate_randomizer 0.3.1 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

data/README.md +19 -17
data/lib/literate_randomizer.rb +31 -6
data/lib/literate_randomizer/markov.rb +58 -162
data/lib/literate_randomizer/randomizer.rb +155 -0
data/lib/literate_randomizer/source_parser.rb +55 -0
data/lib/literate_randomizer/util.rb +26 -0
data/lib/literate_randomizer/version.rb +2 -1
data/spec/literate_randomizer_spec.rb +47 -31
metadata +6 -2

data/README.md CHANGED

@@ -16,24 +16,22 @@ Or install it yourself as:
     $ gem install literate_randomizer
-## Usage
+## Basic Usage (global instance)
-Example:
+The simplest way to use LiterateRandomizer is the global Randomizer instance: LiterateRandomizer.global. Any method you invoke on LiterateRandomizer gets forwarded to this instance. Examples:
     require 'literate_randomizer'
-    lr = LiterateRandomizer.create
+    LiterateRandomizer.word
+    # => "frivolous"
-    lr.word
-    # => "frivolous"
+    LiterateRandomizer.sentence
+    # => "Muscular arms round opening of sorts while Lord John Roxton."
-    lr.sentence
-    # => "Muscular arms round opening of sorts while Lord John Roxton."
+    LiterateRandomizer.paragraph
+    # => "Fulmination against the wandering that the woes of this. Particular package of the back to matchwood. File with hideous jaws of Southampton. Adventure and he. Skewered on to pledge."
-    lr.paragraph
-    # => "Fulmination against the wandering that the woes of this. Particular package of the back to matchwood. File with hideous jaws of Southampton. Adventure and he. Skewered on to pledge."
-    puts lr.paragraphs
+    puts LiterateRandomizer.paragraphs
 The last line outputs:
@@ -51,7 +49,7 @@ When creating a randomizer, there are a few options. The source_material should
       :source_material => string OR
       :source_material_file => filename
       :randomizer => Random.new(seed=0)
-      :punctuation_distribution => DEFAULT_PUNCTUATION_DISTRIBUTION - punctiation is randomly selected from this array
+      :punctuation_distribution => DEFAULT_PUNCTUATION_DISTRIBUTION - punctuation is randomly selected from this array
 **paragraph** options:
@@ -59,7 +57,7 @@ When creating a randomizer, there are a few options. The source_material should
       :first_word => nil - the start word
       :words => range or int - number of words in sentence
       :sentences => range or int - number of sentences in paragraph
-      :punctuation => nil - punction to end the sentence with (nil == randomly selected from punctuation_distribution)
+      :punctuation => nil - punctuation to end the sentence with (nil == randomly selected from punctuation_distribution)
 **paragraphs** options:
@@ -67,14 +65,14 @@ When creating a randomizer, there are a few options. The source_material should
       :first_word => nil - the first word of the paragraph
       :words => range or int - number of words in sentence
       :sentences => range or int - number of sentences in paragraph
-      :punctuation => nil - punction to end the paragraph with (nil == randomly selected from punctuation_distribution)
+      :punctuation => nil - punctuation to end the paragraph with (nil == randomly selected from punctuation_distribution)
       :paragraphs => range or int - number of paragraphs in paragraph
       :join => "\n\n" - join the paragraphs. if :join => false, returns an array of the paragraphs
 Advanced example:
-    lr.paragraph :sentences => 5, :words => 3..8, :first_word => "A", :punctuation => "!!!"
-    # => "A dense mob of our. Gods on that Challenger. Invariably to safety though. Weaponless but it my! Some bandy-legged lurching creature!!!"
+    LiterateRandomizer.paragraph :sentences => 5, :words => 3..8, :first_word => "A", :punctuation => "!!!"
+    # => "A dense mob of our. Gods on that Challenger. Invariably to safety though. Weaponless but it my! Some bandy-legged lurching creature!!!"
 If you just want to use a single, global instance, you can initialize and access it this way:
@@ -84,12 +82,16 @@ If you just want to use a single, global instance, you can initialize and access
     # after the first call, options are ignored and the existing randomizer is returned
     LiterateRandomizer.global.sentence
-    # => "Muscular arms round opening of sorts while Lord John Roxton."
+    # => "Muscular arms round opening of sorts while Lord John Roxton."
     # or even simpler, all methods on LiterateRandomizer are forwarded to LiterateRandomizer.global:
     LiterateRandomizer.paragraph(:sentences => 3, :words => 3)
     # => "Drama which would. Wrong fashion which. Throw them there."
+## Inspiration
+Thanks to Tim Riley for getting me started on the right track with this <a href="http://openmonkey.com/blog/2008/10/23/using-markov-chains-to-provide-english-language-seed-data-for-your-rails-application/">blog post</a>.
 ## Contributing
 1. Fork it

data/lib/literate_randomizer.rb CHANGED

@@ -1,22 +1,47 @@
-%w{version markov}.each do |file|
-  require File.join(File.dirname(__FILE__),"literate_randomizer", file)
-end
+%w{
+  version
+  util
+  source_parser
+  markov
+  randomizer
+}.each {|file|require File.join(File.dirname(__FILE__),"literate_randomizer", file)}
 module LiterateRandomizer
   class << self
+    # Create a new Randomizer instance
+    #
+    # See LiterateRandomizer::Randomizer#initialize for options.
     def create(options={})
-      MarkovChain.new options
+      Randomizer.new options
     end
-    def global(options={})
-      @global_instance ||= MarkovChain.new options
+    # Access or initialize the global randomizer instance.
+    #
+    # The first time this is called, the global instance is created and initialized. Subsequent calls with no parameters just return
+    # the global instance. If LiterateRandomizer.global is called again with options, the options are ignored and the existing instance is returned.
+    #
+    # See LiterateRandomizer::Randomizer#initialize for options.
+    def global(options=nil)
+      return @global_instance if @global_instance && !options
+      @global_instance ||= Randomizer.new(options||{})
     end
+    # Forwards method invocations to the global Randomizer instance. Unless you need more than one instance of Randomizer,
+    # this is the easiest way to use LiterateRandomizer.
+    #
+    # Examples:
+    #
+    # * LiterateRandomizer.word
+    # * LiterateRandomizer.sentence
+    # * LiterateRandomizer.paragraph
+    # * LiterateRandomizer.paragraphs
     def method_missing(method, *arguments, &block)
       global.send(method, *arguments, &block)
     end
+    # correctly mirrors method_missing
     def respond_to?(method)
       super || global.respond_to?(method)
     end
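
The forwarding pattern in this hunk can be illustrated with a minimal, self-contained sketch. `DemoRandomizer` and `DemoFacade` are hypothetical stand-ins, not classes from the gem, and the sketch uses `respond_to_missing?` (the hook Ruby provides for pairing with `method_missing`) rather than overriding `respond_to?` directly. It also shows that `||=` keeps an already-created instance, so options passed on a later call have no effect.

```ruby
# Hypothetical stand-in for LiterateRandomizer::Randomizer (not the gem's class).
class DemoRandomizer
  attr_reader :options

  def initialize(options = {})
    @options = options
  end

  def word
    "frivolous"
  end
end

# Hypothetical facade that forwards module-level calls to one shared instance.
module DemoFacade
  class << self
    # Lazily create the shared instance; `||=` keeps an existing one.
    def global(options = nil)
      @global_instance ||= DemoRandomizer.new(options || {})
    end

    # Forward unknown module-level calls to the shared instance.
    def method_missing(method, *arguments, &block)
      if global.respond_to?(method)
        global.send(method, *arguments, &block)
      else
        super
      end
    end

    # Keep respond_to? consistent with method_missing.
    def respond_to_missing?(method, include_private = false)
      global.respond_to?(method, include_private) || super
    end
  end
end

first  = DemoFacade.global(:seed => 1)
second = DemoFacade.global(:seed => 2) # instance already exists, options unused
puts first.equal?(second)
puts DemoFacade.word
puts DemoFacade.respond_to?(:word)
```

Calling a method that the shared instance does not define still raises `NoMethodError`, because the sketch falls back to `super`.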

data/lib/literate_randomizer/markov.rb CHANGED

@@ -4,211 +4,107 @@
 # by Shane Brinkman-Davis
 module LiterateRandomizer
-class MarkovChain
-  DEFAULT_PUNCTUATION_DISTRIBUTION = %w{. . . . . . . . . . . . . . . . ? !}
-  PREPOSITION_REGEX = /^(had|the|to|or|and|a|in|that|it|if|of|is|was|for|on|as|an|your|our|my|per|until)$/
-  attr_accessor :randomizer, :init_options, :punctuation_distribution
-  attr_reader :markov_words, :words, :first_words
-  def default_source_material
-    File.expand_path File.join(File.dirname(__FILE__),"..","..","data","the_lost_world_by_arthur_conan_doyle.txt")
-  end
-  # options:
-  #     :source_material => string
-  #     :source_material_file => filename
-  def source_material(options=init_options)
-    options[:source_material] || File.read(options[:source_material_file] || default_source_material)
-  end
+# The Markov-Chain bi-gram model. Primary purpose is, given a word, return the next word that is "likely" based on the source material.
+class MarkovModel
+  # The source of all random values. Must implement: #rand(limit)
+  #
+  # Default: Random.new()
+  attr_accessor :randomizer
-  def chain_add(word, next_word)
-    markov_words[word] ||= Hash.new(0)
-    markov_words[word][next_word] += 1
-  end
+  # A hash (string => true) of all unique words found in the source-material.
+  attr_reader :words
-  # remove any non-alpha characters from word
-  def scrub_word(word)
-    word &&= word[/[A-Za-z][A-Za-z'-]*/]
-    word &&= word[/[A-Za-z'-]*[A-Za-z]/]
-    (word && word.strip) || ""
-  end
+  # An array of all words that appear at the beginning of sentences in the source-material.
+  attr_reader :first_words
-  def scrub_word_list(word_list)
-    word_list.split(/[\s]+/).collect {|a| scrub_word(a)}.select {|a| a.length>0}
-  end
+  # Data structure encoding all Markov-Chains (bi-grams) found in the source-material.
+  #
+  # markov_chains is a hash of hashes. The top level keys are the "first words" in the chain.
+  # For each first-word, there are one or more words that followed that word in the text. Second-words are the second-level hash key.
+  # The second-level hash values are the count of the number of times that second word followed the first.
+  #
+  # Summary: {first_words => {second_words => found-in-source-material-in-sequence-count}}
+  attr_reader :markov_chains
-  def capitalize(word)
-    word.chars.first.upcase+word[1..-1]
-  end
+  # an instance of SourceParser attached to the source_material
+  attr_accessor :source_parser
+  private
+  # cached copy of the options passed in on initialization
+  attr_accessor :init_options
-  def source_sentences
-    source_material.split(/([.?!"]\s|--| ')+/)
+  # add a word/next_word pair to @markov_chains
+  def chain_add(word, next_word)
+    markov_chains[word] ||= Hash.new(0)
+    markov_chains[word][next_word] += 1
   end
   # remove all dead-end words
   def prune_markov_words
-    @markov_words.keys.each do |key|
-      @markov_key.delete(key) if @markov_words[key].length == 0
+    @markov_chains.keys.each do |key|
+      @markov_chains.delete(key) if @markov_chains[key].length == 0
     end
   end
-  def populate_markov_words
-    @markov_words = {}
+  # populate the @markov_chains hash
+  def populate_markov_chains
+    @markov_chains = {}
     @words = {}
     @first_words = {}
-    source_sentences.each do |sentence|
-      word_list = scrub_word_list sentence
+    source_parser.each_sentence do |word_list|
+      next unless word_list.length >= 2
       @first_words[word_list[0]] = true
       word_list.each_with_index do |word, index|
         @words[word] = true
         next_word = word_list[index+1]
         chain_add word, next_word if next_word
       end
-    end
-    prune_markov_words
+    end
+    prune_markov_words
   end
+  # populate the weight-sums for each chain
+  # (an optimization)
   def populate_markov_sum
     @markov_weighted_sum = {}
-    @markov_words.each do |word,followers|
+    @markov_chains.each do |word,followers|
       @markov_weighted_sum[word] = followers.inject(0) {|sum,kv| sum + kv[1]}
     end
   end
+  # Populate internal data-structures in preparation for #next_word
   def populate
-    populate_markov_words
+    populate_markov_chains
     populate_markov_sum
   end
-  def max(r)
-    return r if r.kind_of? Integer
-    r.max
-  end
-  def rand_count(r)
-    return r if r.kind_of? Integer
-    rand(r.max-r.min)+r.min
-  end
-  # options:
-  #     :source_material => string OR
-  #     :source_material_file => filename
-  #     :randomizer - responds to .rand(limit) - this primarilly exists for testing
-  #     :punctuation_distribution => DEFAULT_PUNCTUATION_DISTRIBUTION - punctiation is randomly selected from this array
+  public
+  # Initialize a new instance.
+  #
+  # Options:
+  #
+  # * :randomizer => Random.new # must respond to #rand(limit)
+  # * :source_parser => SourceParser.new options
   def initialize(options={})
-    @init_options = options
-    @randomizer = randomizer || Random.new()
-    @punctuation_distribution = options[:punctuation_distribution] || DEFAULT_PUNCTUATION_DISTRIBUTION
+    @randomizer = options[:randomizer] || Random.new
+    @source_parser = options[:source_parser] || SourceParser.new(options)
     populate
   end
-  def inspect
-    "#<#{self.class}: #{@words.length} words, #{@markov_words.length} word-chains, #{@first_words.length} first_words>"
-  end
-  def next_word(word)
-    return if !markov_words[word]
+  # Given a word, return a weighted-randomly selected next-one.
+  def next_word(word,randomizer=@randomizer)
+    return if !markov_chains[word]
     sum = @markov_weighted_sum[word]
-    random = rand(sum)+1
+    random = randomizer.rand(sum)+1
     partial_sum = 0
-    (markov_words[word].find do |w, count|
+    (markov_chains[word].find do |w, count|
       partial_sum += count
       w!=word && partial_sum >= random
     end||[]).first
   end
-  def rand(limit=nil)
-    @randomizer.rand(limit)
-  end
-  # return a random word
-  def word
-    @cached_word_keys ||= words.keys
-    @cached_word_keys[rand(@cached_word_keys.length)]
-  end
-  # return a random first word of a sentence
-  def first_word
-    @cached_first_word_keys ||= first_words.keys
-    @cached_first_word_keys[rand(@cached_first_word_keys.length)]
-  end
-  # return a random first word of a sentence
-  def markov_word
-    @cached_markov_word_keys ||= markov_words.keys
-    @cached_markov_word_keys[rand(@cached_markov_word_keys.length)]
-  end
-  def punctuation
-    @punctuation_distribution[rand(@punctuation_distribution.length)]
-  end
-  def extend_trailing_preposition(max_words,words)
-    while words.length < max_words && words[-1] && words[-1][PREPOSITION_REGEX]
-      words << next_word(words[-1])
-    end
-    words
-  end
-  # return a random sentence
-  # options:
-  #   * :first_word => nil - the start word
-  #   * :words => range or int - number of words in sentence
-  #   * :punctuation => nil - punction to end the sentence with (nil == randomly selected from punctuation_distribution)
-  def sentence(options={})
-    word = options[:first_word] || self.markov_word
-    num_words_option = options[:words] || (3..15)
-    count = rand_count num_words_option
-    punctuation = options[:punctuation] || self.punctuation
-    words = count.times.collect do
-      word.tap {word = next_word(word)}
-    end.compact
-    words = extend_trailing_preposition(max(num_words_option), words)
-    capitalize words.compact.join(" ") + punctuation
-  end
-  # return a random paragraph
-  # options:
-  #   * :first_word => nil - the first word of the paragraph
-  #   * :words => range or int - number of words in sentence
-  #   * :sentences => range or int - number of sentences in paragraph
-  #   * :punctuation => nil - punction to end the paragraph with (nil == randomly selected from punctuation_distribution)
-  def paragraph(options={})
-    count = rand_count options[:sentences] || (5..15)
-    count.times.collect do |i|
-      op = options.clone
-      op.delete :punctuation unless i==count-1
-      op.delete :first_word unless i==0
-      sentence op
-    end.join(" ")
-  end
-  # return random paragraphs
-  # options:
-  #   * :first_word => nil - the first word of the paragraph
-  #   * :words => range or int - number of words in sentence
-  #   * :sentences => range or int - number of sentences in paragraph
-  #   * :paragraphs => range or int - number of paragraphs in paragraph
-  #   * :join => "\n\n" - join the paragraphs. if :join => false, returns an array of the paragraphs
-  #   * :punctuation => nil - punction to end the paragraph with (nil == randomly selected from punctuation_distribution)
-  def paragraphs(options={})
-    count = rand_count options[:paragraphs] || (3..5)
-    join_str = options[:join]
-    res = count.times.collect do |i|
-      op = options.clone
-      op.delete :punctuation unless i==count-1
-      op.delete :first_word unless i==0
-      paragraph op
-    end
-    join_str!=false ? res.join(join_str || "\n\n") : res
-  end
 end
-end
+end
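
The bi-gram model described in the comments above can be sketched in a few lines of standalone Ruby. This is an illustration under assumptions, not the gem's code: `build_chains` and `weighted_next_word` are hypothetical helper names, the input is already tokenized, and the sketch omits the gem's extra rule that a candidate equal to the input word is skipped.

```ruby
# Build {word => {next_word => count}} from sentences given as arrays of words.
def build_chains(sentences)
  chains = {}
  sentences.each do |words|
    words.each_cons(2) do |word, next_word|
      chains[word] ||= Hash.new(0)
      chains[word][next_word] += 1
    end
  end
  chains
end

# Pick a follower of `word` with probability proportional to its count.
# `rng` only needs to respond to #rand(limit).
def weighted_next_word(chains, word, rng = Random.new)
  followers = chains[word]
  return nil unless followers
  total = followers.values.inject(0) { |sum, count| sum + count }
  target = rng.rand(total) + 1 # integer in 1..total
  partial = 0
  followers.each do |candidate, count|
    partial += count
    return candidate if partial >= target
  end
  nil
end

chains = build_chains([%w[the lost world], %w[the lost plateau], %w[the great river]])
p chains["the"]
p weighted_next_word(chains, "the", Random.new(1))
```

Here "lost" follows "the" twice and "great" once, so "lost" is selected about two times out of three. A word that never appears with a follower, such as "world", yields `nil`.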

data/lib/literate_randomizer/randomizer.rb ADDED

@@ -0,0 +1,155 @@
+module LiterateRandomizer
+# The main class. Each instance has its own random number generator and can work against its own training source-material.
+class Randomizer
+  # The default punctuation distribution. Punctuation is pulled randomly from this array. It can contain any string.
+  DEFAULT_PUNCTUATION_DISTRIBUTION = %w{. . . . . . . . . . . . . . . . ? !}
+  # LiterateRandomizer prefers to not end sentences with words that match the following regexp:
+  PREPOSITION_REGEX = /^(had|the|to|or|and|a|in|that|it|if|of|is|was|for|on|as|an|your|our|my|per|until)$/
+  # The source of all random values. Must implement: #rand(limit)
+  #
+  # Default: Random.new()
+  attr_accessor :randomizer
+  # To end sentences, one of the strings in this array is selected at random (uniform-distribution)
+  #
+  # Default: DEFAULT_PUNCTUATION_DISTRIBUTION
+  attr_accessor :punctuation_distribution
+  # an instance of SourceParser attached to the source_material
+  attr_reader :source_parser
+  # The random-generator model
+  attr_reader :model
+  private
+  # Check to see if the sentence ends in a PREPOSITION_REGEX word.
+  # If so, add more words, up to max_words, until it does not.
+  def extend_trailing_preposition(max_words,words)
+    while words.length < max_words && words[-1] && words[-1][PREPOSITION_REGEX]
+      words << model.next_word(words[-1],randomizer)
+    end
+    words
+  end
+  public
+  # Initialize a new instance. Each Markov randomizer instance can run against its own source_material.
+  #
+  # Options:
+  #
+  # * :source_material => string OR
+  # * :source_material_file => filename
+  # * :punctuation_distribution => DEFAULT_PUNCTUATION_DISTRIBUTION
+  #   punctuation is randomly selected from this array
+  #
+  # Advanced options: (primarily for testing)
+  #
+  # * :randomizer => Random.new # must respond to #rand(limit)
+  # * :source_parser => SourceParser.new options
+  # * :model => MarkovModel.new :source_parser => source_parser
+  def initialize(options={})
+    @init_options = options
+    @randomizer = options[:randomizer] || Random.new
+    @punctuation_distribution = options[:punctuation_distribution] || DEFAULT_PUNCTUATION_DISTRIBUTION
+    @source_parser = options[:source_parser] || SourceParser.new(options)
+    @model = options[:model] || MarkovModel.new(:source_parser => source_parser)
+  end
+  # Returns a quick summary of the instance.
+  def inspect
+    "#<#{self.class}: #{model.words.length} words, #{model.markov_chains.length} word-chains, #{model.first_words.length} first_words>"
+  end
+  # return a random word
+  def word
+    @cached_word_keys ||= model.words.keys
+    @cached_word_keys[rand(@cached_word_keys.length)]
+  end
+  # return a random first word of a sentence
+  def first_word
+    @cached_first_word_keys ||= model.first_words.keys
+    @cached_first_word_keys[rand(@cached_first_word_keys.length)]
+  end
+  # return a random number generated by randomizer
+  def rand(limit=nil)
+    @randomizer.rand(limit)
+  end
+  # return a random end-sentence string from punctuation_distribution
+  def punctuation
+    @punctuation_distribution[rand(@punctuation_distribution.length)]
+  end
+  # return a random sentence
+  #
+  # Options:
+  #
+  # * :first_word => nil - the start word
+  # * :words => range or int - number of words in sentence
+  # * :punctuation => nil - punctuation to end the sentence with (nil == randomly selected from punctuation_distribution)
+  def sentence(options={})
+    word = options[:first_word] || self.first_word
+    num_words_option = options[:words] || (3..15)
+    count = Util.rand_count(num_words_option,randomizer)
+    punctuation = options[:punctuation] || self.punctuation
+    words = count.times.collect do
+      word.tap {word = model.next_word(word,randomizer)}
+    end.compact
+    words = extend_trailing_preposition(Util.max(num_words_option), words)
+    Util.capitalize words.compact.join(" ") + punctuation
+  end
+  # return a random paragraph
+  #
+  # Options:
+  #
+  # * :first_word => nil - the first word of the paragraph
+  # * :words => range or int - number of words in sentence
+  # * :sentences => range or int - number of sentences in paragraph
+  # * :punctuation => nil - punctuation to end the paragraph with (nil == randomly selected from punctuation_distribution)
+  def paragraph(options={})
+    count = Util.rand_count(options[:sentences] || (5..15),randomizer)
+    count.times.collect do |i|
+      op = options.clone
+      op.delete :punctuation unless i==count-1
+      op.delete :first_word unless i==0
+      sentence op
+    end.join(" ")
+  end
+  # return random paragraphs
+  #
+  # Options:
+  #
+  # * :first_word => nil - the first word of the paragraph
+  # * :words => range or int - number of words in sentence
+  # * :sentences => range or int - number of sentences in paragraph
+  # * :paragraphs => range or int - number of paragraphs in paragraph
+  # * :join => "\n\n" - join the paragraphs. if :join => false, returns an array of the paragraphs
+  # * :punctuation => nil - punctuation to end the paragraph with (nil == randomly selected from punctuation_distribution)
+  def paragraphs(options={})
+    count = Util.rand_count(options[:paragraphs] || (3..5),randomizer)
+    join_str = options[:join]
+    res = count.times.collect do |i|
+      op = options.clone
+      op.delete :punctuation unless i==count-1
+      op.delete :first_word unless i==0
+      paragraph op
+    end
+    join_str!=false ? res.join(join_str || "\n\n") : res
+  end
+end
+end
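
Both `paragraph` and `paragraphs` apply `:first_word` only to the first generated item and `:punctuation` only to the last. The sketch below isolates that option handling. `options_for`, `demo_sentence` and `demo_paragraph` are hypothetical names, and `demo_sentence` returns fixed text in place of Markov-generated words so the result is easy to follow.

```ruby
# Per-item options: :first_word survives only for the first item,
# :punctuation only for the last one.
def options_for(index, count, options)
  op = options.clone
  op.delete(:punctuation) unless index == count - 1
  op.delete(:first_word) unless index == 0
  op
end

# Stand-in for a real sentence generator: fixed words, configurable ends.
def demo_sentence(options = {})
  first = options[:first_word] || "Some"
  punct = options[:punctuation] || "."
  "#{first} words here#{punct}"
end

def demo_paragraph(count, options = {})
  count.times.collect { |i| demo_sentence(options_for(i, count, options)) }.join(" ")
end

puts demo_paragraph(3, :first_word => "A", :punctuation => "!!!")
# prints "A words here. Some words here. Some words here!!!"
```

Cloning the options hash per item keeps the caller's hash unmodified, which matters when the same hash is reused across calls.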

data/lib/literate_randomizer/source_parser.rb ADDED

@@ -0,0 +1,55 @@
+module LiterateRandomizer
+# Parse the source material and provide "each_sentence" - an easy way to walk the source material.
+class SourceParser
+  private
+  attr_reader :init_options
+  public
+  # Options:
+  #
+  # * :source_material => string OR
+  # * :source_material_file => filename
+  def initialize(options)
+    @init_options = options
+  end
+  # path to the default source material included with the gem
+  def default_source_material
+    File.expand_path File.join(File.dirname(__FILE__),"..","..","data","the_lost_world_by_arthur_conan_doyle.txt")
+  end
+  # Options:
+  #
+  #     :source_material => string
+  #     :source_material_file => filename
+  def source_material(options=init_options)
+    options[:source_material] || File.read(options[:source_material_file] || default_source_material)
+  end
+  # Read the source material and split it into sentences
+  # NOTE: this re-reads the source material each time. Usually this only needs to happen once and it would waste memory to keep it around.
+  def source_sentences
+    source_material.split(/([.?!"]($|\s)|\n\s*\n)+/)
+  end
+  # remove any non-alpha characters from word
+  def scrub_word(word)
+    word &&= word[/[A-Za-z][A-Za-z'-]*/]
+    word &&= word[/[A-Za-z'-]*[A-Za-z]/]
+    (word && word.strip) || ""
+  end
+  # clean up all words in a string, returning an array of clean words
+  def scrub_sentence(sentence)
+    sentence.split(/([\s]|--)+/).collect {|a| scrub_word(a)}.select {|a| a.length>0}
+  end
+  # Yields to a block each sentence as an array of words
+  def each_sentence
+    source_sentences.each do |sentence|
+      yield scrub_sentence sentence
+    end
+  end
+end
+end
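
The parsing steps can be tried in isolation with the sketch below. It reuses the regular expressions from the hunk with one simplification that is an assumption of this example: the groups are non-capturing (`(?:...)`), so `String#split` does not return the separators as extra elements. The function names mirror the gem's methods but are plain top-level helpers here.

```ruby
# Strip leading and trailing non-letters; keep inner apostrophes and hyphens.
def scrub_word(word)
  word &&= word[/[A-Za-z][A-Za-z'-]*/]
  word &&= word[/[A-Za-z'-]*[A-Za-z]/]
  (word && word.strip) || ""
end

# Split on whitespace or "--", scrub each piece, drop empties.
def scrub_sentence(sentence)
  sentence.split(/(?:\s|--)+/).collect { |w| scrub_word(w) }.select { |w| w.length > 0 }
end

# Split text on sentence-ending punctuation or blank lines.
def source_sentences(text)
  text.split(/(?:[.?!"](?:$|\s)|\n\s*\n)+/)
end

# Yield each sentence as an array of clean words.
def each_sentence(text)
  source_sentences(text).each { |sentence| yield scrub_sentence(sentence) }
end

each_sentence("Bad form. Hit you chaps? Upward curves!") { |words| p words }
p scrub_sentence("It was--he said--a riding-whip!")
```

Hyphenated words such as "riding-whip" stay whole because only a double hyphen counts as a separator.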

data/lib/literate_randomizer/util.rb ADDED

@@ -0,0 +1,26 @@
+module LiterateRandomizer
+# A few utility methods
+class Util
+  class << self
+    # r can be an Integer or a Range. If an Integer, return r; otherwise, return the maximum value in the range.
+    def max(r)
+      return r if r.kind_of? Integer
+      r.max
+    end
+    # r can be an Integer or a Range. If an Integer, return r; otherwise, return a random number within the range.
+    def rand_count(r,randomizer=Random.new)
+      return r if r.kind_of? Integer
+      randomizer.rand(r.max-r.min)+r.min
+    end
+    # return word with the first letter capitalized
+    def capitalize(word)
+      word.chars.first.upcase+word[1..-1]
+    end
+  end
+end
+end
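
One detail of `rand_count` is worth spelling out: `rand(n)` with an integer `n` returns a value from 0 up to but not including `n`, so `rand(r.max - r.min) + r.min` never returns `r.max`. The sketch below shows an inclusive variant next to the other two helpers. `inclusive_rand_count` is a hypothetical name, and treating the upper bound as inclusive is an assumption about the intended behaviour, not something the diff states.

```ruby
# Integer: return it. Range: return its largest value.
def max_of(r)
  r.kind_of?(Integer) ? r : r.max
end

# Integer: return it. Range: return a random integer in r.min..r.max,
# both ends included. `rng` only needs to respond to #rand(limit).
def inclusive_rand_count(r, rng = Random.new)
  return r if r.kind_of?(Integer)
  rng.rand(r.max - r.min + 1) + r.min
end

# Upcase the first character and leave the rest unchanged.
# Assumes a non-empty string.
def capitalize_first(word)
  word[0].upcase + word[1..-1]
end

rng = Random.new(42)
samples = Array.new(1000) { inclusive_rand_count(2..7, rng) }
p samples.minmax
puts capitalize_first("bad form of my own.")
```

The `+ 1` also keeps a single-value range such as `3..3` valid, since `Random#rand(0)` raises an `ArgumentError`.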

data/lib/literate_randomizer/version.rb CHANGED

@@ -1,3 +1,4 @@
 module LiterateRandomizer
-  VERSION = "0.3.1"
+  # the current gem-version
+  VERSION = "0.4.0"
 end

data/spec/literate_randomizer_spec.rb CHANGED

@@ -2,6 +2,13 @@ require File.join(File.dirname(__FILE__),"..","lib","literate_randomizer")
 describe LiterateRandomizer do
+  WORD="[a-zA-Z]+([-'][a-zA-Z]+)*"
+  CWORD="[A-Z][a-zA-Z]*([-'][a-zA-Z]+)*"
+  PUNCTUATION="[!.?]"
+  SENTENCE_TAIL = "( #{WORD})*#{PUNCTUATION}"
+  SENTENCE="#{CWORD}#{SENTENCE_TAIL}"
+  SENTENCES="#{SENTENCE}( #{SENTENCE})+"
   def new_lr(options={})
     $lr ||= LiterateRandomizer.create options
     $lr.randomizer = Random.new(1)
@@ -18,74 +25,83 @@ describe LiterateRandomizer do
   end
   it "words.length should be the number of words in the file" do
-    new_lr.words.length.should == 9143
+    new_lr.model.words.length.should == 9117
   end
   it "first_words.length should be the number words starting sentences in the file" do
-    new_lr.first_words.length.should == 754
-  end
-  it "source_sentences.length should be the number of sentences in the file" do
-    new_lr.source_sentences.length.should == 10699
-    new_lr.source_sentences.length.should > new_lr.first_word.length
+    new_lr.model.first_words.length.should == 585
   end
   it "word should return a random word" do
-    new_lr.word.should == "own"
+    new_lr.word.should match /[a-z]+/
   end
   it "sentence should return a random sentence" do
-    new_lr.sentence.should == "Bad form of my own chances are a riding-whip."
+    new_lr.sentence.should match /^#{SENTENCE}$/
+  end
+  it "if we keep resetting the randomizer we should keep getting the same sentence" do
+    s = new_lr.sentence
+    10.times do
+      new_lr.sentence.should == s
+    end
   end
   it "sentence length should work" do
-    new_lr.sentence(:words => 1).should == "Bad."
-    new_lr.sentence(:words => 3).should == "Bad money if."
-    new_lr.sentence(:words => 5).should == "Bad money if ever come."
-    new_lr.sentence(:words => 7).should == "Bad money if ever come outwards at."
-    new_lr.sentence(:words => 9).should == "Bad money if ever come outwards at the side."
-    new_lr.sentence(:words => 2..7).should == "Bad job for a final credit."
+    new_lr.sentence(:words => 1).split(' ').length.should == 1
+    new_lr.sentence(:words => 2).split(' ').length.should == 2
+    new_lr.sentence(:words => 3).split(' ').length.should == 3
+    new_lr.sentence(:words => 9).split(' ').length.should == 9
+    a = new_lr.sentence(:words => 2..7).split(' ')
+    a.length.should >= 2
+    a.length.should <= 7
   end
   it "successive calls should vary" do
     lr = new_lr
-    lr.sentence.should == "Bad form of my own chances are a riding-whip."
-    lr.sentence.should == "Hit you chaps think of battle Our young fellah when in Streatham."
-    lr.sentence.should == "Upward curves which should be through the whole tribe."
+    a,b,c = lr.sentence,lr.sentence,lr.sentence
+    a.should_not == b
+    b.should_not == c
+    c.should_not == a
   end
   it "paragraph should work" do
-    new_lr.paragraph.should == "Bad form of my own chances are a riding-whip. Hit you chaps think of battle Our young fellah when in Streatham. Upward curves which should be through the whole tribe. Mend it at Edinburgh rose and it in diameter. Placed behind him. Rubbing his elephant-gun and sloth which way up to Project Gutenberg is going. Columns until he came at a last. Elusive enemies while beneath the main river up in it because on a boiling. Burying its coloring that skull and that there was able. Eventful moment of my clothes were to visit."
+    new_lr.paragraph.should match /([A-Z][a-zA-Z ]+[.!?])+/
   end
-  it "first_word should work" do
-    new_lr.paragraph(:sentences => 5, :words=>3).should == "Bad money if. Discreetly vague way. Melee in that. Hopin that dreadful. Executive and hold."
-    new_lr.paragraph(:sentences => 2..4, :words=>3).should == "Bad money if. Discreetly vague way. Melee in that."
+  it "paragraph parameters should work" do
+    new_lr.paragraph(:sentences => 5, :words=>3).should match /^(#{CWORD} #{WORD} #{WORD}[.!?] ?){5,5}$/
+    new_lr.paragraph(:sentences => 2..4, :words=>3).should match /(#{CWORD} #{WORD} #{WORD}[.!?] ?){2,4}/
   end
   it "first_word should work" do
-    new_lr.paragraph(:first_word => "A",:sentences => 5, :words=>3).should == "A roaring rumbling. Instanced a journalist. Eight after to-morrow. Hopin that dreadful. Executive and hold."
+    new_lr.paragraph(:first_word => "A",:sentences => 5, :words=>3).should match /^A#{SENTENCE_TAIL} #{SENTENCES}$/
   end
   it "punctuation should work" do
-    new_lr.paragraph(:punctuation => "!!!",:sentences => 5, :words=>3).should == "Bad money if. Discreetly vague way. Melee in that. Hopin that dreadful. Executive and hold!!!"
+    new_lr.paragraph(:punctuation => "!!!",:sentences => 5, :words=>3).should match /^(#{CWORD} #{WORD} #{WORD}[.!?] ?){4,4}#{CWORD} #{WORD} #{WORD}!!!$/
   end
   it "global_randomizer_should work" do
-    LiterateRandomizer.global.class.should == LiterateRandomizer::MarkovChain
+    LiterateRandomizer.global.class.should == LiterateRandomizer::Randomizer
   end
   it "global_randomizer_should forwarding should work" do
     LiterateRandomizer.respond_to?(:paragraph).should == true
     LiterateRandomizer.respond_to?(:fonsfoaihdsfa).should == false
-    LiterateRandomizer.word.should == "own"
-    LiterateRandomizer.sentence.should == "Beak filled in the side of Vertebrate Evolution and up into private."
-    LiterateRandomizer.paragraph.should == "GUTENBERG-tm concept of their rat-trap grip upon Challenger of the carrying of selfishness! Telling you with great enterprise upon their own eventual goal and it in a liar. The complete your consent. Reporters down at a tangle of the huge flippers behind us in writing. Chandeliers in those huge wings of this agreement and Ipetu. Taken the gray eyes were general laws of what. Variety of photographs said for the words?"
+    LiterateRandomizer.word.should match /^#{WORD}$/
+    LiterateRandomizer.sentence.should match /^#{SENTENCE}$/
+    LiterateRandomizer.paragraph.should match /^#{SENTENCES}$/
+  end
+  it "join param should work" do
+    LiterateRandomizer.paragraphs(:paragraphs => 2, :words =>2, :sentences => 2, :join=>"--").should match /^#{SENTENCES}--#{SENTENCES}$/
   end
   it "global_randomizer_should forwarding should work" do
-    LiterateRandomizer.paragraphs(:words =>2, :sentences => 2).should == "Bad money. Instanced a.\n\nFLAIL OF. Melee in.\n\nHit you. Executive and.\n\nHopes and. Puffing red-faced."
-    LiterateRandomizer.paragraphs(:words =>2, :sentences => 2, :join=>"--").should == "Pick holes. Telling you.--Mend it. Considerate of!--Albany and! Fame or?--The weak. Prime mover."
-    LiterateRandomizer.paragraphs(:words =>2, :sentences => 2, :join=>false).should == ["Reporters down. Again the.", "Their position. Dressing down.", "Chandeliers in. Although every."]
+    LiterateRandomizer.paragraphs(:paragraphs => 2, :words =>2, :sentences => 2).should match /^#{SENTENCE} #{SENTENCE}\n\n#{SENTENCE} #{SENTENCE}$/
+    a = LiterateRandomizer.paragraphs(:paragraphs => 2, :words =>2, :sentences => 2, :join=>false)
+    a.length.should == 2
+    a.each {|b|b.should match /^#{SENTENCES}$/}
   end
 end
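
The spec changes replace exact expected strings with structural patterns. The sketch below reproduces that idea without RSpec, assuming plain Ruby is enough to illustrate it. The pattern strings are copied from the spec, while the lowercase variable names, the `\A...\z` anchors and the `well_formed_sentence` lambda are choices made for this example.

```ruby
# Pattern fragments copied from the spec, composed by string interpolation.
word          = "[a-zA-Z]+([-'][a-zA-Z]+)*"
cword         = "[A-Z][a-zA-Z]*([-'][a-zA-Z]+)*"
punctuation   = "[!.?]"
sentence_tail = "( #{word})*#{punctuation}"
sentence      = "#{cword}#{sentence_tail}"
sentence_re   = /\A#{sentence}\z/

# True when text is one capitalized sentence ending in a single . ! or ?
well_formed_sentence = lambda { |text| !!(text =~ sentence_re) }

puts well_formed_sentence.call("Muscular arms round opening of sorts while Lord John Roxton.")
puts well_formed_sentence.call("drama which would.")

# Two generators with the same seed produce the same sequence, which is
# what allows a test to reset the randomizer and expect a repeated result.
a = Random.new(1)
b = Random.new(1)
puts Array.new(5) { a.rand(100) } == Array.new(5) { b.rand(100) }
```

Matching on shape instead of exact text keeps such checks valid when the parsing rules or the source material change, at the cost of no longer pinning the exact output.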

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: literate_randomizer
 version: !ruby/object:Gem::Version
-  version: 0.3.1
+  version: 0.4.0
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-10-18 00:00:00.000000000 Z
+date: 2012-10-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -43,6 +43,9 @@ files:
 - data/the_lost_world_by_arthur_conan_doyle.txt
 - lib/literate_randomizer.rb
 - lib/literate_randomizer/markov.rb
+- lib/literate_randomizer/randomizer.rb
+- lib/literate_randomizer/source_parser.rb
+- lib/literate_randomizer/util.rb
 - lib/literate_randomizer/version.rb
 - literate_randomizer.gemspec
 - spec/literate_randomizer_spec.rb
@@ -73,3 +76,4 @@ summary: A random sentence and paragraph generator gem. Using Markov chains, thi
   generates near-english prose.
 test_files:
 - spec/literate_randomizer_spec.rb
+has_rdoc: