markov_twitter 0.0.4 → 0.0.5

Files changed (5)

checksums.yaml +4 -4
data/README.md +59 -0
data/lib/markov_twitter/markov_builder.rb +1 -1
data/lib/version.rb +1 -1
metadata +1 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: cd6b5de1600afc6e7f5cba800814c945b7e4f520
-  data.tar.gz: b3637dfe3025cf956fde46e23ffaace751d48e47
+  metadata.gz: c2faa582324c1ce00ac06646feb91145fc6cee71
+  data.tar.gz: ea6fa71b4cdcfdf5890c679eb1b268003b67ca12
 SHA512:
-  metadata.gz: ac476e80dc8159e0fe005f5463862f1b5cdcdcefb7e21e4ed4144a6146820eccc5536cf60493bfb94d1930a5cf1b56b6a6815eb544e2bb064401375060bc2bf6
-  data.tar.gz: 4ea44112cdd56355e2eaf8ce8b9c6d8c35da562db079453818db7fc86fcfc8ddfad75b3ea1a87fb0afe52b3f198424e3b9d79f49c5fe272331b972f4e56d4adf
+  metadata.gz: 95536e81a893cd662359eaebc89c88fac364875ea034a6e9795782ebf001c5fb87291b440edadbf1d465930f15496a6e66b3745c030b6052da80bdb95095bb04
+  data.tar.gz: 9d900f82d787734b3fc201d0900ae16bc6a0224a7d92747d8d0ea44127ce2a5397f758d2564966b2c98e96b559287299fe7008679df86dac3f526c726355f258

data/README.md CHANGED

@@ -219,3 +219,62 @@ Things which would be interesting to add:
 - dictionary-based search and replace
 - part-of-speech-based search and replace
+## performance
+Here are the benchmarks for indexing and evaluation on Moby Dick (~115k whitespace-delimited word/punctuation sequences). Since the program uses whitespace to separate words and treats punctuation like any other word, the start and end of each phrase need to be defined manually. Here, I'm splitting phrases on empty lines (i.e. by paragraph).
+```txt
+loading the text into memory
+0.030000   0.000000   0.030000 (  0.101186)
+adding the text to a markov chain
+20.340000   0.070000  20.410000 ( 20.608080)
+evaluating 10k words with random evaluator
+3.410000   0.000000   3.410000 (  3.456607)
+evaluating 10k words with favor_next evaluator
+1.440000   0.000000   1.440000 (  1.471715)
+evaluating 10k words with favor_prev evaluator
+3.540000   0.000000   3.540000 (  3.563398)
+```
+Here are some example results:
+```rb
+chain = MarkovTwitter::MarkovBuilder.new(
+  phrases: File.read("spec/mobydick.txt").split(/^\s?$/)
+)
+```
+```rb
+chain.evaluate length: 50
+```
+> "block, I began howling than in the sea as some fifteen thousand men ? ' And of something like a lightning-like hurtling whisper Starbuck now live in her only in an outline pictures life. I feel funny. Fa, la ! ' Pull, then, by what ye harpooneers ! the plainest"
+```rb
+chain.evaluate_favoring_start length: 50
+```
+> " But we were locked within. For what does not to be in laying open his shipmates ; most part, that so prolonged, and fasces of miles off Patagonia, ipurcbasefc for all ; softly, and weaving and frankly admit that people to make tearless Lima has at all customary dinner"
+```rb
+chain.evaluate_favoring_end length: 50
+```
+> "the best mode in two fellow-beings should rub each in chorus.)  In short, and fetch that man may very readily passes through the sailors we found that but signifying nothing.  That is a repugnance to me on his father's heathens. Arrived at present as much like the sea."
+```rb
+# prioritize improbable linkages and start each phrase with "I"
+chain._evaluate(
+  length: 50,
+  direction: :next,
+  probability_bounds: [0,5],
+  node_finder: -> (node) { node.value == "I" }
+).map(&:value).join(" ")
+```
+> "I allow himself and bent upon us. This whale's heart.' I recall all glittering teeth -gnashing there. Further on, Ishmael, be cherishing unwarrantable prejudices against your thousand Patagonian sights and spears. Some say was darkened like Czar in Queequeg's arm did that typhoon on water when these weapons offensively, and"
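
The timing lines in the README's performance section have the shape of the default `Benchmark::Tms` output from Ruby's standard library: user CPU time, system CPU time, total CPU time, and wall-clock time in parentheses. A minimal sketch of how numbers like these could be collected follows. `build_index` and `timed` are hypothetical helpers written for illustration; they are not the gem's code and will not reproduce the README's figures.

```ruby
require "benchmark"

# Hypothetical stand-in for the indexing step: counts how often each token
# is followed by each other token. This is NOT the gem's implementation.
def build_index(phrases)
  index = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  phrases.each do |phrase|
    phrase.strip.split(/\s+/).each_cons(2) do |prev_token, next_token|
      index[prev_token][next_token] += 1
    end
  end
  index
end

# Hypothetical helper: runs the block, prints a label followed by the
# Benchmark::Tms timing line, and returns [block_result, timing].
def timed(label)
  result = nil
  timing = Benchmark.measure { result = yield }
  puts label
  puts timing
  [result, timing]
end

index, timing = timed("adding the text to a markov chain") do
  build_index(["call me Ishmael .", "call me Queequeg ."])
end
```

Printing a label and then the `Benchmark::Tms` object gives the same two-line layout as the README's `txt` block, which is presumably how that output was produced.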

data/lib/markov_twitter/markov_builder.rb CHANGED

@@ -3,7 +3,7 @@
 class MarkovTwitter::MarkovBuilder
   # Regex used to split the phrase into tokens.
-  # It splits on any number of whitespace\in sequence.
+  # It splits on any number of whitespace/newline in sequence.
   # Sequences of punctuation characters are treated like any other word.
   SeparatorCharacterRegex = /\s+/
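
To illustrate what the corrected comment describes, here is a small sketch of splitting on `/\s+/`. The `tokenize` helper and the `SEPARATOR` constant are assumptions made for this example, not part of the gem; only the regex itself is taken from the diff.

```ruby
# Same pattern as SeparatorCharacterRegex in the hunk above.
SEPARATOR = /\s+/

# Hypothetical helper, not part of the gem: split a phrase into tokens.
# Stripping first avoids an empty leading token when the phrase begins
# with whitespace.
def tokenize(phrase)
  phrase.strip.split(SEPARATOR)
end

# Runs of spaces and newlines collapse into a single separator, while
# stand-alone punctuation such as "!" and "'" survives as its own token.
TOKENS = tokenize("Fa, la ! '  Pull,\n then")
# TOKENS == ["Fa,", "la", "!", "'", "Pull,", "then"]
```

Punctuation attached to a word (`"Fa,"`, `"Pull,"`) stays part of that token, which matches the spacing seen in the README's generated samples.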

data/lib/version.rb CHANGED

@@ -1,4 +1,4 @@
 class MarkovTwitter
   # The version of the gem.
-  VERSION = '0.0.4'
+  VERSION = '0.0.5'
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: markov_twitter
 version: !ruby/object:Gem::Version
-  version: 0.0.4
+  version: 0.0.5
 platform: ruby
 authors:
 - max pleaner