markov_twitter 0.0.4 → 0.0.5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: cd6b5de1600afc6e7f5cba800814c945b7e4f520
- data.tar.gz: b3637dfe3025cf956fde46e23ffaace751d48e47
+ metadata.gz: c2faa582324c1ce00ac06646feb91145fc6cee71
+ data.tar.gz: ea6fa71b4cdcfdf5890c679eb1b268003b67ca12
  SHA512:
- metadata.gz: ac476e80dc8159e0fe005f5463862f1b5cdcdcefb7e21e4ed4144a6146820eccc5536cf60493bfb94d1930a5cf1b56b6a6815eb544e2bb064401375060bc2bf6
- data.tar.gz: 4ea44112cdd56355e2eaf8ce8b9c6d8c35da562db079453818db7fc86fcfc8ddfad75b3ea1a87fb0afe52b3f198424e3b9d79f49c5fe272331b972f4e56d4adf
+ metadata.gz: 95536e81a893cd662359eaebc89c88fac364875ea034a6e9795782ebf001c5fb87291b440edadbf1d465930f15496a6e66b3745c030b6052da80bdb95095bb04
+ data.tar.gz: 9d900f82d787734b3fc201d0900ae16bc6a0224a7d92747d8d0ea44127ce2a5397f758d2564966b2c98e96b559287299fe7008679df86dac3f526c726355f258
data/README.md CHANGED
@@ -219,3 +219,62 @@ Things which would be interesting to add:
  - dictionary-based search and replace
  - part-of-speech-based search and replace

+ ## performance
+
+ Here are the benchmarks for indexing and evaluating Moby Dick (~115k whitespace-delimited word/punctuation sequences). Since the program uses whitespace to separate words and treats punctuation like any other word, the start and end of each phrase must be defined manually. Here, I'm splitting phrases on empty lines (i.e. by paragraph).
+
+ ```txt
+ loading the text into memory
+ 0.030000 0.000000 0.030000 ( 0.101186)
+
+ adding the text to a markov chain
+ 20.340000 0.070000 20.410000 ( 20.608080)
+
+ evaluating 10k words with random evaluator
+ 3.410000 0.000000 3.410000 ( 3.456607)
+
+ evaluating 10k words with favor_next evaluator
+ 1.440000 0.000000 1.440000 ( 1.471715)
+
+ evaluating 10k words with favor_prev evaluator
+ 3.540000 0.000000 3.540000 ( 3.563398)
+ ```
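The four-column lines in the benchmark output above match the layout produced by Ruby's standard `Benchmark` module (user, system, total, and real time in parentheses). The README does not show the script that produced them, so the following is only a minimal sketch of how such a line can be generated. The `count_successors` method and the repeated word list are hypothetical stand-ins, not the gem's indexing code or the Moby Dick text.

```ruby
require "benchmark"

# Stand-in corpus (assumption); the README's run reads spec/mobydick.txt.
words = %w[call me ishmael some years ago never mind how long] * 1_000

# Hypothetical stand-in workload, not the gem's indexing code:
# count how often each word is directly followed by each other word.
def count_successors(words)
  counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  words.each_cons(2) { |current, following| counts[current][following] += 1 }
  counts
end

counts = nil
timing = Benchmark.measure { counts = count_successors(words) }

# Prints a label, then "user system total (real)" in seconds.
puts "counting successors in #{words.length} words"
puts timing.format("%u %y %t %r")
```

Swapping the body of the `Benchmark.measure` block for the gem calls being timed would give one label and one timing line per step, as in the output above.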
+
+ Here are some example results:
+
+ ```rb
+ chain = MarkovTwitter::MarkovBuilder.new(
+ phrases: File.read("spec/mobydick.txt").split(/^\s?$/)
+ )
+ ```
+
+ ```rb
+
+ chain.evaluate length: 50
+ ```
+
+ > "block, I began howling than in the sea as some fifteen thousand men ? ' And of something like a lightning-like hurtling whisper Starbuck now live in her only in an outline pictures life. I feel funny. Fa, la ! ' Pull, then, by what ye harpooneers ! the plainest"
+
+ ```rb
+ chain.evaluate_favoring_start length: 50
+ ```
+
+ > " But we were locked within. For what does not to be in laying open his shipmates ; most part, that so prolonged, and fasces of miles off Patagonia, ipurcbasefc for all ; softly, and weaving and frankly admit that people to make tearless Lima has at all customary dinner"
+
+ ```rb
+ chain.evaluate_favoring_end length: 50
+ ```
+
+ > "the best mode in two fellow-beings should rub each in chorus.) In short, and fetch that man may very readily passes through the sailors we found that but signifying nothing. That is a repugnance to me on his father's heathens. Arrived at present as much like the sea."
+
+ ```rb
+ # prioritize improbable linkages and start each phrase with "I"
+ chain._evaluate(
+ length: 50,
+ direction: :next,
+ probability_bounds: [0,5],
+ node_finder: -> (node) { node.value == "I" }
+ ).map(&:value).join(" ")
+ ```
+
+ > "I allow himself and bent upon us. This whale's heart.' I recall all glittering teeth -gnashing there. Further on, Ishmael, be cherishing unwarrantable prejudices against your thousand Patagonian sights and spears. Some say was darkened like Czar in Queequeg's arm did that typhoon on water when these weapons offensively, and"
@@ -3,7 +3,7 @@
  class MarkovTwitter::MarkovBuilder

  # Regex used to split the phrase into tokens.
- # It splits on any number of whitespace\in sequence.
+ # It splits on any number of whitespace/newline in sequence.
  # Sequences of punctuation characters are treated like any other word.
  SeparatorCharacterRegex = /\s+/
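
As a quick illustration of what the corrected comment describes, the sketch below applies the same `/\s+/` pattern with Ruby's `String#split`. The sample strings and variable names are made up for illustration, and the snippet does not call the gem itself, so it shows only the regex's behavior, not how `MarkovBuilder` uses it internally.

```ruby
# Same pattern as SeparatorCharacterRegex in the hunk above.
separator = /\s+/

# Runs of spaces and newlines all count as a single separator.
phrase = "Call me   Ishmael .\nSome years ago , never mind"
tokens = phrase.split(separator)
p tokens

# Punctuation separated by whitespace becomes its own token, while
# punctuation attached to a word stays part of that word's token.
attached = "Pull, then !".split(separator)
p attached
```

One consequence of this design is that `"Pull,"` and `"Pull"` are distinct tokens, so the input's spacing around punctuation affects which linkages the chain records.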
data/lib/version.rb CHANGED
@@ -1,4 +1,4 @@
  class MarkovTwitter
  # The version of the gem.
- VERSION = '0.0.4'
+ VERSION = '0.0.5'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: markov_twitter
  version: !ruby/object:Gem::Version
- version: 0.0.4
+ version: 0.0.5
  platform: ruby
  authors:
  - max pleaner