markov_twitter 0.0.4 → 0.0.5
- checksums.yaml +4 -4
- data/README.md +59 -0
- data/lib/markov_twitter/markov_builder.rb +1 -1
- data/lib/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c2faa582324c1ce00ac06646feb91145fc6cee71
+  data.tar.gz: ea6fa71b4cdcfdf5890c679eb1b268003b67ca12
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 95536e81a893cd662359eaebc89c88fac364875ea034a6e9795782ebf001c5fb87291b440edadbf1d465930f15496a6e66b3745c030b6052da80bdb95095bb04
+  data.tar.gz: 9d900f82d787734b3fc201d0900ae16bc6a0224a7d92747d8d0ea44127ce2a5397f758d2564966b2c98e96b559287299fe7008679df86dac3f526c726355f258
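The checksums.yaml hunk records SHA1 and SHA512 hex digests for the gem's two inner archives, metadata.gz and data.tar.gz. Below is a minimal sketch of recomputing them with Ruby's standard Digest library. It assumes the .gem file (a tar archive) has already been unpacked so both files exist on disk. `archive_digests` and `mismatched_algorithms` are hypothetical helper names, not part of RubyGems or markov_twitter.

```ruby
require "digest/sha1"
require "digest/sha2"

# Hypothetical helper: returns the two digests that checksums.yaml
# records for a single archive file.
def archive_digests(path)
  {
    "SHA1"   => Digest::SHA1.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# Hypothetical helper: compares computed digests against the expected
# hex strings and returns the names of the algorithms that do not match.
def mismatched_algorithms(path, expected)
  actual = archive_digests(path)
  expected.reject { |algorithm, hex| actual[algorithm] == hex }.keys
end
```

Calling `mismatched_algorithms("metadata.gz", "SHA1" => "c2faa582324c1ce00ac06646feb91145fc6cee71")` on the unpacked 0.0.5 archive would be expected to return an empty array if the file is intact.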
data/README.md
CHANGED
@@ -219,3 +219,62 @@ Things which would be interesting to add:
 - dictionary-based search and replace
 - part-of-speech-based search and replace
 
+## performance
+
+Here are the benchmarks for indexing and evaluation for Moby Dick (~115k whitespace-delimited words/punctuation sequence). Since the program uses whitespace to separate words and treats punctuation like any other word, the start/end of phrases needs to be manually defined. Here, I'm splitting phrases by empty lines (e.g. by paragraph).
+
+```txt
+loading the text into memory
+0.030000 0.000000 0.030000 ( 0.101186)
+
+adding the text to a markov chain
+20.340000 0.070000 20.410000 ( 20.608080)
+
+evaluating 10k words with random evaluator
+3.410000 0.000000 3.410000 ( 3.456607)
+
+evaluating 10k words with favor_next evaluator
+1.440000 0.000000 1.440000 ( 1.471715)
+
+evaluating 10k words with favor_prev evaluator
+3.540000 0.000000 3.540000 ( 3.563398)
+```
+
+Here's some example results:
+
+```rb
+chain = MarkovTwitter::MarkovBuilder.new(
+  phrases: File.read("spec/mobydick.txt").split(/^\s?$/)
+)
+```
+
+```rb
+
+chain.evaluate length: 50
+```
+
+> "block, I began howling than in the sea as some fifteen thousand men ? ' And of something like a lightning-like hurtling whisper Starbuck now live in her only in an outline pictures life. I feel funny. Fa, la ! ' Pull, then, by what ye harpooneers ! the plainest"
+
+```rb
+chain.evaluate_favoring_start length: 50
+```
+
+> " But we were locked within. For what does not to be in laying open his shipmates ; most part, that so prolonged, and fasces of miles off Patagonia, ipurcbasefc for all ; softly, and weaving and frankly admit that people to make tearless Lima has at all customary dinner"
+
+```rb
+chain.evaluate_favoring_end length: 50
+```
+
+> "the best mode in two fellow-beings should rub each in chorus.) In short, and fetch that man may very readily passes through the sailors we found that but signifying nothing. That is a repugnance to me on his father's heathens. Arrived at present as much like the sea."
+
+```rb
+# prioritizing improbable linkages and start each phrase with "I"
+chain._evaluate(
+  length: 50,
+  direction: :next,
+  probability_bounds: [0,5],
+  node_finder: -> (node) { node.value == "I" }
+).map(&:value).join(" ")
+```
+
+> "I allow himself and bent upon us. This whale's heart.' I recall all glittering teeth -gnashing there. Further on, Ishmael, be cherishing unwarrantable prejudices against your thousand Patagonian sights and spears. Some say was darkened like Czar in Queequeg's arm did that typhoon on water when these weapons offensively, and"
data/lib/markov_twitter/markov_builder.rb
CHANGED
@@ -3,7 +3,7 @@
 class MarkovTwitter::MarkovBuilder
 
   # Regex used to split the phrase into tokens.
-  # It splits on any number of whitespace
+  # It splits on any number of whitespace/newline in sequence.
   # Sequences of punctuation characters are treated like any other word.
   SeparatorCharacterRegex = /\s+/
 
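A short sketch of what the updated comment describes, using the same `/\s+/` pattern as `SeparatorCharacterRegex`. The `tokenize` helper and the `SEPARATOR` constant are hypothetical names for this example. The `strip` call is also an addition of the sketch: without it, `String#split` with a regex returns a leading empty string when the phrase starts with whitespace.

```ruby
# Same pattern as SeparatorCharacterRegex in the hunk above.
SEPARATOR = /\s+/

# Hypothetical helper: splits a phrase into tokens on any run of
# spaces, tabs, or newlines. Punctuation is not treated specially.
def tokenize(phrase)
  phrase.strip.split(SEPARATOR)
end

tokenize("Fa, la ! '  Pull,\nthen")
# => ["Fa,", "la", "!", "'", "Pull,", "then"]
```

Punctuation attached to a word ("Fa,") stays part of that token, while punctuation surrounded by whitespace ("!") becomes a token of its own. This matches the spacing seen in the README's sample output.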
data/lib/version.rb
CHANGED