loremarkov 0.0.0.6 → 0.1.0.1
- checksums.yaml +4 -4
- data/README.md +32 -7
- data/Rakefile +7 -0
- data/VERSION +1 -1
- data/lib/loremarkov.rb +62 -35
- data/loremarkov.gemspec +1 -0
- data/text/lorem_ipsum +4 -0
- metadata +16 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f1b090a999f8b1b376bf4ca583921332ae438353
+  data.tar.gz: c6d7d3a7a0eb6232485b0e19930038c84592c30f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2c1365ca9471260d02e26451a4dec7819910251c12610aac187dfc8c53f96db81e6fb7c7a6ae8bc32fdfc594a18efef3bd215d34ef8ff77e3dcdb1f4785fe560
+  data.tar.gz: 92b2da3a553e8170383c377be1c288891921327dc964d47d7e05b2acad68119953fca975d57c6bb939059b3161e83369390fb0fdbb26a6cbde13f9feae26a86e
data/README.md
CHANGED
@@ -1,15 +1,40 @@
 Introduction
 ===
 
-Need to generate text in a hurry? This is the tool for you!
+Need to generate text in a hurry? This is the tool for you! With several sample texts built in, you can generate plausible sounding passages at the push of
+a button.
 
-
-
-gem
+Install
+---
+$ gem install loremarkov
 
-
-
+Lorem ipsum
+---
+
+$ destroy
+
+Usage
+===
+* As a library (see [bin/destroy](https://github.com/rickhull/loremarkov/blob/master/bin/destroy) for an example)
+* Via `destroy` executable
+
+bin/destroy
+---
+* Accepts input via filename or STDIN
+* Also recognizes sample texts:
+- lorem_ipsum
+- epigenetics
+- oslo_accords
+* Provide a secondary parameter to control num_prefix_words
+
+Examples
+---
+$ destroy # or destroy lorem_ipsum
+$ destroy epigenetics
+$ destroy oslo_accords 3
+$ destroy ~/my_first_corpus.txt
+$ man ls | destroy 6
 
 Inspiration
 ===
-* Based
+* Based upon Kernighan & Pike's *The Practice of Programming* Chapter 3
data/Rakefile
CHANGED
data/VERSION
CHANGED
@@ -1 +1 @@
-0.
+0.1.0.1
data/lib/loremarkov.rb
CHANGED
@@ -1,20 +1,55 @@
+# **Loremarkov** uses Markov chains to generate plausible-sounding text, given
+# an input corpus. It comes with a few built-in sample texts.
+#
+# It is based upon Kernighan & Pike's *The Practice of Programming* Chapter 3
+#
+# Install Loremarkov with Rubygems:
+#
+# gem install loremarkov
+#
+# Once installed, the `destroy` command can be used to generate plausible-
+# sounding text. The input text may be provided by filename, STDIN, or naming
+# one of the built-in sample texts.
+#
+# destroy lorem_ipsum
+# destroy ~/my_first_corpus.txt
+# man ls | destroy
+
 class Loremarkov
+  ##### TOKENS - These tokens are what splits the text into words.
+  # In contrast to ruby's String#split, these tokens are included in the
+  # resulting array.
   TOKENS = ["\n", "\t", ' ', "'", '"']
 
-
-  #
-  #
+  ##### lex - Decompose text into an array of tokens and words
+  # Words are merely the string of characters between the nearest two TOKENS
+  # e.g.
+  #
+  # lex %q{"Hello", he said.}
+  #
+  # becomes
+  #
+  # - %q{"} # TOKEN
+  # - %q{Hello} # word
+  # - %q{"} # TOKEN
+  # - %q{,} # word
+  # - %q{ } # TOKEN
+  # - %q{he} # word
+  # - %q{ } # TOKEN
+  # - %q{said.} # word
+  #
   # This operation can be losslessly reversed by calling #join on the resulting
   # array.
-  # i.e. lex(str).join == str
+  # i.e. `lex(str).join == str`
   #
   def self.lex(str, tokens = TOKENS)
     final_ary = []
     word = ''
-
-
-
-    #
+    # This code makes no attempt to deal with non-ASCII string encodings.
+    # i.e. byte-per-char
+    str.each_byte { |b|
+      # This byte is either a token, thereby ending the current word
+      # or it is part of the current word
       if tokens.include?(b.chr)
         final_ary << word if !word.empty?
         final_ary << b.chr
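The `lex` comments in the hunk above describe a splitter that keeps its delimiters in the output. A minimal standalone sketch of that idea follows. The name `lex_sketch` is hypothetical, and the branch that accumulates word characters is an assumption, since the diff elides the remainder of the method.

```ruby
# Hypothetical standalone sketch; not the gem's actual implementation.
TOKENS = ["\n", "\t", ' ', "'", '"'].freeze

# Split str into words and tokens, keeping the tokens in the result.
def lex_sketch(str, tokens = TOKENS)
  result = []
  word = +''                      # mutable buffer for the current word
  str.each_byte do |b|            # byte-per-char, as in the diff
    char = b.chr
    if tokens.include?(char)
      result << word unless word.empty?
      result << char              # the token itself is kept
      word = +''
    else
      word << char                # assumed: accumulate non-token bytes
    end
  end
  result << word unless word.empty?
  result
end

parts = lex_sketch(%q{"Hello", he said.})
p parts
puts parts.join                   # joining restores the input
```

Because every byte lands in exactly one array element, `lex_sketch(str).join == str` should hold for ASCII input, which is the lossless property the comments mention.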
@@ -28,63 +63,56 @@ class Loremarkov
   end
 
 
-
-  # Arrays of string for keys and values
-  # Keys are prefixes -- ordered word sequence of constant length
-  # Values are an accumulation of the next word after the prefix, however
-  # times it may occur.
-  # e.g. If a prefix occurs twice, then the value will be
-  #
-  #
+  ##### analyze - Generate a markov data structure
+  # * Arrays of string for keys and values
+  # * Keys are prefixes -- ordered word sequence of constant length
+  # * Values are an accumulation of the next word after the prefix, however
+  # many times it may occur.
+  # * e.g. If a prefix occurs twice, then the value will be an array of two
+  # words -- possibly the same word twice.
   def self.analyze(text, num_prefix_words)
     markov = {}
     words = lex(text)
 
-    # Go through the possible valid prefixes.
-    #
-    # *num_prefix_words* words with a nil value -- signifying EOF
-    #
+    # Go through the possible valid prefixes. Adding 1 gives you the final
+    # key: *num_prefix_words* words with a nil value -- signifying EOF
     (words.length - num_prefix_words + 1).times { |i|
       prefix_words = []
       num_prefix_words.times { |j| prefix_words << words[i + j] }
-
-      #
-      #
+      # Set to empty array on a new prefix.
+      # Add the target word, which will be nil on the last iteration
       markov[prefix_words] ||= []
-      # add the target word, which will be nil on the last iteration
       markov[prefix_words] << words[i + num_prefix_words]
     }
     markov
   end
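To make the prefix table described above concrete, here is a small sketch that builds one from an already-lexed word list. `analyze_sketch` and the sample `words` array are hypothetical; the loop bounds mirror the `+ 1` in the diff, so the last prefix maps to `nil`.

```ruby
# Hypothetical sketch of the prefix table; takes pre-lexed words for brevity.
def analyze_sketch(words, num_prefix_words)
  markov = {}
  (words.length - num_prefix_words + 1).times do |i|
    prefix = words[i, num_prefix_words]              # array key of constant length
    (markov[prefix] ||= []) << words[i + num_prefix_words]  # nil past the end
  end
  markov
end

words = ['a', ' ', 'b', ' ', 'a', ' ', 'c']
table = analyze_sketch(words, 2)

p table[['a', ' ']]   # => ["b", "c"]  (prefix seen twice, two followers)
p table[[' ', 'c']]   # => [nil]       (final prefix, nil marks EOF)
```

Duplicate followers are kept on purpose: a word that follows a prefix more often appears more often in the array, so a uniform random pick is weighted by frequency.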
 
-  #
-  #
-  #
+  # Given the entire text, use an extremely conservative heuristic to grab only
+  # the first chunk to pass to lex
   def self.start_prefix(text, num_prefix_words)
     lex(text[0, 999 * num_prefix_words])[0, num_prefix_words]
   end
 
   attr_reader :markov
 
+  # More prefix_words means tighter alignment to original text
   def initialize(num_prefix_words)
     @num_prefix_words = num_prefix_words
     @markov = {}
   end
 
-  #
-  #
+  # Generate Markov structure from text.
+  # Text should have a definite end, not just a convenient buffer split
   def analyze(text)
     @markov.merge!(self.class.analyze(text, @num_prefix_words))
   end
 
-  #
-  #
+  # Generate the next word for a given prefix
   def generate_one(prefix_words)
     @markov.fetch(prefix_words).sample
   end
 
-  #
-  #
+  # Given the start prefix, generate words until EOF
   def generate_all(start_prefix_words)
     words = start_prefix_words
     while tmp = generate_one(words[-1 * @num_prefix_words, @num_prefix_words])
@@ -93,8 +121,7 @@ class Loremarkov
     words.join
   end
 
-  #
-  #
+  # Do it, you know you want to
   def destroy(text)
     analyze(text)
     generate_all(self.class.start_prefix(text, @num_prefix_words))
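The generation loop in `generate_all` is only partly visible in the diff, so the following is a hedged sketch of how such a walk could work, not the gem's code. `build_table` and `generate_sketch` are hypothetical names. Unlike the diff, the sketch copies the start prefix instead of appending to the caller's array.

```ruby
# Hypothetical sketch of the generation walk; names are illustrative.
def build_table(words, n)
  table = {}
  (words.length - n + 1).times do |i|
    (table[words[i, n]] ||= []) << words[i + n]   # nil after the last prefix
  end
  table
end

# Start from a prefix and keep sampling a follower for the last n words
# until the sampled follower is nil (end of text).
def generate_sketch(table, start_prefix, n)
  words = start_prefix.dup
  while (nxt = table.fetch(words.last(n)).sample)
    words << nxt
  end
  words.join
end

corpus = ['the', ' ', 'cat', ' ', 'sat']
table = build_table(corpus, 2)
puts generate_sketch(table, corpus.first(2), 2)
```

Every prefix in this tiny corpus has exactly one follower, so the walk is deterministic and reproduces the input. With a larger corpus, repeated prefixes give `sample` several followers to pick from, which is where the variation comes from.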
data/loremarkov.gemspec
CHANGED
@@ -19,6 +19,7 @@ Gem::Specification.new do |s|
   s.executables = ['destroy']
   s.add_development_dependency "buildar", "~> 2"
   s.add_development_dependency "minitest", "~> 5"
+  s.add_development_dependency "rocco", "~> 0"
   s.required_ruby_version = "~> 2"
 
   s.version = File.read(File.join(__dir__, 'VERSION')).chomp
data/text/lorem_ipsum
CHANGED
@@ -1 +1,5 @@
 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+
+Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?
+
+At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat.
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: loremarkov
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.1.0.1
 platform: ruby
 authors:
 - Rick Hull
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-12-
+date: 2014-12-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: buildar
@@ -38,6 +38,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '5'
+- !ruby/object:Gem::Dependency
+  name: rocco
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0'
 description: Text goes in, markov gibberish comes out
 email:
 executables: