loremarkov 0.0.0.6 → 0.1.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: c23bcd59f4c5aecef4c2d39b0924566938abd4aa
-   data.tar.gz: f731832c15a97daabdf1d3e6b6cbf97e0413c73b
+   metadata.gz: f1b090a999f8b1b376bf4ca583921332ae438353
+   data.tar.gz: c6d7d3a7a0eb6232485b0e19930038c84592c30f
  SHA512:
-   metadata.gz: 2eaccd869e4a25aa621e1875415bff8e7649062904b59fdbf97e09f7a2635a9dc6560d45d64330e7ca607c9702732024c5eefa664accebe9bc43f64f8e83b6a1
-   data.tar.gz: 4b92c2d87f6e4dede7f9c7ab2c21b278860bc2bf063bbcd0508cbc8e3cf17829ea20dde9142b35b5385d471da6c690a64500235a38d1a44a0ce3c8812da8c4a5
+   metadata.gz: 2c1365ca9471260d02e26451a4dec7819910251c12610aac187dfc8c53f96db81e6fb7c7a6ae8bc32fdfc594a18efef3bd215d34ef8ff77e3dcdb1f4785fe560
+   data.tar.gz: 92b2da3a553e8170383c377be1c288891921327dc964d47d7e05b2acad68119953fca975d57c6bb939059b3161e83369390fb0fdbb26a6cbde13f9feae26a86e
data/README.md CHANGED
@@ -1,15 +1,40 @@
  Introduction
  ===
 
- Need to generate text in a hurry? This is the tool for you!
+ Need to generate text in a hurry? This is the tool for you! With several sample texts built in, you can generate plausible sounding passages at the push of
+ a button.
 
- With several sample texts built in, you can generate plausible sounding
- passages, ready for copy / pasting, at the push of a button. Just install the
- gem and run `destroy` for the default *lorem ipsum* paragraph.
+ Install
+ ---
+ $ gem install loremarkov
 
- Try `destroy epigenetics` or `destroy oslo_accords` for additional fun. Or
- provide your own: `destroy ~/my_first_corpus.txt`
+ Lorem ipsum
+ ---
+
+ $ destroy
+
+ Usage
+ ===
+ * As a library (see [bin/destroy](https://github.com/rickhull/loremarkov/blob/master/bin/destroy) for an example)
+ * Via `destroy` executable
+
+ bin/destroy
+ ---
+ * Accepts input via filename or STDIN
+ * Also recognizes sample texts:
+ - lorem_ipsum
+ - epigenetics
+ - oslo_accords
+ * Provide a secondary parameter to control num_prefix_words
+
+ Examples
+ ---
+ $ destroy # or destroy lorem_ipsum
+ $ destroy epigenetics
+ $ destroy oslo_accords 3
+ $ destroy ~/my_first_corpus.txt
+ $ man ls | destroy 6
 
  Inspiration
  ===
- * Based off of Kernighan & Pike's *The Practice of Programming* Chapter 3
+ * Based upon Kernighan & Pike's *The Practice of Programming* Chapter 3
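
Note on the secondary parameter: the README says it controls `num_prefix_words`, and the library comments in this release add that more prefix words means tighter alignment to the original text. A minimal sketch of why, assuming a simplified whitespace split instead of the gem's `lex` and leaving out the trailing `nil` end-of-text marker. `prefix_table` and `average_choices` are hypothetical helpers, not part of the gem:

```ruby
# Hypothetical helper (not in the gem): map each run of num_prefix_words
# words to the list of words seen immediately after that run.
def prefix_table(text, num_prefix_words)
  words = text.split # simplification: the gem keeps whitespace tokens too
  table = Hash.new { |hash, key| hash[key] = [] }
  words.each_cons(num_prefix_words + 1) do |run|
    table[run[0...-1]] << run[-1]
  end
  table
end

# Hypothetical helper: average number of distinct followers per prefix.
def average_choices(table)
  table.values.sum { |followers| followers.uniq.length }.fdiv(table.length)
end

text = 'the cat sat on the mat the cat ran'
[1, 2].each do |n|
  table = prefix_table(text, n)
  puts format('%d prefix word(s): %d prefixes, %.2f choices each',
              n, table.length, average_choices(table))
end
```

With longer prefixes, each prefix tends to have fewer distinct followers, so the generated text stays closer to the source.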
data/Rakefile CHANGED
@@ -21,3 +21,10 @@ Rake::TestTask.new do |t|
    t.pattern = "test/bench_*.rb"
    # t.warning = true
  end
+
+ desc "Run rocco - generate literate programming html"
+ task :rocco do
+   Dir.chdir File.join(__dir__, 'lib') do
+     `rocco *.rb -o ../rocco/`
+   end
+ end
data/VERSION CHANGED
@@ -1 +1 @@
- 0.0.0.6
+ 0.1.0.1
data/lib/loremarkov.rb CHANGED
@@ -1,20 +1,55 @@
+ # **Loremarkov** uses Markov chains to generate plausible-sounding text, given
+ # an input corpus. It comes with a few built-in sample texts.
+ #
+ # It is based upon Kernighan & Pike's *The Practice of Programming* Chapter 3
+ #
+ # Install Loremarkov with Rubygems:
+ #
+ # gem install loremarkov
+ #
+ # Once installed, the `destroy` command can be used to generate plausible-
+ # sounding text. The input text may be provided by filename, STDIN, or naming
+ # one of the built-in sample texts.
+ #
+ # destroy lorem_ipsum
+ # destroy ~/my_first_corpus.txt
+ # man ls | destroy
+
  class Loremarkov
+   ##### TOKENS - These tokens are what splits the text into words.
+   # In contrast to ruby's String#split, these tokens are included in the
+   # resulting array.
    TOKENS = ["\n", "\t", ' ', "'", '"']
 
-   # Decompose text into an array of tokens, including and delimited by TOKENS
-   # e.g. "Hello", he said.
-   # # => ['"', 'Hello', '"', ',', ' ', 'he', ' ', 'said.',]
+   ##### lex - Decompose text into an array of tokens and words
+   # Words are merely the string of characters between the nearest two TOKENS
+   # e.g.
+   #
+   # lex %q{"Hello", he said.}
+   #
+   # becomes
+   #
+   # - %q{"} # TOKEN
+   # - %q{Hello} # word
+   # - %q{"} # TOKEN
+   # - %q{,} # word
+   # - %q{ } # TOKEN
+   # - %q{he} # word
+   # - %q{ } # TOKEN
+   # - %q{said.} # word
+   #
    # This operation can be losslessly reversed by calling #join on the resulting
    # array.
-   # i.e. lex(str).join == str
+   # i.e. `lex(str).join == str`
    #
    def self.lex(str, tokens = TOKENS)
      final_ary = []
      word = ''
-     str.each_byte { |b| # yes I am terrible with encodings
-       # either a token (thereby ending the current word)
-       # or part of the current word
-       #
+     # This code makes no attempt to deal with non-ASCII string encodings.
+     # i.e. byte-per-char
+     str.each_byte { |b|
+       # This byte is either a token, thereby ending the current word
+       # or it is part of the current word
        if tokens.include?(b.chr)
          final_ary << word if !word.empty?
          final_ary << b.chr
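
The new comment in this hunk notes that `lex` walks the string byte by byte and makes no attempt to handle non-ASCII encodings. One possible character-based variant, offered only as a sketch and not as the gem's code, iterates with `String#each_char` so multi-byte characters stay intact. `lex_chars` and `SKETCH_TOKENS` are hypothetical names:

```ruby
# Same delimiter set as the TOKENS constant shown in the diff.
SKETCH_TOKENS = ["\n", "\t", ' ', "'", '"'].freeze

# Hypothetical variant of lex: split on tokens, keep the tokens in the
# result, and iterate by character rather than by byte.
def lex_chars(str, tokens = SKETCH_TOKENS)
  result = []
  word = ''.dup # mutable buffer for the word being accumulated
  str.each_char do |char|
    if tokens.include?(char)
      result << word unless word.empty? # a token ends the current word
      result << char
      word = ''.dup
    else
      word << char
    end
  end
  result << word unless word.empty? # flush a trailing word
  result
end

p lex_chars(%q{"Hello", he said.})
sample = "caf\u00e9 au lait\n"
p lex_chars(sample).join == sample
```

As with the documented `lex`, joining the result is meant to reproduce the input exactly.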
@@ -28,63 +63,56 @@ class Loremarkov
    end
 
 
-   # Generate a markov data structure
-   # Arrays of string for keys and values
-   # Keys are prefixes -- ordered word sequence of constant length
-   # Values are an accumulation of the next word after the prefix, however many
-   # times it may occur.
-   # e.g. If a prefix occurs twice, then the value will be
-   # an array of two words -- possibly the same word twice.
-   #
+   ##### analyze - Generate a markov data structure
+   # * Arrays of string for keys and values
+   # * Keys are prefixes -- ordered word sequence of constant length
+   # * Values are an accumulation of the next word after the prefix, however
+   # many times it may occur.
+   # * e.g. If a prefix occurs twice, then the value will be an array of two
+   # words -- possibly the same word twice.
    def self.analyze(text, num_prefix_words)
      markov = {}
      words = lex(text)
 
-     # Go through the possible valid prefixes.
-     # Adding 1 gives you the final key:
-     # *num_prefix_words* words with a nil value -- signifying EOF
-     #
+     # Go through the possible valid prefixes. Adding 1 gives you the final
+     # key: *num_prefix_words* words with a nil value -- signifying EOF
      (words.length - num_prefix_words + 1).times { |i|
        prefix_words = []
        num_prefix_words.times { |j| prefix_words << words[i + j] }
-
-       # set to empty array on a new prefix
-       #
+       # Set to empty array on a new prefix.
+       # Add the target word, which will be nil on the last iteration
        markov[prefix_words] ||= []
-       # add the target word, which will be nil on the last iteration
        markov[prefix_words] << words[i + num_prefix_words]
      }
      markov
    end
 
-   # given the entire text, use an extremely conservative heuristic
-   # to grab only the first chunk to pass to lex
-   #
+   # Given the entire text, use an extremely conservative heuristic to grab only
+   # the first chunk to pass to lex
    def self.start_prefix(text, num_prefix_words)
      lex(text[0, 999 * num_prefix_words])[0, num_prefix_words]
    end
 
    attr_reader :markov
 
+   # More prefix_words means tighter alignment to original text
    def initialize(num_prefix_words)
      @num_prefix_words = num_prefix_words
      @markov = {}
    end
 
-   # text should have a definite end, not just a convenient buffer split
-   #
+   # Generate Markov structure from text.
+   # Text should have a definite end, not just a convenient buffer split
    def analyze(text)
      @markov.merge!(self.class.analyze(text, @num_prefix_words))
    end
 
-   # given a prefix, give me the next word
-   #
+   # Generate the next word for a given prefix
    def generate_one(prefix_words)
      @markov.fetch(prefix_words).sample
    end
 
-   # given the start prefix, generate words until EOF
-   #
+   # Given the start prefix, generate words until EOF
    def generate_all(start_prefix_words)
      words = start_prefix_words
      while tmp = generate_one(words[-1 * @num_prefix_words, @num_prefix_words])
  while tmp = generate_one(words[-1 * @num_prefix_words, @num_prefix_words])
@@ -93,8 +121,7 @@ class Loremarkov
      words.join
    end
 
-   # do it, you know you want to
-   #
+   # Do it, you know you want to
    def destroy(text)
      analyze(text)
      generate_all(self.class.start_prefix(text, @num_prefix_words))
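
The generation step above keeps sampling a follower for the last `num_prefix_words` words until it draws the `nil` that `analyze` stores after the final prefix. A standalone sketch of that walk, under stated assumptions: `walk_chain` and `SKETCH_TABLE` are hypothetical names, the table is hand-computed for the text `a b a c` with one-word prefixes by following the `analyze` loop shown in this diff, and the optional `rng:` argument (not present in the gem) only makes runs repeatable:

```ruby
# Hand-computed table for "a b a c" with num_prefix_words = 1.
# Whitespace tokens count as words, and nil marks the end of the text.
SKETCH_TABLE = {
  ['a'] => [' ', ' '],
  [' '] => ['b', 'a', 'c'],
  ['b'] => [' '],
  ['c'] => [nil]
}.freeze

# Hypothetical helper: extend start_prefix one sampled word at a time.
def walk_chain(table, start_prefix, rng: Random.new)
  words = start_prefix.dup # copy so the caller's array is not modified
  size = start_prefix.length
  loop do
    following = table.fetch(words.last(size)).sample(random: rng)
    break if following.nil? # the end-of-text marker was drawn
    words << following
  end
  words.join
end

puts walk_chain(SKETCH_TABLE, ['a'], rng: Random.new(42))
```

Every run over this table starts with `a ` and ends with ` c`, because `c` is the only word followed by the end marker. The length in between varies with the random draws.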
data/loremarkov.gemspec CHANGED
@@ -19,6 +19,7 @@ Gem::Specification.new do |s|
    s.executables = ['destroy']
    s.add_development_dependency "buildar", "~> 2"
    s.add_development_dependency "minitest", "~> 5"
+   s.add_development_dependency "rocco", "~> 0"
    s.required_ruby_version = "~> 2"
 
    s.version = File.read(File.join(__dir__, 'VERSION')).chomp
data/text/lorem_ipsum CHANGED
@@ -1 +1,5 @@
  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+
+ Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?
+
+ At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat.
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: loremarkov
  version: !ruby/object:Gem::Version
-   version: 0.0.0.6
+   version: 0.1.0.1
  platform: ruby
  authors:
  - Rick Hull
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-12-08 00:00:00.000000000 Z
+ date: 2014-12-09 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: buildar
@@ -38,6 +38,20 @@ dependencies:
      - - "~>"
        - !ruby/object:Gem::Version
          version: '5'
+ - !ruby/object:Gem::Dependency
+   name: rocco
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '0'
  description: Text goes in, markov gibberish comes out
  email:
  executables: