
markov_words 1.0.1 → 2.0.0


Files changed (7)

checksums.yaml +4 -4
data/Gemfile.lock +1 -1
data/README.md +54 -10
data/bin/benchmark +84 -0
data/lib/markov_words/generator.rb +15 -51
data/lib/markov_words/version.rb +1 -1
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: ae8fa0d54564e0c30c1407409374d086c3d935a0
-  data.tar.gz: 9662260cf2cdb1ee137c3839900da60aec770977
+  metadata.gz: 2605351d6864f6d2cb9ffc5f498b647b306dc600
+  data.tar.gz: 85d646b15bb737aca9394f69cde8ea03922bcc10
 SHA512:
-  metadata.gz: 6b81af4c6a78491a42f009d7975a044ab3e8012e772cd78dae0f78efbd13a89274d2085e11396921502e8aba342d3b2a16507669af79dc7e77cf227b0a5af717
-  data.tar.gz: 510837581b8b014155a4fc1084329c18539bb33e0013733090fd9f8bec22c32dc6bedba647b81e78f93f4d33465de788bc4876b8617d82a7a2c1b1a91a791f5e
+  metadata.gz: 1222be879d0fec47f344f71893b43cfba653bb05e5fdca56ba4409dad723efd4158458b7f4065d80fb4a31bcaa09878568264564ba36b5c3620fe7afe2092802
+  data.tar.gz: 13bd043e2168d8f2dbe1cf3f33c38268b0bed4a6f0fcfde0f5ae360a770c0b376913ef4973b350a3ad143864a5766ddfddd22de23ddfedb6b328d1c1e8864a08

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    markov_words (1.0.1)
+    markov_words (2.0.0)
       sqlite3-ruby (~> 1.3)
 GEM

data/README.md CHANGED

@@ -72,28 +72,72 @@ You can also clear out the contents of the data file (because `MarkovWords` will
 generator = MarkovWords::Generator.new(data_file: '/tmp/markov.data', flush_data: true)
 ```
+### Custom Metadata
-### Caching
+A `Generator` object gives you access to its `.data_store`, which is an instance of a `FileStore` object. This gives you the ability to store custom metadata into the same database that holds the n-gram information.
-Because calculation can get slow, especially at high n-gram sizes, `MarkovWords` will cache 100 words by default . If you want to control caching, you can adjust caching parameters eg:
+One example of how you might use this would be to cache words for later use (since initial word generation can be slow, even after the database has been generated the first time):
 ```ruby
-# For no caching whatsoever
-generator = MarkovWords::Generator.new(perform_caching: false)
+generator = MarkovWords::Generator.new
+my_cache = 100.times.map { generator.word }
+generator.data_store.store_data :cache, my_cache
-# To change the number of pre-computed/stored words to 1000:
-generator = MarkovWords::Generator.new(cache_size: 1000)
+# then later, perhaps on another page load in a web server...
+my_cache = generator.data_store.retrieve_data :cache
 ```
-You can "top off" the cache to make sure it's full with:
+### Benchmarking
-```ruby
-generator = MarkovWords::Generator.new
-generator.refresh_cache
+We've included a `bin/benchmark` script, which will measure initial load times, and then the time it takes to generate 100 words at various dictionary n-gram sizes.
+Here is an example run:
+```
+bin/benchmark 1 6 '/usr/share/dict/words'
+Minimum n-gram size set to 1
+Maximum n-gram size set to 6
+Corpus file set to /usr/share/dict/words
+Test initial database creation time versus gram size? (y/n) y
+------------------------------------------------------------
+              user     system      total        real
+size: 1   4.080000   0.010000   4.090000 (  4.108898)
+size: 2   8.320000   0.090000   8.410000 (  8.554122)
+size: 3  12.710000   0.080000  12.790000 ( 12.869257)
+size: 4  18.750000   0.160000  18.910000 ( 19.102232)
+size: 5  25.440000   0.250000  25.690000 ( 25.953532)
+size: 6  31.060000   0.340000  31.400000 ( 31.680680)
+------------------------------------------------------------
+Test existing database on disk, initial memory load? (y/n) y
+------------------------------------------------------------
+              user     system      total        real
+size: 1   0.000000   0.000000   0.000000 (  0.000587)
+size: 2   0.000000   0.000000   0.000000 (  0.005109)
+size: 3   0.080000   0.010000   0.090000 (  0.077303)
+size: 4   0.330000   0.070000   0.400000 (  0.395079)
+size: 5   1.030000   0.130000   1.160000 (  1.157014)
+size: 6   2.920000   0.120000   3.040000 (  3.045219)
+------------------------------------------------------------
+Test word generation averages for 100 words per gram size? (y/n) y
+------------------------------------------------------------
+              user     system      total        real
+size: 1   0.010000   0.000000   0.010000 (  0.003971)
+size: 2   0.010000   0.000000   0.010000 (  0.009460)
+size: 3   0.120000   0.000000   0.120000 (  0.127297)
+size: 4   0.350000   0.010000   0.360000 (  0.354564)
+size: 5   2.250000   0.020000   2.270000 (  2.302405)
+size: 6   4.000000   0.120000   4.120000 (  4.186757)
+------------------------------------------------------------
 ```
 ## Change Log
+- `2.0.0`
+    - Breaking changes:
+      - Removed all caching functions from `Generator`. They were cluttering up the code, without being a necessary function of a `Generator`.
+      - Added an `attr_reader` for `Generator.data_store`, so that users can implement custom metadata for `Generator` objects, and store it in the same `FileStore` object that holds the database.
 - `1.0.0` introduced a couple of breaking changes:
     - `Words` class renamed to `Generator`.
     - `Generator`:
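
Since 2.0.0 removes the built-in cache, callers who relied on it could rebuild similar behaviour on top of `data_store`. The following is a minimal sketch, not part of the gem: `MemoryStore` and `FakeGenerator` are hypothetical stand-ins so the example runs without the gem installed, and `CachedWords` is an illustrative wrapper modelled on the removed `load_word_from_cache` method. Only `word`, `data_store`, `store_data` and `retrieve_data` come from the diff itself.

```ruby
# Hypothetical stand-in for the gem's FileStore: only the two calls shown in
# the README (`store_data` and `retrieve_data`) are mimicked, in memory.
class MemoryStore
  def initialize
    @data = {}
  end

  def store_data(key, value)
    @data[key] = value
  end

  def retrieve_data(key)
    @data[key]
  end
end

# Hypothetical stand-in for MarkovWords::Generator, exposing `word` and
# `data_store` as the 2.0.0 class does.
class FakeGenerator
  attr_reader :data_store

  def initialize
    @data_store = MemoryStore.new
    @count = 0
  end

  def word
    @count += 1
    "word#{@count}"
  end
end

# Illustrative wrapper (not part of the gem): refills the cache when it is
# missing or empty, then pops one word and writes the remainder back.
class CachedWords
  def initialize(generator, cache_size: 100)
    @generator = generator
    @cache_size = cache_size
  end

  def word
    words = @generator.data_store.retrieve_data(:cache)
    words = Array.new(@cache_size) { @generator.word } if words.nil? || words.empty?
    word = words.pop
    @generator.data_store.store_data(:cache, words)
    word
  end
end

generator = FakeGenerator.new
cached = CachedWords.new(generator, cache_size: 3)
first = cached.word
```

With the real gem, `FakeGenerator.new` would presumably be replaced by `MarkovWords::Generator.new`; the wrapper only depends on the two objects responding to the calls listed above.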

data/bin/benchmark ADDED

@@ -0,0 +1,84 @@
+#!/usr/bin/env ruby
+# frozen-string-literal: true
+require 'benchmark'
+require 'bundler/setup'
+require 'markov_words'
+# Utility class to generate benchmarks for MarkovWords
+class GeneratorBenchmark
+  LABEL_WIDTH = 7
+  def run
+    test_if_desired 'initial database creation time versus gram size' do
+      Benchmark.bm(LABEL_WIDTH) do |x|
+        @min_gram_size.upto(@max_gram_size) do |size|
+          generator =
+            MarkovWords::Generator.new(flush_data: true,
+                                       gram_size: size,
+                                       corpus_file: @corpus_file)
+          x.report("size: #{size}") { generator.word }
+        end
+      end
+    end
+    test_if_desired 'existing database on disk, initial memory load' do
+      Benchmark.bm(LABEL_WIDTH) do |x|
+        @min_gram_size.upto(@max_gram_size) do |size|
+          generator =
+            MarkovWords::Generator.new(flush_data: true,
+                                       gram_size: size,
+                                       corpus_file: @corpus_file)
+          _word = generator.word # this will run initial setup
+          generator_load_data_from_file =
+            MarkovWords::Generator.new(gram_size: size,
+                                       corpus_file: @corpus_file)
+          x.report("size: #{size}") { generator_load_data_from_file.word }
+        end
+      end
+    end
+    test_if_desired 'word generation averages for 100 words per gram size' do
+      Benchmark.bm(LABEL_WIDTH) do |x|
+        @min_gram_size.upto(@max_gram_size) do |size|
+          generator =
+            MarkovWords::Generator.new(flush_data: true,
+                                       gram_size: size,
+                                       corpus_file: @corpus_file)
+          _word = generator.word # this will run initial setup
+          x.report("size: #{size}") { 1.upto(100) { generator.word } }
+        end
+      end
+    end
+  end
+  def initialize(opts)
+    @min_gram_size = opts.fetch :min_gram_size, 1
+    @max_gram_size = opts.fetch :max_gram_size, 6
+    @corpus_file = opts.fetch :corpus_file, '/usr/share/dict/words'
+    puts "Minimum n-gram size set to #{@min_gram_size}"
+    puts "Maximum n-gram size set to #{@max_gram_size}"
+    puts "Corpus file set to #{@corpus_file}"
+  end
+  def print_separator
+    puts '-' * 60
+  end
+  # Ask whether to run a benchmark; run the given block only on a "y" answer.
+  def test_if_desired(description)
+    print "\nTest #{description}? (y/n) "
+    return unless /y/.match?($stdin.readline)
+    print_separator
+    yield
+    print_separator
+  end
+end
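+if ARGV.empty?
+  puts 'USAGE: bin/benchmark min_gram_size max_gram_size corpus_file'
+end
+# Only pass the options that were given, so the defaults in `initialize` apply.
+opts = {}
+opts[:min_gram_size] = ARGV[0].to_i if ARGV[0]
+opts[:max_gram_size] = ARGV[1].to_i if ARGV[1]
+opts[:corpus_file] = ARGV[2] if ARGV[2]
+bm = GeneratorBenchmark.new(opts)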
+bm.run
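
The script above prompts on standard input before each test, which is awkward for unattended runs. As a rough sketch under stated assumptions, the same kind of measurement can be collected non-interactively with `Benchmark.realtime` from Ruby's standard library. Here `fake_word` is a hypothetical stand-in workload; in practice it would be replaced by a call such as `generator.word`.

```ruby
require 'benchmark'

# Stand-in workload (hypothetical): builds a random string of `size` letters.
def fake_word(size)
  Array.new(size) { ('a'..'z').to_a.sample }.join
end

# Time 100 calls per n-gram size, with no interactive prompt.
timings = (1..3).map do |size|
  [size, Benchmark.realtime { 100.times { fake_word(size) } }]
end.to_h

timings.each { |size, seconds| puts format('size: %d  %.6f s', size, seconds) }
```

`Benchmark.realtime` returns elapsed wall-clock seconds as a `Float`, so the resulting hash can be logged or compared between releases.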

data/lib/markov_words/generator.rb CHANGED

@@ -1,26 +1,27 @@
 # frozen-string-literal: true
 module MarkovWords
-  # This class takes care of word generation, caching, and data storage.
+  # This class takes care of word generation, and will store the database into
+  # a `FileStore` object.
   class Generator
-    # The current list of cached words.
-    # @return [Array<String>] All words in the cache.
-    def cache
-      @data_store.retrieve_data(:cache)
-    end
+    # It's useful to be able to access the data store object directly, for
+    #   example if you were to want to implement storage of related metadata
+    #   into the same storage system that holds the database.
+    attr_reader :data_store
     # The current database of n-gram mappings
     # @return [Hash] n-gram database
     def grams
-      @grams = @grams ||
-               @data_store.retrieve_data(:grams) ||
-               markov_corpus(@corpus_file, @gram_size)
+      if @grams.nil?
+        @grams = @data_store.retrieve_data(:grams) ||
+                 markov_corpus(@corpus_file, @gram_size)
+      else
+        @grams
+      end
     end
     # Create a new "Words" object
     # @param opts [Hash]
-    # @option opts [Integer] :cache_size How many words to pre-calculate +
-    #   store in the cache for quick retrieval
     # @option opts [String] :corpus_file ('/usr/share/dict/words') Your
     #   dictionary of words.
     # @option opts [String] :data_file Location where calculations are
@@ -34,7 +35,6 @@ module MarkovWords
     #   NOTE: If your corpus size is very small (<1000 words or so), it's hard
     #   to guarantee a min_length because so many n-grams will have no
     #   association, which terminates word generation.
-    # @option opts [Boolean] :perform_caching (true) Perform caching?
    # @return [Generator] A `MarkovWords::Generator` object.
     def initialize(opts = {})
       @grams = nil
@@ -42,33 +42,13 @@ module MarkovWords
       @max_length = opts.fetch :max_length, 16
       @min_length = opts.fetch :min_length, 3
-      initialize_cache(opts)
       initialize_data(opts)
     end
-    # "Top off" the cache of stored words, and ensure that it's at
-    # `@cache_size`. If `perform_caching` is set to `false`, returns an empty
-    # array.
-    # @return [Array<String>] All words in the cache.
-    def refresh_cache
-      if @perform_caching
-        words_array = @data_store.retrieve_data(:cache) || []
-        words_array << generate_word while words_array.length < @cache_size
-        @data_store.store_data(:cache, words_array)
-        words_array
-      else
-        []
-      end
-    end
-    # Generate a new word, or return one from the cache if available.
+    # Generate a new word
     # @return [String] The word.
     def word
-      if @perform_caching
-        load_word_from_cache
-      else
-        generate_word
-      end
+      generate_word
     end
     private
@@ -81,11 +61,6 @@ module MarkovWords
       end
     end
-    def initialize_cache(opts)
-      @cache_size = opts.fetch :cache_size, 100
-      @perform_caching = opts.fetch :perform_caching, true
-    end
     def initialize_data(opts)
       @corpus_file = opts.fetch :corpus_file, '/usr/share/dict/words'
       @data_file = opts.fetch :data_file, 'tmp/markov_words.data'
@@ -138,18 +113,6 @@ module MarkovWords
       /[\r\n]/.match? word
     end
-    def load_word_from_cache
-      words_array = @data_store.retrieve_data(:cache)
-      if words_array.nil? || words_array.empty?
-        words_array = Array.new(@cache_size) { generate_word }
-      end
-      word = words_array.pop
-      @data_store.store_data(:cache, words_array)
-      word
-    end
     # Generate a MarkovWords corpus from a datafile, with a given size of
     # n-gram.  Returns a hash of "grams", which are a map of a letter to the
     # frequency of the letters that follow it, eg: {"c" => {"a" => 1, "b" =>
@@ -165,6 +128,7 @@ module MarkovWords
         end
       end
+      @data_store.store_data(:grams, grams)
       grams
     end
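
The comment on `markov_corpus` describes the database as a map from a gram to the frequency of the letters that follow it. The body of that method is not shown in this diff, so the following is only an illustrative sketch of that data structure, with `build_grams` as a hypothetical name; the gem's own implementation may differ.

```ruby
# Illustrative only: maps each run of `gram_size` letters to a tally of the
# letters seen immediately after it.
def build_grams(words, gram_size)
  grams = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  words.each do |word|
    chars = word.downcase.chars
    # Slide a window of gram_size + 1 letters: the gram plus its follower.
    chars.each_cons(gram_size + 1) do |window|
      gram = window[0...gram_size].join
      grams[gram][window.last] += 1
    end
  end
  grams
end

grams = build_grams(%w[cab cat cot], 1)
```

For this tiny corpus and a gram size of 1, `grams['c']` records that "a" followed "c" twice and "o" followed it once.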

data/lib/markov_words/version.rb CHANGED

@@ -2,5 +2,5 @@
 module MarkovWords
   # Current version
-  VERSION = '1.0.1'
+  VERSION = '2.0.0'
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: markov_words
 version: !ruby/object:Gem::Version
-  version: 1.0.1
+  version: 2.0.0
 platform: ruby
 authors:
 - Donald Merand
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2017-12-09 00:00:00.000000000 Z
+date: 2017-12-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -114,6 +114,7 @@ files:
 - LICENSE.txt
 - README.md
 - Rakefile
+- bin/benchmark
 - bin/console
 - bin/setup
 - lib/markov_words.rb