panini 1.0.0

@@ -0,0 +1,5 @@
1
+ lib/**/*.rb
2
+ bin/*
3
+ -
4
+ features/**/*.feature
5
+ LICENSE.txt
data/.rspec ADDED
@@ -0,0 +1 @@
1
+ --color
data/Gemfile ADDED
@@ -0,0 +1,13 @@
1
+ source "http://rubygems.org"
2
+ # Add dependencies required to use your gem here.
3
+ # Example:
4
+ # gem "activesupport", ">= 2.3.5"
5
+
6
+ # Add dependencies to develop your gem here.
7
+ # Include everything needed to run rake, tests, features, etc.
8
+ group :development do
9
+ gem "rspec", "~> 2.3.0"
10
+ gem "bundler", "~> 1.0.0"
11
+ gem "jeweler", "~> 1.6.0"
12
+ gem "rcov", ">= 0"
13
+ end
@@ -0,0 +1,28 @@
1
+ GEM
2
+ remote: http://rubygems.org/
3
+ specs:
4
+ diff-lcs (1.1.2)
5
+ git (1.2.5)
6
+ jeweler (1.6.0)
7
+ bundler (~> 1.0.0)
8
+ git (>= 1.2.5)
9
+ rake
10
+ rake (0.8.7)
11
+ rcov (0.9.9)
12
+ rspec (2.3.0)
13
+ rspec-core (~> 2.3.0)
14
+ rspec-expectations (~> 2.3.0)
15
+ rspec-mocks (~> 2.3.0)
16
+ rspec-core (2.3.1)
17
+ rspec-expectations (2.3.0)
18
+ diff-lcs (~> 1.1.2)
19
+ rspec-mocks (2.3.0)
20
+
21
+ PLATFORMS
22
+ ruby
23
+
24
+ DEPENDENCIES
25
+ bundler (~> 1.0.0)
26
+ jeweler (~> 1.6.0)
27
+ rcov
28
+ rspec (~> 2.3.0)
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2011 Matthew Bellantoni
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,151 @@
1
+ = panini
2
+
3
+ Panini is a flexible toolkit that enables you to generate sentences from a context-free grammar, also known as a CFG.
4
+
5
+ == CFG Background
6
+
7
+ Informally, a context-free grammar consists of a set of production rules, where a _nonterminal_ on the left-hand
8
+ side of a rule produces a string of terminals and nonterminals on the right-hand side. Like this:
9
+
10
+ S -> AB
11
+ A -> a
12
+ B -> b
13
+
14
+ In the above example, _S_, _A_, and _B_ are all nonterminals. _a_ and _b_ are terminals. Furthermore, the nonterminal _S_ is the _start_ _symbol_ for this CFG. By applying the productions as follows:
15
+
16
+ S (start symbol)
17
+ AB (apply S -> AB)
18
+ aB (apply A -> a)
19
+ ab (apply B -> b)
20
+
21
+ The sentence _ab_ is generated. In fact, this is the only sentence this grammar can produce! By adding one additional production to the grammar:
22
+
23
+ S -> ASB
24
+
25
+ The grammar can now generate an infinite number of sentences. They all have the form _a_<sup>i</sup>_b_<sup>i</sup> where _i_ >= 1. Here is one more example derivation:
26
+
27
+ S (start symbol)
28
+ ASB (apply S -> ASB)
29
+ aSB (apply A -> a)
30
+ aSb (apply B -> b)
31
+ aaSbb (apply S -> ASB, then A -> a and B -> b)
32
+ aaABbb (apply S -> AB)
33
+ aaaBbb (apply A -> a)
34
+ aaabbb (apply B -> b)
35
+
36
+ To learn more about CFGs, see the CFG article[http://en.wikipedia.org/wiki/Context-free_grammar] on Wikipedia.
37
+
38
+ == Getting Started With Panini
39
+
40
+ === Defining a Grammar
41
+
42
+ Defining a grammar is easy. Create a grammar object, add some nonterminals and then add the productions to those nonterminals.
43
+
44
+ Here's how the grammar from above is defined:
45
+
46
+ grammar = Panini::Grammar.new
47
+
48
+ nt_s = grammar.add_nonterminal
49
+ nt_a = grammar.add_nonterminal
50
+ nt_b = grammar.add_nonterminal
51
+
52
+ nt_s.add_production([nt_a, nt_b]) # S -> AB
53
+ nt_s.add_production([nt_a, nt_s, nt_b]) # S -> ASB
54
+ nt_a.add_production(['a']) # A -> 'a'
55
+ nt_b.add_production(['b']) # B -> 'b'
56
+
57
+ === Derivators
58
+
59
+ Derivators are objects that take a Panini::Grammar and then apply the rules to generate a sentence. Creating the sentences
60
+ from the grammar can be tricky, and certain derivation strategies may be better for some grammars.
61
+ Currently, the main derivator is the Panini::DerivationStrategy::RandomDampened derivator.
62
+
63
+ derivator = Panini::DerivationStrategy::RandomDampened.new(grammar)
64
+
65
+ === Generating a Sentence
66
+
67
+ To generate a sentence, call the derivator's sentence method like this:
68
+
69
+ derivator.sentence # => ['a', 'a', 'b', 'b']
70
+
71
+ You will get a new sentence (depending on the grammar) with every call:
72
+
73
+ derivator.sentence # => ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
74
+
75
+
76
+ == Example
77
+
78
+ In this example, we create a grammar that generates mathematical expressions.
79
+
80
+ # ================
81
+ # = Nonterminals =
82
+ # ================
83
+ expression = grammar.add_nonterminal("EXPR")
84
+ term = grammar.add_nonterminal("TERM")
85
+ factor = grammar.add_nonterminal("FACT")
86
+ identifier = grammar.add_nonterminal("ID")
87
+ number = grammar.add_nonterminal("NUM")
88
+
89
+
90
+ # ===============
91
+ # = Productions =
92
+ # ===============
93
+ expression.add_production([term, '+', term])
94
+ expression.add_production([term, '-', term])
95
+ expression.add_production([term])
96
+
97
+ term.add_production([factor, '*', term])
98
+ term.add_production([factor, '/', term])
99
+ term.add_production([factor])
100
+
101
+ factor.add_production([identifier])
102
+ factor.add_production([number])
103
+ factor.add_production(['(', expression, ')'])
104
+
105
+ ('a'..'z').each do |v|
106
+ identifier.add_production([v])
107
+ end
108
+
109
+ (0..100).each do |n|
110
+ number.add_production([n])
111
+ end
112
+
113
+ # ===============================================
114
+ # = Choose a strategy and create some sentences =
115
+ # ===============================================
116
+ deriver = Panini::DerivationStrategy::RandomDampened.new(grammar)
117
+ 10.times do
118
+ puts "#{deriver.sentence.join(' ')}"
119
+ end
120
+
121
+ == Contributing to panini
122
+ * Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet
123
+ * Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it
124
+ * Fork the project
125
+ * Start a feature/bugfix branch
126
+ * Commit and push until you are happy with your contribution
127
+ * Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
128
+ * Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or it is otherwise necessary, that is fine, but please isolate it to its own commit so I can cherry-pick around it.
129
+
130
+ == To Do
131
+
132
+ === Features
133
+ * Detect invalid grammars
134
+ * Weighted productions
135
+ * Arbitrary start symbol
136
+ * Support Enumerator?
137
+ * DSL?
138
+ * Purdom Derivator?
139
+
140
+ === Examples
141
+ * Natural language
142
+ * XML
143
+ * JSON
144
+ * Address
145
+ * Tree/Flower (PS?)
146
+ * Simulated user actions
147
+
148
+ == Copyright
149
+
150
+ Copyright (c) 2011 Matthew Bellantoni. See LICENSE.txt for further details.
151
+
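The derivation walk-through in the README's CFG Background section can be tried out without Panini. The following is a minimal standalone sketch, not Panini's API: the `GRAMMAR` hash, the `derive` helper, and the `max_depth` cut-off are all assumptions made for illustration.

```ruby
# Standalone sketch (not Panini's API) of the grammar
#   S -> AB | ASB,  A -> a,  B -> b
# Symbols stand for nonterminals, strings for terminals.
GRAMMAR = {
  :S => [[:A, :B], [:A, :S, :B]],
  :A => [['a']],
  :B => [['b']]
}

# Recursively expands a symbol. Once max_depth is reached, the first
# production is always taken (the non-recursive one in this grammar),
# which guarantees that the derivation terminates.
def derive(symbol, depth = 0, max_depth = 5)
  return [symbol] unless GRAMMAR.key?(symbol)

  productions = GRAMMAR[symbol]
  production = depth >= max_depth ? productions.first : productions.sample
  production.flat_map { |term| derive(term, depth + 1, max_depth) }
end

puts derive(:S).join
```

Every sentence printed this way consists of some number of `a`s followed by the same number of `b`s, which matches the form described in the README.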
@@ -0,0 +1,49 @@
1
+ # encoding: utf-8
2
+
3
+ require 'rubygems'
4
+ require 'bundler'
5
+ begin
6
+ Bundler.setup(:default, :development)
7
+ rescue Bundler::BundlerError => e
8
+ $stderr.puts e.message
9
+ $stderr.puts "Run `bundle install` to install missing gems"
10
+ exit e.status_code
11
+ end
12
+ require 'rake'
13
+
14
+ require 'jeweler'
15
+ Jeweler::Tasks.new do |gem|
16
+ # gem is a Gem::Specification... see http://docs.rubygems.org/read/chapter/20 for more options
17
+ gem.name = "panini"
18
+ gem.homepage = "http://github.com/mjbellantoni/panini"
19
+ gem.license = "MIT"
20
+ gem.summary = %Q{Create sentences from a context-free grammar (CFG)}
21
+ gem.description = %Q{Panini allows you to generate sentences from a context-free grammar, also known as a CFG.}
22
+ gem.email = "mjbellantoni@yahoo.com"
23
+ gem.authors = ["mjbellantoni"]
24
+ # dependencies defined in Gemfile
25
+ end
26
+ Jeweler::RubygemsDotOrgTasks.new
27
+
28
+ require 'rspec/core'
29
+ require 'rspec/core/rake_task'
30
+ RSpec::Core::RakeTask.new(:spec) do |spec|
31
+ spec.pattern = FileList['spec/**/*_spec.rb']
32
+ end
33
+
34
+ RSpec::Core::RakeTask.new(:rcov) do |spec|
35
+ spec.pattern = 'spec/**/*_spec.rb'
36
+ spec.rcov = true
37
+ end
38
+
39
+ task :default => :spec
40
+
41
+ require 'rake/rdoctask'
42
+ Rake::RDocTask.new do |rdoc|
43
+ version = File.exist?('VERSION') ? File.read('VERSION') : ""
44
+
45
+ rdoc.rdoc_dir = 'rdoc'
46
+ rdoc.title = "panini #{version}"
47
+ rdoc.rdoc_files.include('README*')
48
+ rdoc.rdoc_files.include('lib/**/*.rb')
49
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 1.0.0
@@ -0,0 +1,51 @@
1
+ # Clean this up. Have it assume there's an install?
2
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
3
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
4
+
5
+ require "panini"
6
+
7
+ grammar = Panini::Grammar.new
8
+
9
+
10
+ # ================
11
+ # = Nonterminals =
12
+ # ================
13
+ expression = grammar.add_nonterminal("EXPR")
14
+ term = grammar.add_nonterminal("TERM")
15
+ factor = grammar.add_nonterminal("FACT")
16
+ identifier = grammar.add_nonterminal("ID")
17
+ number = grammar.add_nonterminal("NUM")
18
+
19
+
20
+ # ===============
21
+ # = Productions =
22
+ # ===============
23
+ expression.add_production([term, '+', term])
24
+ expression.add_production([term, '-', term])
25
+ expression.add_production([term])
26
+
27
+ term.add_production([factor, '*', term])
28
+ term.add_production([factor, '/', term])
29
+ term.add_production([factor])
30
+
31
+ factor.add_production([identifier])
32
+ factor.add_production([number])
33
+ factor.add_production(['(', expression, ')'])
34
+
35
+ ('a'..'z').each do |v|
36
+ identifier.add_production([v])
37
+ end
38
+
39
+ # It would be cool to have a way to create a random number.
40
+ (0..100).each do |n|
41
+ number.add_production([n])
42
+ end
43
+
44
+
45
+ # ===============================================
46
+ # = Choose a strategy and create some sentences =
47
+ # ===============================================
48
+ deriver = Panini::DerivationStrategy::RandomDampened.new(grammar)
49
+ 10.times do
50
+ puts "#{deriver.sentence.join(' ')}"
51
+ end
@@ -0,0 +1,13 @@
1
+ module Panini
2
+ module DerivationStrategy
3
+
4
+ class Base
5
+
6
+ def initialize(grammar)
7
+ @grammar = grammar
8
+ end
9
+
10
+ end
11
+
12
+ end
13
+ end
@@ -0,0 +1,69 @@
1
+ module Panini
2
+ module DerivationStrategy
3
+
4
+ class RoundRobinProductionChoiceProxy
5
+
6
+ def initialize(nonterminal)
7
+ @nonterminal = nonterminal
8
+ @round_robin_count = 0
9
+ @production_count = @nonterminal.productions.count
10
+ end
11
+
12
+ def production
13
+ i = @round_robin_count % @production_count
14
+ @round_robin_count += 1
15
+ @nonterminal.productions[i]
16
+ end
17
+
18
+ end
19
+
20
+ # The Leftmost strategy is a naive strategy for deriving sentences from a grammar. It
21
+ # will always substitute for the leftmost nonterminal first. If a nonterminal has more
22
+ # than one production, they will be chosen in a round-robin ordering.
23
+ #
24
+ # This implementation is slow and will not work on many grammars.
25
+ #
26
+ # In other words, don't use this! It's in place because it is simple and was used
27
+ # for early testing.
28
+ class Leftmost < Base
29
+
30
+ def initialize(grammar)
31
+ build_production_proxies(grammar)
32
+ super(grammar)
33
+ end
34
+
35
+ def build_production_proxies(grammar)
36
+ @production_proxies = {}
37
+ grammar.nonterminals.each do |nonterminal|
38
+ @production_proxies[nonterminal] = RoundRobinProductionChoiceProxy.new(nonterminal)
39
+ end
40
+ end
41
+ private :build_production_proxies
42
+
43
+ # Generates a sentence.
44
+ def sentence
45
+ derived_sentence, substituted = [@grammar.start], false
46
+ begin
47
+ derived_sentence, substituted = substitution_pass(derived_sentence)
48
+ end while substituted
49
+ derived_sentence
50
+ end
51
+
52
+ def substitution_pass(derived_sentence)
53
+ substituted = false
54
+ derived_sentence = derived_sentence.flat_map do |term|
55
+ if !substituted && (term.class == Nonterminal)
56
+ substituted = true
57
+ @production_proxies[term].production
58
+ else
59
+ term
60
+ end
61
+ end
62
+ return derived_sentence, substituted
63
+ end
64
+ private :substitution_pass
65
+
66
+ end
67
+
68
+ end
69
+ end
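The leftmost, round-robin idea described in the comment above can be illustrated in isolation. This is a simplified standalone sketch and does not use Panini's classes: `RULES`, `RoundRobinDeriver`, and `next_production` are hypothetical names, and the grammar is the S -> AB | ASB example from the README.

```ruby
# Standalone sketch (not Panini's classes): leftmost derivation with
# round-robin production choice for S -> AB | ASB, A -> a, B -> b.
RULES = {
  :S => [[:A, :B], [:A, :S, :B]],
  :A => [['a']],
  :B => [['b']]
}

class RoundRobinDeriver
  def initialize(rules)
    @rules = rules
    @counts = Hash.new(0) # number of expansions so far, per nonterminal
  end

  def sentence
    derived = [:S]
    # Splice a production in place of the leftmost nonterminal until
    # only terminals remain.
    while (i = derived.index { |term| @rules.key?(term) })
      derived[i, 1] = next_production(derived[i])
    end
    derived
  end

  private

  # Cycles through the productions of a nonterminal in order.
  def next_production(nonterminal)
    productions = @rules[nonterminal]
    production = productions[@counts[nonterminal] % productions.size]
    @counts[nonterminal] += 1
    production
  end
end

deriver = RoundRobinDeriver.new(RULES)
first = deriver.sentence.join   # "ab"
second = deriver.sentence.join  # "aabb"
```

Because the round-robin counters persist between calls, the output is fully deterministic. In this sketch every call after the first yields `aabb`, which hints at why such a strategy is of limited use for generating varied sentences.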
@@ -0,0 +1,115 @@
1
+
2
+
3
+ module Panini
4
+ module DerivationStrategy
5
+
6
+ class DampenedProbabilityProductionChoiceProxy
7
+
8
+ def initialize(nonterminal, damping=0.25)
9
+ @nonterminal = nonterminal
10
+ @damping = damping
11
+ @production_counts = @nonterminal.productions.map do
12
+ 0
13
+ end
14
+ end
15
+
16
+ def initialize_copy(source)
17
+ super
18
+ @production_counts = @production_counts.map do |production_count|
19
+ production_count
20
+ end
21
+ end
22
+
23
+ def production
24
+ i = find_index
25
+ @production_counts[i] += 1
26
+ @nonterminal.productions[i]
27
+ end
28
+
29
+ def find_index
30
+
31
+ weights = @production_counts.map do |production_count|
32
+ @damping ** production_count
33
+ end
34
+
35
+ selector = Kernel::rand() * weights.inject(:+)
36
+
37
+ weights.each_with_index do |weight, i|
38
+ selector -= weight
39
+ if selector < 0
40
+ return i
41
+ end
42
+ end
43
+
44
+ raise "You shouldn't be able to get here. #{selector} #{@production_counts}"
45
+
46
+ end
47
+ private :find_index
48
+
49
+ def dump_weights
50
+ puts "production_counts:"
51
+ @production_counts.each do |production_count|
52
+ puts "#{production_count} "
53
+ end
54
+ end
55
+
56
+ end
57
+
58
+
59
+ # This derivation strategy uses a dampening factor to reduce the likelihood of hitting either
60
+ # too-deep or infinite traversals through the grammar. This is based on material presented
61
+ # here:
62
+ #
63
+ # http://eli.thegreenplace.net/2010/01/28/generating-random-sentences-from-a-context-free-grammar
64
+ class RandomDampened < Base
65
+
66
+ # Initializes the derivator. The damping factor is a number between 0.0 and 1.0. In
67
+ # general, the smaller the number the shorter the sentence generated by the derivator.
68
+ # If the number is close to 1.0, it is possible that you will encounter stack errors!
69
+ def initialize(grammar, damping = 0.25)
70
+ if (damping <= 0.0) || (damping >= 1.0)
71
+ raise ArgumentError, "The damping factor must be greater than 0.0 and less than 1.0."
72
+ end
73
+ build_production_proxies(grammar, damping)
74
+ super(grammar)
75
+ end
76
+
77
+ def build_production_proxies(grammar, damping)
78
+ @production_proxies = {}
79
+ grammar.nonterminals.each do |nonterminal|
80
+ @production_proxies[nonterminal] = DampenedProbabilityProductionChoiceProxy.new(nonterminal, damping)
81
+ end
82
+ end
83
+ private :build_production_proxies
84
+
85
+ # Generates a sentence.
86
+ def sentence
87
+ substitute_nonterminal(@grammar.start, @production_proxies, 0)
88
+ end
89
+
90
+ def substitute_nonterminal(nonterminal, production_proxies, depth)
91
+
92
+ # production_proxies_copy = {}
93
+ # production_proxies_copy = production_proxies.each do |key, value|
94
+ # production_proxies_copy[key] = value.dup
95
+ # end
96
+ #
97
+ production_proxies_copy = production_proxies.each_with_object({}) do |(key, value), copy|
98
+ copy[key] = value.dup
99
+ end
100
+
101
+ production_proxies_copy[nonterminal].production.flat_map do |term|
102
+ if (term.class == Nonterminal)
103
+ substitute_nonterminal(term, production_proxies_copy, depth + 1)
104
+ else
105
+ term
106
+ end
107
+ end
108
+
109
+ end
110
+ private :substitute_nonterminal
111
+
112
+ end
113
+
114
+ end
115
+ end
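The dampened selection used above can be examined on its own. The following standalone sketch mirrors the weighting rule (`damping ** production_count`) but is not the gem's code: `dampened_weights` and `pick_index` are hypothetical helpers, and the random number is passed in explicitly so the behaviour can be reproduced.

```ruby
# Standalone sketch of dampened selection (helper names are illustrative).
# counts[i] is how often production i has already been used on this branch.
def dampened_weights(counts, damping = 0.25)
  counts.map { |count| damping**count }
end

# Picks an index with probability proportional to its weight.
# r is a random number with 0.0 <= r < 1.0.
def pick_index(weights, r = rand)
  selector = r * weights.inject(:+)
  weights.each_with_index do |weight, i|
    selector -= weight
    return i if selector < 0
  end
  weights.size - 1 # fallback in case of floating-point rounding
end

weights = dampened_weights([0, 1, 2])
puts weights.inspect
puts pick_index(weights, 0.0)
puts pick_index(weights, 0.8)
puts pick_index(weights, 0.99)
```

With the default damping of 0.25, each additional use of a production divides its weight by four, so a production that has already been expanded on the current branch quickly becomes unlikely to be chosen again.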