fathom 0.1.0

Files changed (47)
  1. data/.bundle/config +2 -0
  2. data/.document +5 -0
  3. data/.gitignore +5 -0
  4. data/.rspec +1 -0
  5. data/Gemfile +5 -0
  6. data/Gemfile.lock +30 -0
  7. data/LICENSE +20 -0
  8. data/README.md +176 -0
  9. data/Rakefile +50 -0
  10. data/VERSION +1 -0
  11. data/autotest/discover.rb +1 -0
  12. data/lib/fathom.rb +68 -0
  13. data/lib/fathom/archive/conditional_probability_matrix.rb +116 -0
  14. data/lib/fathom/archive/n2.rb +198 -0
  15. data/lib/fathom/archive/n3.rb +119 -0
  16. data/lib/fathom/archive/node.rb +74 -0
  17. data/lib/fathom/archive/noodle.rb +136 -0
  18. data/lib/fathom/archive/scratch.rb +45 -0
  19. data/lib/fathom/basic_node.rb +8 -0
  20. data/lib/fathom/causal_graph.rb +12 -0
  21. data/lib/fathom/combined_plausibilities.rb +12 -0
  22. data/lib/fathom/concept.rb +83 -0
  23. data/lib/fathom/data_node.rb +51 -0
  24. data/lib/fathom/import.rb +68 -0
  25. data/lib/fathom/import/csv_import.rb +60 -0
  26. data/lib/fathom/import/yaml_import.rb +53 -0
  27. data/lib/fathom/inverter.rb +21 -0
  28. data/lib/fathom/knowledge_base.rb +23 -0
  29. data/lib/fathom/monte_carlo_set.rb +76 -0
  30. data/lib/fathom/node_utilities.rb +8 -0
  31. data/lib/fathom/plausible_range.rb +82 -0
  32. data/lib/fathom/value_aggregator.rb +11 -0
  33. data/lib/fathom/value_description.rb +79 -0
  34. data/lib/fathom/value_multiplier.rb +18 -0
  35. data/lib/options_hash.rb +186 -0
  36. data/spec/fathom/data_node_spec.rb +61 -0
  37. data/spec/fathom/import/csv_import_spec.rb +36 -0
  38. data/spec/fathom/import/yaml_import_spec.rb +40 -0
  39. data/spec/fathom/import_spec.rb +22 -0
  40. data/spec/fathom/knowledge_base_spec.rb +16 -0
  41. data/spec/fathom/monte_carlo_set_spec.rb +58 -0
  42. data/spec/fathom/plausible_range_spec.rb +130 -0
  43. data/spec/fathom/value_description_spec.rb +70 -0
  44. data/spec/fathom_spec.rb +8 -0
  45. data/spec/spec_helper.rb +13 -0
  46. data/spec/support/demo.yml +17 -0
  47. metadata +135 -0
@@ -0,0 +1,2 @@
1
+ ---
2
+ BUNDLE_DISABLE_SHARED_GEMS: "1"
@@ -0,0 +1,5 @@
1
+ README.rdoc
2
+ lib/**/*.rb
3
+ bin/*
4
+ features/**/*.feature
5
+ LICENSE
@@ -0,0 +1,5 @@
1
+ *.sw?
2
+ .DS_Store
3
+ coverage
4
+ rdoc
5
+ pkg
data/.rspec ADDED
@@ -0,0 +1 @@
1
+ --color
data/Gemfile ADDED
@@ -0,0 +1,5 @@
1
+ source 'http://rubygems.org'
2
+
3
+ gem "rspec"
4
+ gem "ruby-debug"
5
+ gem "fastercsv"
@@ -0,0 +1,30 @@
1
+ GEM
2
+ remote: http://rubygems.org/
3
+ specs:
4
+ columnize (0.3.1)
5
+ diff-lcs (1.1.2)
6
+ fastercsv (1.5.3)
7
+ linecache (0.43)
8
+ rspec (2.0.1)
9
+ rspec-core (~> 2.0.1)
10
+ rspec-expectations (~> 2.0.1)
11
+ rspec-mocks (~> 2.0.1)
12
+ rspec-core (2.0.1)
13
+ rspec-expectations (2.0.1)
14
+ diff-lcs (>= 1.1.2)
15
+ rspec-mocks (2.0.1)
16
+ rspec-core (~> 2.0.1)
17
+ rspec-expectations (~> 2.0.1)
18
+ ruby-debug (0.10.3)
19
+ columnize (>= 0.1)
20
+ ruby-debug-base (~> 0.10.3.0)
21
+ ruby-debug-base (0.10.3)
22
+ linecache (>= 0.3)
23
+
24
+ PLATFORMS
25
+ ruby
26
+
27
+ DEPENDENCIES
28
+ fastercsv
29
+ rspec
30
+ ruby-debug
data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2010 David
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,176 @@
1
+ Fathom
2
+ ------
3
+
4
+ Introduction
5
+ ============
6
+
7
+ This is a library for decision support. It is useful for recording various types of information, and then combining it in useful ways. As of right now, it's not very useful, but I'm actively working on it again.
8
+
9
+ The ideas for this gem are coming from a lot of places:
10
+
11
+ * Judea Pearl's work on causal graphs and belief networks. See [Causality](http://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X/ref=sr_1_1?s=books&ie=UTF8&qid=1288840948&sr=1-1) and [Probabilistic Reasoning in Intelligent Systems](http://www.amazon.com/Probabilistic-Reasoning-Intelligent-Systems-Plausible/dp/1558604790/ref=ntt_at_ep_dpi_2)
12
+ * Douglas Hubbard's ideas on decision support. See [How to Measure Anything](http://www.amazon.com/How-Measure-Anything-Intangibles-Business/dp/0470539399/ref=sr_1_1?ie=UTF8&qid=1288840870&sr=8-1)
13
+ * Ben Klemens' ideas on data analysis. See [Modeling with Data](http://modelingwithdata.org/about_the_book.html)
14
+
15
+ To build useful decision support environments, there are three things that need to be in place:
16
+
17
+ * Data needs to be gathered or referenced
18
+ * Models need to be developed for the data
19
+ * Data and models need to be presented in context
20
+
21
+ Setting up the data and models starts with a decoupled Ruby library. I'll give it a web service API so that a server can be set up for simple systems. The decoupled library can also be used as a consumer on a message queue system for larger installations.
22
+
23
+ Keeping the data and models in context is more of a user interface question, which I'll build in another library. I'm considering hosting that solution myself and just making it available publicly. We'll see after all the core ideas are gathered.
24
+
25
+ Usage
26
+ =====
27
+
28
+ Enrico Fermi [said](http://www.lucidcafe.com/library/95sep/fermi.html):
29
+ There are two possible outcomes: if the result confirms the hypothesis, then you've made a measurement. If the result is contrary to the hypothesis, then you've made a discovery.
30
+
31
+ To put together a hypothesis, we gather what we know about our problem:
32
+
33
+ * What is the decision we are making?
34
+ * What are the consequences of the decision?
35
+ * What do we know now?
36
+ * How do we order the data we have?
37
+ * How can we express this in ranges?
38
+
39
+ If we have a lot of clarity about what we're after, it's easier to gather data and build worthwhile models. It's probably a good idea to start with PlausibleRange:
40
+
41
+ q1_sales = PlausibleRange.new(:min => 10, :max => 20, :hard_lower_bound => 0, :name => "First Quarter Sales")
42
+ q1_prices = PlausibleRange.new(:min => 10_000, :max => 12_000, :name => "First Quarter Prices")
43
+ q1_sales_commissions = PlausibleRange.new(:min => 0.2, :max => 0.2, :name => "Sales Commission Rate")
44
+
45
+ We can combine these ranges in a ValueDescription:
46
+
47
+ q1_gross_margins = ValueDescription.new(q1_sales, q1_prices, q1_sales_commissions) do |random_sample|
48
+ revenue = (random_sample.first_quarter_sales * random_sample.first_quarter_prices)
49
+ commissions_paid = random_sample.sales_commission_rate * revenue
50
+ gross_margins = revenue - commissions_paid
51
+ {:revenue => revenue, :commissions_paid => commissions_paid, :gross_margins => gross_margins}
52
+ end
53
+
54
+ A ValueDescription can take the ranges and combine them with a block of code. Here, we sample sales, prices and commission rates to get revenues, commissions paid, and gross margins. We can then use Monte Carlo methods to model our system:
55
+
56
+ sales_model = MonteCarloSet.new(q1_gross_margins)
57
+ sales_model.process(10_000)
58
+ sales_model.revenue.mean
59
+ sales_model.revenue.sd
60
+ sales_model.gross_margins.mean
61
+ sales_model.gross_margins.sd
62
+
63
+ Here, we run 10,000 random samples to get an idea of how our system behaves. Notice how the methods get generated in the different objects:
64
+
65
+ * The ValueDescription converts the name to a lower-case, underscore-joined name (e.g., Sales Commission Rate becomes sales_commission_rate).
66
+ * The MonteCarloSet uses the keys from the return value of the ValueDescription block to generate method names.
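The name conversion described in the list above can be sketched in plain Ruby. This is a minimal illustration, not Fathom's actual implementation: `method_name_for` is a hypothetical helper, and the handling of punctuation and repeated spaces is an assumption.

```ruby
# Hypothetical helper (not part of Fathom): turn a human-readable node name
# into a lower-case, underscore-joined method name.
def method_name_for(name)
  name.to_s.strip.downcase.gsub(/[^a-z0-9]+/, "_").gsub(/\A_+|_+\z/, "")
end

puts method_name_for("Sales Commission Rate")
puts method_name_for("First Quarter Sales")
```

The same idea applies to the Hash keys returned from a ValueDescription block: each key would become a reader method on the resulting set.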
67
+
68
+ At this point, everything uses a normal (Gaussian) distribution. Since Fathom uses the GNU Scientific Library, there are many other distributions we can incorporate into the library.
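To make the sampling idea concrete without GSL, here is a rough, self-contained sketch of a Monte Carlo run over two normally distributed ranges. Everything in it is an assumption for illustration: `SketchRange` is a hypothetical stand-in for PlausibleRange, min and max are treated as a 90% confidence interval (about 3.29 standard deviations wide), hard bounds are ignored, and Box-Muller replaces the GSL random number generator.

```ruby
# Hypothetical stand-in for a plausible range. Assumption: min and max bound a
# 90% confidence interval of a normal distribution.
SketchRange = Struct.new(:min, :max) do
  def mean
    (min + max) / 2.0
  end

  def sd
    (max - min) / 3.29
  end

  # Draw one normally distributed sample using the Box-Muller transform.
  def sample(rng)
    u1 = 1.0 - rng.rand # shift to (0, 1] so Math.log never sees zero
    u2 = rng.rand
    z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
    mean + sd * z
  end
end

def mean_of(values)
  values.sum / values.size
end

def sd_of(values)
  m = mean_of(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
end

rng = Random.new(42) # fixed seed so the run is repeatable
sales = SketchRange.new(10, 20)
prices = SketchRange.new(10_000, 12_000)

# One revenue figure per random sample of sales and prices.
revenues = Array.new(10_000) { sales.sample(rng) * prices.sample(rng) }

puts "revenue mean: #{mean_of(revenues).round}"
puts "revenue sd:   #{sd_of(revenues).round}"
```

With independent inputs the mean revenue should land near 15 × 11,000 = 165,000; the spread depends entirely on the interval assumption above.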
69
+
70
+ If you start with data instead of data ranges, you can use a DataNode instead:
71
+
72
+ q1_sales = DataNode.new(:name => "First Quarter Sales", :values => [10,11,15,9])
73
+
74
+ A DataNode can also be used in a ValueDescription.
75
+
76
+ Sometimes it's easier to load data from other sources, such as a spreadsheet:
77
+
78
+ sales_data = CSVImport.new(:content => "path/to/sales_data.csv")
79
+ sales_data.import
80
+
81
+ This reads the sales_data file and imports a DataNode for each column. The spreadsheet is expected to look something like this:
82
+
83
+ First Quarter Sales,First Quarter Prices
84
+ 10,12000
85
+ 11,11500
86
+ 15,10000
87
+ 9,12000
88
+
89
+ The nodes are then generated and stored in the knowledge base. Right now, this is just an in-memory hash stored in Fathom.knowledge_base.
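The column-per-node import described above might look roughly like this in plain Ruby. It is a sketch under assumptions: it uses Ruby's standard `csv` library rather than fastercsv, `SketchDataNode` is a hypothetical stand-in for DataNode, a bare Hash plays the role of the knowledge base, and every cell is assumed to be numeric.

```ruby
require "csv"

# Hypothetical stand-ins: a bare-bones data node and a Hash as the knowledge base.
SketchDataNode = Struct.new(:name, :values)
knowledge_base = {}

csv_text = <<~ROWS
  First Quarter Sales,First Quarter Prices
  10,12000
  11,11500
  15,10000
  9,12000
ROWS

table = CSV.parse(csv_text, headers: true)

# Build one node per column, keyed by the column header.
table.headers.each do |header|
  values = table[header].map { |cell| Float(cell) }
  knowledge_base[header] = SketchDataNode.new(header, values)
end

p knowledge_base["First Quarter Sales"].values
```

Keying the Hash by the original header is what lets later code look a node up by its human-readable name.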
90
+
91
+ You can also use YAML files to import data. Given the following YAML data:
92
+
93
+ CO2 Emissions:
94
+ min: 1_000_000
95
+ max: 1_000_000_000
96
+
97
+ CO2 Readings:
98
+ - 10
99
+ - 20
100
+ - 30
101
+
102
+ You can load the nodes with:
103
+
104
+ yaml_nodes = YAMLImport.new('path/to/yaml/file')
105
+ yaml_nodes.import
106
+
107
+ This will create a PlausibleRange for CO2 Emissions and a DataNode for CO2 Readings.
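The rule implied here (a mapping with min and max becomes a range, a list becomes a data node) can be sketched with Ruby's standard `yaml` library. `YamlRange` and `YamlData` are hypothetical stand-ins for PlausibleRange and DataNode, and the dispatch on Hash versus Array is an assumption about how the importer decides, not a description of its code. The numbers are written without underscores to keep the sketch independent of YAML integer-parsing details.

```ruby
require "yaml"

# Hypothetical stand-ins for PlausibleRange and DataNode.
YamlRange = Struct.new(:name, :min, :max)
YamlData  = Struct.new(:name, :values)

yaml_text = <<~DOC
  CO2 Emissions:
    min: 1000000
    max: 1000000000

  CO2 Readings:
    - 10
    - 20
    - 30
DOC

# Choose the node type from the shape of each top-level entry.
nodes = YAML.safe_load(yaml_text).map do |name, entry|
  case entry
  when Hash  then YamlRange.new(name, entry["min"], entry["max"])
  when Array then YamlData.new(name, entry)
  else raise ArgumentError, "Unsupported entry for #{name}: #{entry.inspect}"
  end
end

nodes.each { |node| puts "#{node.name}: #{node.class}" }
```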
108
+
109
+ To use imported data in a ValueDescription, just reference this knowledge base:
110
+
111
+ ValueDescription.new(Fathom.knowledge_base['First Quarter Sales'], Fathom.knowledge_base['First Quarter Prices']) do
112
+ ...
113
+ end
114
+
115
+ This code is certainly not production ready. There are many things I'll want to add just to have basic Monte Carlo methods up to snuff:
116
+
117
+ * More distributions to choose from
118
+ * More import methods (RDF, relational databases, NoSQL data stores)
119
+ * A persisted knowledge base
120
+ * Configuration on the knowledge base and databases
121
+ * Better visualization with plotutils support and possibly other graphics support
122
+ * Project organization: decision descriptions, owners, sharing
123
+ * Measurement values: use Shannon's entropy and some value calculations to point out which measurements have the highest potential ROI
124
+
125
+ On a bigger level, I still haven't implemented other major ideas:
126
+
127
+ * Agent-based modeling
128
+ * System dynamics
129
+ * Belief updating in Causal Graphs
130
+ * Fathom as a Web service
131
+
132
+ Documentation TODO:
133
+
134
+ * Document using this library from the command line
135
+ * Document these classes as RabbitMQ consumers
136
+
137
+ Dependencies
138
+ ============
139
+
140
+ This project relies on the GNU Scientific Library and the ruby/gsl bindings for the GSL.
141
+
142
+ Note on Patches/Pull Requests
143
+ =============================
144
+
145
+ * Fork the project.
146
+ * Make your feature addition or bug fix.
147
+ * Add tests for it. This is important so I don't break it in a
148
+ future version unintentionally.
149
+ * Commit, do not mess with rakefile, version, or history.
150
+ (if you want to have your own version, that is fine but
151
+ bump version in a commit by itself I can ignore when I pull)
152
+ * Send me a pull request. Bonus points for topic branches.
153
+
154
+ Copyright
155
+ =========
156
+
157
+ Copyright (c) 2010 David Richards
158
+
159
+ Permission is hereby granted, free of charge, to any person obtaining
160
+ a copy of this software and associated documentation files (the
161
+ "Software"), to deal in the Software without restriction, including
162
+ without limitation the rights to use, copy, modify, merge, publish,
163
+ distribute, sublicense, and/or sell copies of the Software, and to
164
+ permit persons to whom the Software is furnished to do so, subject to
165
+ the following conditions:
166
+
167
+ The above copyright notice and this permission notice shall be
168
+ included in all copies or substantial portions of the Software.
169
+
170
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
171
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
172
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
173
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
174
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
175
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
176
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,50 @@
1
+ require 'rubygems'
2
+ require 'rake'
3
+
4
+ begin
5
+ require 'jeweler'
6
+ Jeweler::Tasks.new do |gem|
7
+ gem.name = "fathom"
8
+ gem.summary = %Q{Decision Support in Ruby}
9
+ gem.description = %Q{Collecting some decision support tools in a decoupled Ruby library.}
10
+ gem.email = "davidlamontrichards@gmail.com"
11
+ gem.homepage = "http://github.com/davidrichards/fathom"
12
+ gem.authors = ["David"]
13
+ gem.add_development_dependency "rspec"
14
+ # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
15
+ end
16
+ Jeweler::GemcutterTasks.new
17
+ rescue LoadError
18
+ puts "Jeweler (or a dependency) not available. Install it with: sudo gem install jeweler"
19
+ end
20
+
21
+ require "rspec/core/rake_task"
22
+ RSpec::Core::RakeTask.new(:core) do |spec|
23
+ spec.pattern = 'spec/fathom/*_spec.rb'
24
+ # spec.rspec_opts = ['--backtrace']
25
+ end
26
+
27
+ #
28
+ # Spec::Rake::SpecTask.new(:rcov) do |spec|
29
+ # spec.libs << 'lib' << 'spec'
30
+ # spec.pattern = 'spec/**/*_spec.rb'
31
+ # spec.rcov = true
32
+ # end
33
+ #
34
+ # task :spec => :check_dependencies
35
+
36
+ task :default => :core
37
+
38
+ require 'rake/rdoctask'
39
+ Rake::RDocTask.new do |rdoc|
40
+ if File.exist?('VERSION')
41
+ version = File.read('VERSION')
42
+ else
43
+ version = ""
44
+ end
45
+
46
+ rdoc.rdoc_dir = 'rdoc'
47
+ rdoc.title = "fathom #{version}"
48
+ rdoc.rdoc_files.include('README*')
49
+ rdoc.rdoc_files.include('lib/**/*.rb')
50
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.0
@@ -0,0 +1 @@
1
+ Autotest.add_discovery { "rspec2" }
@@ -0,0 +1,68 @@
1
+ # ================
2
+ # = Dependencies =
3
+ # ================
4
+
5
+ # Make decoupling easier with an informed LoadPath
6
+ $:.unshift(File.dirname(__FILE__))
7
+ $:.unshift(File.expand_path(File.join(File.dirname(__FILE__), 'fathom')))
8
+
9
+ require "gsl"
10
+ require 'ostruct'
11
+ require 'options_hash'
12
+
13
+ # Fix a few bugs in OpenStruct
14
+ class OpenStruct
15
+ def table
16
+ @table
17
+ end
18
+
19
+ def values
20
+ @table.values
21
+ end
22
+
23
+ def keys
24
+ @table.keys
25
+ end
26
+ end
27
+
28
+ module Fathom
29
+ lib = File.expand_path(File.dirname(__FILE__))
30
+ $LOAD_PATH.unshift(lib)
31
+
32
+ # Autoload classes and modules so that we only load as much of the library as we're using.
33
+ # This allows us to have a fairly large library without taking up a lot of memory unless we need it.
34
+ autoload :Inverter, "inverter"
35
+ autoload :BasicNode, "basic_node"
36
+ autoload :PlausibleRange, "plausible_range"
37
+ autoload :R, "plausible_range"
38
+ # autoload :LowerBound, "lower_bound"
39
+ # autoload :UpperBound, "upper_bound"
40
+ # autoload :Distribution, "distribution"
41
+ # autoload :DependencyGraph, "dependency_graph"
42
+ autoload :ValueDescription, "value_description"
43
+ autoload :ValueAggregator, "value_aggregator"
44
+ autoload :ValueMultiplier, "value_multiplier"
45
+ autoload :MonteCarloSet, "monte_carlo_set"
46
+ autoload :CombinedPlausibilities, "combined_plausibilities"
47
+ autoload :CausalGraph, "causal_graph"
48
+ autoload :DataNode, "data_node"
49
+ autoload :KnowledgeBase, "knowledge_base"
50
+
51
+ autoload :Import, "import"
52
+ autoload :YAMLImport, 'import/yaml_import'
53
+ autoload :CSVImport, 'import/csv_import'
54
+ autoload :RDFImport, 'import/rdf_import'
55
+ autoload :SQLiteImport, 'import/sqlite_import'
56
+
57
+ autoload :NodeUtilities, 'node_utilities'
58
+
59
+ def knowledge_base
60
+ @knowledge_base ||= KnowledgeBase.new
61
+ end
62
+ end
63
+
64
+ # Temporary
65
+ include Fathom
66
+ def r
67
+ @r ||= R.new(:min => 1, :max => 10)
68
+ end
@@ -0,0 +1,116 @@
1
+ require 'rubygems'
2
+ require 'gsl'
3
+
4
+ include GSL
5
+
6
+ class NodeAccessor
7
+
8
+ attr_reader :cpm
9
+
10
+ def initialize(cpm)
11
+ @cpm = cpm
12
+ end
13
+
14
+ def is(*labels)
15
+ ChildAccessor.new(cpm, *labels)
16
+ end
17
+
18
+ def is_not(*labels)
19
+ ChildAccessor.new(cpm, *(cpm.child.labels - labels))
20
+ end
21
+ end
22
+
23
+ class ChildAccessor
24
+
25
+ attr_reader :cpm, :labels
26
+ def initialize(cpm, *labels)
27
+ @cpm, @labels = cpm, labels
28
+ end
29
+
30
+ def given(parent_name)
31
+ ParentAccessor.new(cpm, labels)
32
+ end
33
+
34
+ end
35
+
36
+ class ParentAccessor
37
+
38
+ attr_reader :cpm, :node, :child_labels, :child_indices
39
+ def initialize(cpm, child_labels)
40
+ @cpm = cpm
41
+ @child_labels = child_labels
42
+ @node = cpm.parent
43
+ @child_indices = child_labels.map {|label| @cpm.child.labels.index(label)}
44
+ end
45
+
46
+ def is(*labels)
47
+ indices = labels.map {|label| get_index(label) }
48
+ sum_probabilities(indices)
49
+ end
50
+
51
+ def is_not(*labels)
52
+ not_indices = labels.map {|label| get_index(label) }
53
+ indices = (0...node.labels.size).to_a - not_indices
54
+ sum_probabilities(indices)
55
+ end
56
+
57
+ protected
58
+
59
+ # TODO: Not right...
60
+ def sum_probabilities(indices)
61
+ first_child = child_indices.first
62
+ cpm.matrix[indices.first, first_child]
63
+ # indices.inject(0.0) do |s, i|
64
+ # s += cpm.matrix[i, first_child]
65
+ # end
66
+ end
67
+
68
+ def get_index(label)
69
+ node.labels.index(label)
70
+ end
71
+
72
+ end
73
+
74
+
75
+ class ConditionalProbabilityMatrix
76
+
77
+ class << self
78
+ def define_node_accessor(node, cpm)
79
+ define_method(node.name.to_sym) do
80
+ NodeAccessor.new(cpm)
81
+ end
82
+ end
83
+ end
84
+
85
+ attr_reader :parent, :child, :matrix
86
+
87
+ def initialize(parent, child)
88
+ @parent, @child = parent, child
89
+ @matrix = @parent.probabilities.col * @child.probabilities
90
+ assert_name_access
91
+ end
92
+
93
+ def probability(opts={})
94
+ child_label = opts[:child]
95
+ parent_label = opts[:parent]
96
+ raise ArgumentError, "Must provide a child and parent label. E.g., probability(:child => true, :parent => false)" unless child_label and parent_label
97
+ child_label_index = get_index(child, child_label)
98
+ parent_label_index = get_index(parent, parent_label)
99
+ self.matrix[parent_label_index, child_label_index]
100
+ end
101
+ alias :p :probability
102
+
103
+ def inspect
104
+ "ConditionalProbabilityMatrix: #{matrix.to_a.inspect}"
105
+ end
106
+
107
+ protected
108
+
109
+ def assert_name_access
110
+ ConditionalProbabilityMatrix.define_node_accessor(child, self)
111
+ end
112
+
113
+ def get_index(node, label)
114
+ node.labels.index(label)
115
+ end
116
+ end
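The matrix built in `initialize` is the outer product of the parent's probabilities (as a column) and the child's probabilities (as a row), which amounts to a joint distribution only when the two nodes are independent. Below is a plain-Ruby sketch of that construction and of the label lookup, with nested Arrays standing in for GSL matrices. The labels and probabilities are made up for illustration, and summing over several parent labels is one possible reading of the unfinished `sum_probabilities`, not the author's confirmed intent.

```ruby
parent_labels = [:rain, :dry]
parent_probs  = [0.3, 0.7]
child_labels  = [:late, :on_time]
child_probs   = [0.4, 0.6]

# Outer product: rows follow the parent labels, columns follow the child labels.
matrix = parent_probs.map do |parent_p|
  child_probs.map { |child_p| parent_p * child_p }
end

# Look up one cell by label, mirroring probability(:child => ..., :parent => ...).
lookup = lambda do |parent, child|
  matrix[parent_labels.index(parent)][child_labels.index(child)]
end

# Sum one child column over every parent label except the excluded ones.
# The exclusive range (0...size) keeps every index inside the matrix.
sum_excluding = lambda do |excluded, child|
  column = child_labels.index(child)
  rows = (0...parent_labels.size).to_a - excluded.map { |label| parent_labels.index(label) }
  rows.sum { |row| matrix[row][column] }
end

puts lookup.call(:rain, :late)
puts sum_excluding.call([:rain], :late)
```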