fathom 0.2.2 → 0.2.3

data/Gemfile CHANGED
@@ -3,3 +3,4 @@ source 'http://rubygems.org'
  gem "rspec"
  gem "ruby-debug"
  gem "fastercsv"
+ gem 'uuid'
@@ -5,6 +5,7 @@ GEM
  diff-lcs (1.1.2)
  fastercsv (1.5.3)
  linecache (0.43)
+ macaddr (1.0.0)
  rspec (2.0.1)
  rspec-core (~> 2.0.1)
  rspec-expectations (~> 2.0.1)
@@ -20,6 +21,8 @@ GEM
  ruby-debug-base (~> 0.10.3.0)
  ruby-debug-base (0.10.3)
  linecache (>= 0.3)
+ uuid (2.3.1)
+ macaddr (~> 1.0)

  PLATFORMS
  ruby
@@ -28,3 +31,4 @@ DEPENDENCIES
  fastercsv
  rspec
  ruby-debug
+ uuid
data/TODO.md ADDED
@@ -0,0 +1,140 @@
+ TODO
+ ====
+
+ Reorganizing
+ ------------
+
+ I've just made some big refactoring steps regarding the organization of the system and the distributions. To make sure we're there:
+
+ * Go back and test the 4 distributions I decided on
+ * Finish the discrete ideas, adding size to the node and automatically using that for stats
+ * Create the idea of a labeled, multinomial node
+ * Add SQLite3 for in-memory set operations for a labeled, multinomial node
+ * Add and remove finder methods on nodes for their parents and children
+
+ Also, the general organization of the system could be broken down better:
+
+ * agent
+ * distributions
+ * node
+ * import
+ * causal_graph
+ * belief_network
+ * knowledge_base
+ * apophenia
+ * simulation
+
+ MonteCarlo
+ ----------
+
+ This needs to get a few new features:
+
+ * combine with ValueDescription into one node
+ * generate nodes for the return values
+ * consider a more general simulation framework, in case it needs to be extended, or to use some of the tools that will be added to the ABM stuff
+
+ Belief Networks
+ ---------------
+
+ To get these delivered, I need to revisit the edge logic, to make sure it's easy to extend each edge with an object.
+
+ Then:
+
+ * CPM brought back from the archive
+ * Network propagation
+ * Network testing (polytree)
+
+ Agent Based Modeling
+ --------------------
+
+ * Add parameter-passing standards for callbacks
+ * Add EventMachine and async capabilities (including the cluster idea)
+
+ Knowledge Base
+ --------------
+
+ Probably around here I'll be able to start looking at a persistent knowledge base. I am not sure which way I'll go, but things I'm considering:
+
+ * RDF markup
+ * Riak/Redis/Mongo/Couch backend
+ * Backend adapters
+ * tabular data in an RDBMS (think Apophenia)
+
+ One of the key features needs to be search:
+
+ * possibly a Xapian search index for full-text searching
+ * still need a standard query language, depending on what I choose above
+
+ Apophenia
+ ---------
+
+ I'd like to get Apophenia integrated so that any data model generated there could be combined with the work done here. That means that most of the "hard" data crunching is done with some fairly fast tools: C, SQLite3 in memory, Apophenia, and the GSL.
+
+ That would mean that you generally come to Fathom to:
+
+ * coordinate the elements of a decision or information discovery project
+ * run simulations across their collective knowledge
+ * maintain consistent information between information nodes
+
+ You would go to Apophenia to:
+
+ * build data models from statistical methods
+ * generate new sets from grouping, sorting, merging, and dealing with multiple imputation
+
+ Fathom could feed the information to Apophenia data models. Given a fairly robust knowledge base, this makes a lot of sense.
+
+ Import
+ ------
+
+ * More robust support for CSV and YAML
+ * OpenERP
+ * RDF
+ * Apophenia
+ * Web Crawlers
+
+ Publication
+ -----------
+
+ To turn Fathom into a better tool for publishing knowledge, there are a few major parts to add:
+
+ Support for Reports:
+
+ * Template-based system (possibly MVC as part of the Web Service)
+ * LaTeX-enabled
+ * Graphs and PDFs through Prawn
+ * PDFs, CSVs through Ruport
+
+ Web Service:
+
+ * Basic CRUD for every node type
+ * Traversal of the graph with good search semantics
+ * Authentication and Authorization
+ * HTML-based interface with strong input capabilities (mind map js, auto-build forms, dynamic forms, etc.)
+ * Survey system
+
+ Meta Data:
+
+ * ontological support for a systems approach to research (7 nodes / article)
+ * ontological support for references (auto-generate citations)
+ * cleaner approach to the decision framework
+
+ Causal Graphs
+ -------------
+
+ * Add the "DO" operator to a belief network
+ * Add the tests for causality using the DO operator
+
+ This stuff gets into things I haven't finished reading yet, but it would be very interesting/important to finish that work and bring it into Fathom. This is all Judea Pearl stuff.
+
+ Information Service
+ -------------------
+
+ Using Seaside, Fathom (through the Web Service), and OpenERP, I could create several products with this framework:
+
+ * Decision support could become custom-integrated with a hosted OpenERP instance, for example.
+ * General domain information could be culled, organized, and verified so that others could tie in and build only their parts of their decision support framework
+ * Think tank support would be a very interesting area to work in
+
+ All of this could also be coupled with consulting and hosting services.
+
+
data/VERSION CHANGED
@@ -1 +1 @@
- 0.2.2
+ 0.2.3
@@ -11,6 +11,7 @@ require 'options_hash'

  require 'ext/open_struct'
  require 'ext/array'
+ require 'ext/string'

  module Fathom
    lib = File.expand_path(File.dirname(__FILE__))
@@ -19,18 +20,12 @@ module Fathom
  # Autoload classes and modules so that we only load as much of the library as we're using.
  # This allows us to have a fairly large library without taking up a lot of memory unless we need it.
  autoload :Inverter, "inverter"
- autoload :BasicNode, "basic_node"
+ autoload :Node, "node"
  autoload :PlausibleRange, "plausible_range"
- autoload :R, "plausible_range"
- # autoload :LowerBound, "lower_bound"
- # autoload :UpperBound, "upper_bound"
- # autoload :Distribution, "distribution"
- # autoload :DependencyGraph, "dependency_graph"
  autoload :ValueDescription, "value_description"
  autoload :ValueAggregator, "value_aggregator"
  autoload :ValueMultiplier, "value_multiplier"
  autoload :MonteCarloSet, "monte_carlo_set"
- autoload :CombinedPlausibilities, "combined_plausibilities"
  autoload :CausalGraph, "causal_graph"
  autoload :DataNode, "data_node"
  autoload :KnowledgeBase, "knowledge_base"
@@ -41,8 +36,6 @@ module Fathom
  autoload :RDFImport, 'import/rdf_import'
  autoload :SQLiteImport, 'import/sqlite_import'

- autoload :NodeUtilities, 'node_utilities'
-
  autoload :Simulation, 'simulation'
  autoload :TickMethods, 'simulation/tick_methods'
  autoload :TickSimulation, 'simulation/tick_simulation'
@@ -50,6 +43,17 @@ module Fathom
  autoload :Agent, 'agent'
  autoload :Properties, 'agent/properties'
  autoload :AgentCluster, 'agent/agent_cluster'
+
+ autoload :NumericMethods, 'numeric_methods'
+ autoload :EnforcedName, 'enforced_name'
+
+ autoload :Distributions, 'distributions'
+ module Distributions
+   autoload :Gaussian, 'distributions/gaussian'
+   autoload :Uniform, 'distributions/uniform'
+   autoload :DiscreteGaussian, 'distributions/discrete_gaussian'
+   autoload :DiscreteUniform, 'distributions/discrete_uniform'
+ end

  def knowledge_base
    @knowledge_base ||= KnowledgeBase.new
@@ -59,6 +63,3 @@ end

  # Temporary
  include Fathom
- def r
-   @r ||= R.new(:min => 1, :max => 10)
- end
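
For reference, a minimal sketch (not from the gem itself) of how the autoload registrations above behave; it assumes fathom.rb puts its library directory on the load path so that a registered path like 'distributions/gaussian' resolves:

    # Module#autoload only registers a path; the file is required the first
    # time the constant is referenced, so unused classes never load.
    require 'fathom'

    p Fathom::Distributions.autoload?(:Gaussian)  # => "distributions/gaussian" (not yet loaded)
    Fathom::Distributions::Gaussian               # first reference triggers the require
    p Fathom::Distributions.autoload?(:Gaussian)  # => nil once the file has been loaded
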
@@ -4,44 +4,15 @@ require File.expand_path(File.join(File.dirname(__FILE__), '..', 'fathom'))
  A DataNode is a node generated from data itself. It stores the data and reveals some statistical
  measurements for the data. It expects an array or vector of values and generates a vector on demand.
  =end
- class Fathom::DataNode
+ class Fathom::DataNode < Node

-   include NodeUtilities
-
-   attr_reader :values, :name, :distribution, :confidence_interval
+   include NumericMethods

    def initialize(opts={})
-     @values = opts[:values]
+     super(opts)
      raise ArgumentError, "Must provide values: DataNode.new(:values => [...])" unless self.values
-     @name = opts[:name]
-     @distribution = opts[:distribution]
-   end
-
-   alias :ci :confidence_interval
-
-   def vector
-     @vector ||= GSL::Vector.ary_to_gv(self.values)
-   end
-
-   def standard_deviation
-     @standard_deviation ||= vector.sd
-   end
-   alias :sd :standard_deviation
-   alias :std :standard_deviation
-
-   def mean
-     @mean ||= vector.mean
-   end
-
-   def rand
-     rng.gaussian(std) + mean
    end

-   protected
-   def rng
-     @rng ||= GSL::Rng.alloc(GSL::Rng::MT19937_1999, Kernel.rand(100_000))
-   end
-
  end

  if __FILE__ == $0
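
A minimal, hypothetical usage sketch of the slimmed-down DataNode follows; it assumes the new Node superclass accepts :name and :values options and that the statistical helpers (mean, standard deviation, random draws) now come from the NumericMethods mixin, which this diff does not show:

    require 'fathom'

    # Hypothetical example data; :values is required, :name is optional.
    node = Fathom::DataNode.new(:name => :daily_sales, :values => [12, 15, 9, 22, 18])
    node.values  # => [12, 15, 9, 22, 18]
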
@@ -0,0 +1,8 @@
+ require File.expand_path(File.join(File.dirname(__FILE__), '..', 'fathom'))
+ module Fathom
+   module Distributions
+     module SharedMethods
+       # TODO: Put helper methods here for sharing some of the distribution functionality
+     end
+   end
+ end
@@ -0,0 +1,44 @@
+ require File.expand_path(File.join(File.dirname(__FILE__), '..', '..', 'fathom'))
+ class Fathom::Distributions::DiscreteGaussian
+   extend Fathom::Distributions::SharedMethods
+   class << self
+     def rng
+       @rng ||= GSL::Rng.alloc(GSL::Rng::MT19937_1999, Kernel.rand(100_000))
+     end
+
+     def rand(sd)
+       (rng.gaussian(sd) / size).floor + 1
+     end
+
+     def inverse_cdf(opts={})
+       mean = opts[:mean]
+       sd = opts[:sd]
+       sd ||= opts[:std]
+       sd ||= opts[:standard_deviation]
+       lower = opts.fetch(:lower, true)
+       lower = false if opts[:upper]
+       confidence_interval = opts.fetch(:confidence_interval, 0.05)
+       value = lower ? GSL::Cdf.gaussian_Pinv(confidence_interval, sd) : GSL::Cdf.gaussian_Qinv(confidence_interval, sd)
+       value + mean
+     end
+     alias :lower_bound :inverse_cdf
+
+     def upper_bound(opts={})
+       inverse_cdf(opts.merge(:lower => false))
+     end
+
+     def interval_values(opts={})
+       confidence_interval = opts.fetch(:confidence_interval, 0.9)
+       bound = (1 - confidence_interval) / 2.0
+       [lower_bound(opts.merge(:confidence_interval => bound)), upper_bound(opts.merge(:confidence_interval => bound))]
+     end
+
+     # If only I had the background to explain what this is....
+     # I want to know how many standard deviations are expressed by the confidence interval
+     # I can then divide the range by this number to get the standard deviation
+     def standard_deviations_under(confidence_interval)
+       GSL::Cdf.gaussian_Qinv((1 - confidence_interval) / 2) * 2
+     end
+   end
+ end
+
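
The comments on standard_deviations_under describe dividing a range by the number of standard deviations spanned by a confidence interval. A small worked check of that number (illustrative only, assuming the Ruby GSL bindings are installed; this is the unit-sigma case of what the helper computes):

    require 'gsl'

    ci   = 0.9
    tail = (1 - ci) / 2.0                 # 0.05 left in each tail of a two-sided interval
    z    = GSL::Cdf.ugaussian_Qinv(tail)  # ~1.645, the standard-normal upper 5% point
    puts z * 2                            # ~3.29 standard deviations across the 90% interval
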
@@ -0,0 +1,46 @@
+ require File.expand_path(File.join(File.dirname(__FILE__), '..', '..', 'fathom'))
+ class Fathom::Distributions::DiscreteUniform
+   extend Fathom::Distributions::SharedMethods
+   class << self
+     def rng
+       @rng ||= GSL::Rng.alloc(GSL::Rng::MT19937_1999, Kernel.rand(100_000))
+     end
+
+     def rand
+       (rng.ugaussian / size).floor + 1
+     end
+
+     def inverse_cdf(opts={})
+       mean = opts[:mean]
+       sd = opts[:sd]
+       sd ||= opts[:std]
+       sd ||= opts[:standard_deviation]
+       lower = opts.fetch(:lower, true)
+       lower = false if opts[:upper]
+       confidence_interval = opts.fetch(:confidence_interval, 0.05)
+       value = lower ? GSL::Cdf.ugaussian_Pinv(confidence_interval) : GSL::Cdf.ugaussian_Qinv(confidence_interval)
+       value + mean
+     end
+     alias :lower_bound :inverse_cdf
+
+     def upper_bound(opts={})
+       inverse_cdf(opts.merge(:lower => false))
+     end
+
+     def interval_values(opts={})
+       confidence_interval = opts.fetch(:confidence_interval, 0.9)
+       bound = (1 - confidence_interval) / 2.0
+       [lower_bound(opts.merge(:confidence_interval => bound)), upper_bound(opts.merge(:confidence_interval => bound))]
+     end
+
+     # If only I had the background to explain what this is....
+     # I want to know how many standard deviations are expressed by the confidence interval
+     # I can then divide the range by this number to get the standard deviation
+     def standard_deviations_under(confidence_interval)
+       GSL::Cdf.ugaussian_Qinv((1 - confidence_interval) / 2) * 2
+     end
+
+
+   end
+ end
+
@@ -0,0 +1,46 @@
+ require File.expand_path(File.join(File.dirname(__FILE__), '..', '..', 'fathom'))
+ class Fathom::Distributions::Gaussian
+   extend Fathom::Distributions::SharedMethods
+   class << self
+     def rng
+       @rng ||= GSL::Rng.alloc(GSL::Rng::MT19937_1999, Kernel.rand(100_000))
+     end
+
+     def rand(sd)
+       rng.gaussian(sd)
+     end
+
+     def inverse_cdf(opts={})
+       mean = opts[:mean]
+       sd = opts[:sd]
+       sd ||= opts[:std]
+       sd ||= opts[:standard_deviation]
+       lower = opts.fetch(:lower, true)
+       lower = false if opts[:upper]
+       confidence_interval = opts.fetch(:confidence_interval, 0.05)
+       value = lower ? GSL::Cdf.gaussian_Pinv(confidence_interval, sd) : GSL::Cdf.gaussian_Qinv(confidence_interval, sd)
+       value + mean
+     end
+     alias :lower_bound :inverse_cdf
+
+     def upper_bound(opts={})
+       inverse_cdf(opts.merge(:lower => false))
+     end
+
+     def interval_values(opts={})
+       confidence_interval = opts.fetch(:confidence_interval, 0.9)
+       bound = (1 - confidence_interval) / 2.0
+       [lower_bound(opts.merge(:confidence_interval => bound)), upper_bound(opts.merge(:confidence_interval => bound))]
+     end
+
+     # If only I had the background to explain what this is....
+     # I want to know how many standard deviations are expressed by the confidence interval
+     # I can then divide the range by this number to get the standard deviation
+     def standard_deviations_under(confidence_interval)
+       GSL::Cdf.gaussian_Qinv((1 - confidence_interval) / 2) * 2
+     end
+
+
+   end
+ end
+
@@ -0,0 +1,35 @@
+ require File.expand_path(File.join(File.dirname(__FILE__), '..', '..', 'fathom'))
+ class Fathom::Distributions::Uniform
+   extend Fathom::Distributions::SharedMethods
+   class << self
+     def rng
+       @rng ||= GSL::Rng.alloc(GSL::Rng::MT19937_1999, Kernel.rand(100_000))
+     end
+
+     def rand
+       rng.ugaussian
+     end
+
+     def inverse_cdf(opts={})
+       mean = opts[:mean]
+       lower = opts.fetch(:lower, true)
+       lower = false if opts[:upper]
+       confidence_interval = opts.fetch(:confidence_interval, 0.05)
+       value = lower ? GSL::Cdf.ugaussian_Pinv(confidence_interval) : GSL::Cdf.ugaussian_Qinv(confidence_interval)
+       value + mean
+     end
+     alias :lower_bound :inverse_cdf
+
+     def upper_bound(opts={})
+       inverse_cdf(opts.merge(:lower => false))
+     end
+
+     def interval_values(opts={})
+       confidence_interval = opts.fetch(:confidence_interval, 0.9)
+       bound = (1 - confidence_interval) / 2.0
+       [lower_bound(opts.merge(:confidence_interval => bound)), upper_bound(opts.merge(:confidence_interval => bound))]
+     end
+
+   end
+ end
+