fathom 0.2.3 → 0.3.0

data/README.md CHANGED
@@ -1,292 +1,112 @@
  Fathom
  ======
 
- Introduction
- ------------
+ Welcome to Fathom, a library for building decision support tools. Fathom is the kind of tool you'd want to use when you:
 
- This is a library for decision support. It is useful for recording various types of information, and then combining it in useful ways. As of right now, it's not very useful, but I'm actively working on it again.
+ * want to build a reliable knowledge base for any kind of information
+ * need to simplify a complex problem
+ * have more data than a spreadsheet can comfortably handle
 
- The ideas for this gem are coming from a lot of places:
+ Stability Note
+ ==============
 
- * Judea Pearl's work on causal graphs and belief networks. See [Causality](http://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X/ref=sr_1_1?s=books&ie=UTF8&qid=1288840948&sr=1-1) and [Probabilistic Reasoning in Intelligent Systems](http://www.amazon.com/Probabilistic-Reasoning-Intelligent-Systems-Plausible/dp/1558604790/ref=ntt_at_ep_dpi_2)
- * Douglas Hubbard's ideas on decision support. See [How to Measure Anything](http://www.amazon.com/How-Measure-Anything-Intangibles-Business/dp/0470539399/ref=sr_1_1?ie=UTF8&qid=1288840870&sr=8-1)
- * Ben Klemens' ideas on data analysis. See [Modeling with Data](http://modelingwithdata.org/about_the_book.html)
+ Please note that Fathom is not ready for production at this time. I happen to be using it in production for a few modeling projects, but it is undergoing some major architectural changes that won't stabilize until I release version 0.4, which will contain the Belief Networks, the Knowledge Base API, and a more complete cleanup of the code. One major change that's already in the system is that the simulations and imports all build a knowledge base as a graph, which makes it natural to use the output of one simulation as the input of a different model (see the sketch below).
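A minimal sketch of that chaining, using the new MCNode from this release (the names are illustrative, and the seeding API may still shift before 0.4):

    require 'rubygems'
    require 'fathom'
    include Fathom

    # A tiny simulation; each result key becomes a DataNode child on the graph
    margin_model = MCNode.new(:name => "Margin Model") { |n| {:gross_margin => rand} }
    margin_model.process(1_000)

    # The generated child node can seed a different model, like any other node
    after_tax = ValueDescription.new(margin_model.gross_margin) do |sample|
      {:taxed_margin => sample.gross_margin * 0.7}
    end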
 
- To build useful decision support environments, there are three things that need to be in place:
 
- * Data needs to be gathered or referenced
- * Models need to be developed for the data
- * Data and models need to be presented in context
+ Inspiration for the Project
+ ---------------------------
 
- Setting up the data and models starts with a decoupled Ruby library. I'll give it a web service API so that a server could be setup for simple systems. The decoupled library can also be used as consumers on a message queue system for larger installations.
+ The ideas for this gem are coming from a lot of places:
 
- Keeping the data and models in context is more of a user interface question, which I'll build in another library. I'm considering hosting that solution myself and just making it available publicly. We'll see after all the core ideas are gathered.
+ * Judea Pearl's work on causal graphs and belief networks. See [Causality](http://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X/ref=sr_1_1?s=books&ie=UTF8&qid=1288840948&sr=1-1) and [Probabilistic Reasoning in Intelligent Systems](http://www.amazon.com/Probabilistic-Reasoning-Intelligent-Systems-Plausible/dp/1558604790/ref=ntt_at_ep_dpi_2)
+ * Douglas Hubbard's ideas on decision support. See [How to Measure Anything](http://www.amazon.com/How-Measure-Anything-Intangibles-Business/dp/0470539399/ref=sr_1_1?ie=UTF8&qid=1288840870&sr=8-1)
+ * Ben Klemens' ideas on data analysis. See [Modeling with Data](http://modelingwithdata.org/about_the_book.html)
 
- Fathom Basics
- -------------
+ The goals of this project are:
 
- Enrico Fermi [said](http://www.lucidcafe.com/library/95sep/fermi.html):
- There are two possible outcomes: if the result confirms the hypothesis, then you've made a measurement.
- If the result is contrary to the hypothesis, then you've made a discovery.
+ * Build a decoupled library with Ruby and the GSL
+ * Make it easy to gather information of all types
+ * Add tools to analyze the integration of knowledge
 
- To put together a hypothesis, we gather what we know about our problem:
+ Decoupled Library with Ruby and the GSL
+ ---------------------------------------
 
- * What is the decision we are making?
- * What are the consequences of the decision?
- * What do we know now?
- * How do we order the data we have?
- * How can we express this in ranges?
+ I use Ruby because it's very flexible and fast to write. I use the GSL because it's very fast, robust, and well-tested. I decouple the library so that it's easy to use the parts that are worthwhile for your problem and ignore the rest.
 
- If we have a lot of clarity about what we're after, it's easier to gather data and build worthwhile models. It's probably a good idea to start with PlausibleRange:
+ Most of the library is about coordinating data nodes. So, an educated guess or the data that came out of a spreadsheet will sit in a particular kind of node. The system then allows us to relate this information. For example, a node that defines the income for a company would relate to the nodes that define the revenue and expenses of the company. All of this is done with some simple Ruby libraries to make things easy to coordinate.
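As a rough sketch (the company nodes here are made up), relating that information looks like this; linking two nodes also defines an accessor for each on the other:

    require 'fathom'
    include Fathom

    income   = Node.new(:name => "Income")
    revenue  = Node.new(:name => "Revenue")
    expenses = Node.new(:name => "Expenses")

    # Wire up the graph; each node gains a method named after the other
    income.add_child(revenue)
    income.add_child(expenses)

    income.revenue   # => the revenue node
    revenue.income   # => the income node, through the registered parent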
 
-     q1_sales = PlausibleRange.new(:min => 10, :max => 20, :hard_lower_bound => 0, :name => "First Quarter Sales")
-     q1_prices = PlausibleRange.new(:min => 10_000, :max => 12_000, :name => "First Quarter Prices")
-     q1_sales_commissions = PlausibleRange.new(:min => 0.2, :max => 0.2, :name => "Sales Commission Rate")
-
- We can combine these ranges in a ValueDescription:
+ The statistics that we run are usually run on a GSL::Vector, using various random number generators from the GSL. This is really the heart of what the GSL does for Fathom. A lot of the other tools may be used by some plugins for Fathom, and anyone doing open source data analysis on their computer probably already has a good installation of the GSL.
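For instance, with the Ruby/GSL bindings on their own (a sketch, nothing Fathom-specific here):

    require 'gsl'

    rng = GSL::Rng.alloc                  # the default Mersenne Twister generator
    samples = GSL::Vector.alloc(10_000)
    10_000.times { |i| samples[i] = rng.gaussian(1.0) }

    samples.mean    # => close to 0.0
    samples.sd      # => close to 1.0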
 
-     q1_gross_margins = ValueDescription.new(q1_sales, q1_prices, q1_sales_commissions) do |random_sample|
-       revenue = (random_sample.first_quarter_sales * random_sample.first_quarter_prices)
-       commissions_paid = random_sample.sales_commission_rate * revenue
-       gross_margins = revenue - commissions_paid
-       {:revenue => revenue, :commissions_paid => commissions_paid, :gross_margins => gross_margins}
-     end
-
- A ValueDescription can take the ranges and combine them with a block of code. Here, we sample sales, prices and commission rates to get revenues, commissions paid, and gross margins. We can then use Monte Carlo methods to model our system:
+ There are two bindings for Ruby and the GSL that I know of: [Ruby/GSL](http://rb-gsl.rubyforge.org/) and [ruby-gsl](https://github.com/codahale/ruby-gsl). Ruby/GSL is still maintained and uses the syntax that we're using here.
 
-     sales_model = MonteCarloSet.new(q1_gross_margins)
-     sales_model.process(10_000)
-     sales_model.revenue.mean
-     sales_model.revenue.sd
-     sales_model.gross_margins.mean
-     sales_model.gross_margins.sd
-
- Here, we are able to run 10,000 random samples to get an idea of how our system interacts. Notice how the methods get generated in the different objects:
+ The decoupling is based on making every file as stand-alone as possible. What that means is we use autoload quite a bit in the library. Each file references the main lib/fathom.rb file, so that it can autoload any dependencies it may have. Also, most files have a test at the bottom to see if they are being run from the command line. The goal is to make it fairly easy to run a single class in a message queue system, say, or as part of another script to get data into our knowledge base. In this sense, we're trying to follow some of the design principles of Unix: simple scripts that do one thing well and can combine with other scripts.
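The skeleton of a typical file looks something like this (a sketch of the pattern; SomeNode is a placeholder, not a real class):

    require File.expand_path(File.join(File.dirname(__FILE__), '..', 'fathom'))

    class Fathom::SomeNode < Node
      # The class body can reference other Fathom constants freely;
      # lib/fathom.rb autoloads them on first use.
    end

    if __FILE__ == $0
      # Run directly, the file can do one small job on its own
      Fathom::SomeNode.new
    end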
 
- * The ValueDescription converts the name to a lower case, underscore-joined name (E.g. Sales Commission Rate becomes sales_commission_rate).
- * The MonteCarloSet uses the keys from the return value in the ValueDescription block to generate method names
+ Gathering Information
+ ---------------------
 
- At this point, everything is using a normal Gaussian distribution. Since Fathom uses the GNU Scientific Library, there are many other distributions we will incorporate into our library.
+ Decision support is about making rational choices from the information available. This means that we do several things (see the sketch after this list):
 
- If you start with data instead of data ranges, you can use a DataNode instead:
+ * Make it fairly easy to load spreadsheet data in a CSV format
+ * Add support for listing assumptions in a YAML file
+ * Make it possible to link to RDF data and all of the external context that can be useful
+ * Create links to richer data, such as ERP installations, databases, and web crawlers
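The CSV and YAML items on that list already work. A quick sketch (the file paths are made up):

    require 'fathom'
    include Fathom

    # Each column becomes a DataNode, attached to an ImportNode and the knowledge base
    import = CSVImport.import(:content => "path/to/sales_data.csv")
    import.children    # => the imported DataNodes

    # YAML entries with min and max become PlausibleRanges; arrays become DataNodes
    YAMLImport.import(:content => "path/to/assumptions.yml")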
 
-     q1_sales = DataNode.new(:name => "First Quarter Sales", :values => [10,11,15,9])
-
- A DataNode can also be used in a ValueDescription.
+ The data we gather is organized as a bunch of nodes in a graph. We also try to create ranges or probability distributions for everything, so that quantities are comparable and can be combined fairly easily. We also always know explicitly what our uncertainty is, so that we're not misleading ourselves. Importing conflicting data should be manageable if you're careful to document what you're loading.
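For example (carried over from the 0.2.x README), an educated guess is captured as a PlausibleRange with explicit bounds, while hard data sits in a DataNode:

    q1_sales = PlausibleRange.new(:min => 10, :max => 20,
      :hard_lower_bound => 0, :name => "First Quarter Sales")

    q1_history = DataNode.new(:name => "Past Quarter Sales", :values => [10, 11, 15, 9])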
 
- Sometimes it's easier to load data from other sources, such as a spreadsheet:
+ Also, we keep track of metadata. So, specific decisions are described in the knowledge base, as are references to other material, or other ways to describe what we're doing. In this way, we should be able to work on convincing the appropriate people (employees, executives, review boards) that we've made sound decisions and that we should commit resources to execute the plan.
 
-     sales_data = CSVImport.new(:content => "path/to/sales_data.csv")
-     sales_data.import
-
- This reads the sales_data file and imports a DataNode for each column. The spreadsheet is expected to look something like this:
+ Tools for Analysis and Integration
+ ----------------------------------
 
-     First Quarter Sales,First Quarter Prices
-     10,12000
-     11,11500
-     15,10000
-     9,12000
+ The integration tools in Fathom are:
 
- The nodes are then generated and stored in the knowledge base. Right now, this is just an in-memory hash stored in Fathom.knowledge_base
+ * Monte Carlo Simulations
+ * Agent Based Models
+ * Belief Networks
+ * Causal Graphs
 
- You can also use YAML files to import data. Given the following YAML data:
+ These tools allow us to combine information in our knowledge base and run simulations or update beliefs to be able to see the larger perspective. These tools also offer insight into areas where more information should have the most return on investment. So, given a fairly limited amount of information, we can draw conclusions about what's going on, and pinpoint areas where we can refine our models and get more certain results.
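The Monte Carlo piece, for example, combines ranges through a ValueDescription and samples the result. This example is carried over from the 0.2.x README (the new MCNode wraps the same idea in a node):

    q1_sales = PlausibleRange.new(:min => 10, :max => 20, :hard_lower_bound => 0, :name => "First Quarter Sales")
    q1_prices = PlausibleRange.new(:min => 10_000, :max => 12_000, :name => "First Quarter Prices")
    q1_commissions = PlausibleRange.new(:min => 0.2, :max => 0.2, :name => "Sales Commission Rate")

    q1_gross_margins = ValueDescription.new(q1_sales, q1_prices, q1_commissions) do |random_sample|
      revenue = random_sample.first_quarter_sales * random_sample.first_quarter_prices
      commissions_paid = random_sample.sales_commission_rate * revenue
      {:revenue => revenue, :commissions_paid => commissions_paid,
       :gross_margins => revenue - commissions_paid}
    end

    sales_model = MonteCarloSet.new(q1_gross_margins)
    sales_model.process(10_000)
    sales_model.gross_margins.mean
    sales_model.print_summary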
 
-     CO2 Emissions:
-       min: 1_000_000
-       max: 1_000_000_000
+ We are also adding Apophenia to the library. We'll be able to build data analysis tools outside of Fathom and then bring their results into our knowledge base and integrate them with the bigger picture. In this way, we'll be able to do all sorts of statistical analysis, machine learning, data mining, and other information-generating tasks without making Fathom too complicated.
 
-     CO2 Readings:
-       - 10
-       - 20
-       - 30
+ Apophenia is the pragmatic analyst's dream. It's a C-based library that uses the GSL to do its analysis very quickly. It uses SQLite for data management, which makes in-memory set operations fast. It stores the models in consistent ways, so it won't be hard to use this information inside of Fathom.
 
- You can load the nodes with:
+ There are also other kinds of external libraries whose analysis could be brought into Fathom through the import tools.
 
-     yaml_nodes = YAMLImport.new('path/to/yaml/file')
-     yaml_nodes.import
-
- This will create a PlausibleRange for CO2 Emissions and a DataNode for CO2 Readings.
+ The ultimate integration tool for Fathom is the web service. We are exposing access to all of the tools through a RESTful, JSON interface so that Fathom can be part of any sort of application. We also expect to publish basic HTML support for these same functions so that users can input and read their knowledge base without too much trouble.
 
- To use imported data in a ValueDescription, just reference this knowledge base:
+ Further Information
+ -------------------
 
-     ValueDescription.new(Fathom.knowledge_base['First Quarter Sales'], Fathom.knowledge_base['First Quarter Prices']) do
-       ...
-     end
+ * You can use our [Wiki](https://github.com/davidrichards/fathom/wiki) to get code examples and see how things are coming along.
+ * You can go to the [TODO](https://github.com/davidrichards/fathom/blob/master/TODO.md) page to see how current development is mapped out.
+ * You can go to the [Fleet Ventures Blog](http://fleetventures.com) to get more in-depth tutorials and commentary about how to use these types of tools in business, as well as a broader perspective on other technologies we use to solve these kinds of problems.
 
- Serial Agent Based Modeling
+ Dependencies and Extensions
  ---------------------------
 
- I have added some basic support for Agent Based Modeling (ABM). Right now, this only supports serial simulations. I will be adding an Agent Cluster, which will allow us to run large simulations asynchronously using EventMachine. Until then, here's a really simple example of how to do things.
-
- First, let's create a couple agents, a Cola and a Consumer:
-
-     class Cola < Agent
-       property :sweetness
-       property :number_sold
-
-       def on_purchase(consumer)
-         self.number_sold += 1
-         log_purchase
-       end
-
-       def on_tick(simulation)
-         self.sweetness = suggest_sweetness
-       end
-
-       def inspect
-         "Cola: sweetness: #{self.sweetness}, sales: #{self.number_sold}"
-       end
-
-       protected
-
-       # This is where the fun is as well. This is an admittedly poor suggestion engine.
-       def suggest_sweetness
-         case purchases.length
-         when *(0..10).to_a
-           self.node_for_sweetness.rand
-         when *(10..50).to_a
-           (self.node_for_sweetness.rand * 0.4) +
-             (average_purchase_sweetness * 0.6)
-         when *(50..250).to_a
-           (self.node_for_sweetness.rand * 0.2) +
-             (average_purchase_sweetness * 0.8)
-         else
-           (self.node_for_sweetness.rand * 0.05) +
-             (average_purchase_sweetness * 0.95)
-         end
-       end
-
-       def average_purchase_sweetness
-         purchases.inject(0.0) {|s, e| s += e} / purchases.length
-       end
-
-       def log_purchase
-         purchases << sweetness
-       end
-
-       def purchases
-         @purchases ||= []
-       end
-     end
-
-     class Consumer < Agent
-       property :sweetness_preference
-
-       attr_reader :simulation
-
-       def on_tick(simulation)
-         @simulation ||= simulation
-         purchase_cola
-       end
-
-       def inspect
-         "Consumer: preferred sweetness: #{self.sweetness_preference}"
-       end
-
-       protected
-       def agents_using_purchase
-         @agents_using_purchase ||= simulation.agents_using_purchase
-       end
-
-       # This is where all the fun happens.
-       def purchase_cola
-         if rand < 0.1
-           agents_using_purchase.rand.on_purchase(self)
-         else
-           distances = agents_using_purchase.map {|agent| [agent, (self.sweetness_preference - agent.sweetness).abs] }
-           sorted_distances = distances.sort {|a, b| a.last <=> b.last }
-           purchased = sorted_distances.first.first
-           purchased.on_purchase(self)
-         end
-       end
-     end
-
- Agents need to do just a few things:
-
- * define their properties
- * define which events they listen to
- * define the behavior we're after for each event
-
- Properties can be whatever you're after. Usually, these are seeded with some knowledge that we're working on in the knowledge base. Declaring a property gives us a getter and a setter for that property, as well as access to the seed objects we use when setting up the agent.
-
- Events are setup by defining a method starting with on_. A consumer responds to on_tick, and the cola responds to on_tick and on_purchase. We setup events with this convention so that it's a little easier to coordinate the traffic amongst the agents and between the agents and the simulation. When we start using EventMachine for agent clusters, it will be more important to have this interface explicitly defined like this so that things don't get confused.
-
- The underlying behavior is where we can have a lot of fun. We can start adopting reinforcement learning techniques, or mimic real-world interactions. For this example, I had the consumer purchase some cola at every tick. Right now, it optimizes for the cola that's nearest its preference for sweetness. You may imagine how fun this would get to introduce different types of consumers, or start mimicking a satisficing algorithm (allow the consumers to make a choice that's good enough, rather than optimal). We could start adding budgets, ages, and proximity to the cola. Once the behaviors and properties are setup, models can be iterated over extensively until the system dynamics are thoroughly explored, or even some prognostic value begins to emerge from the experiments.
-
- To show the whole example, let me give you some configuration data I stored in a YAML file:
-
-     :american_consumer_sweetness_preference:
-       hard_lower_bound: 0
-       hard_upper_bound: 1
-       min: 0.2
-       max: 0.3
-       name: American Consumer Sweetness Preference
-
-     :cola_sweetness_range:
-       hard_lower_bound: 0
-       hard_upper_bound: 1
-
- Also, here is the actual simulation:
-
-     require 'rubygems'
-     require 'fathom'
-     require 'cola'
-     require 'consumer'
-
-     YAMLImport.import(File.expand_path('nodes.yml'))
-
-     @rb_cola = Cola.new(:sweetness => Fathom.kb[:cola_sweetness_range], :number_sold => 0)
-     @ruby_cola = Cola.new(:sweetness => Fathom.kb[:cola_sweetness_range], :number_sold => 0)
-     @american_consumer = Consumer.new(
-       :sweetness_preference => Fathom.kb[:american_consumer_sweetness_preference],
-       :budget => Fathom.kb[:american_cola_budget]
-     )
-
-     @simulation = TickSimulation.new(@rb_cola, @ruby_cola, @american_consumer)
-     @simulation.process(1_000)
-     puts @american_consumer.inspect, @rb_cola.inspect, @ruby_cola.inspect
-
- The output from this experiment looks like this:
-
-     demo_abm : ruby sim.rb
-     Consumer: preferred sweetness: 0.258095065252885
-     Cola: sweetness: 0.362263199218971, sales: 626
-     Cola: sweetness: 0.377573124603715, sales: 374
-
- You can see that our single consumer wanted sweetness rated around 0.25, and ended up purchasing more soda that ended up looking like 0.36. With better goal-seeking behavior, the agents could actually optimize to the consumer's preferences. With some verification of the seed nodes against market data, the simulations could look more and more like the real world.
-
- I've written up an article on our company blog to give a better background to Agent Based Models, which can be [found here](http://fleetventures.com/2010/11/07/agent-based-modeling/).
-
- Future Development
- ------------------
+ This project relies on the [GNU Scientific Library](http://www.gnu.org/software/gsl/) and the [ruby/gsl](http://rb-gsl.rubyforge.org/) bindings for the GSL.
 
- This code is certainly not production ready. There are many things I'll want to add just to have basic Monte Carlo methods up to snuff:
-
- * More distributions to choose from
- * More import methods (RDF, relational databases, no SQL data stores)
- * A persisted knowledge base
- * Configuration on the knowledge base and databases
- * Better visualization with plotutils support and possibly other graphics support
- * Project organization: decision descriptions, owners, sharing
- * Measurement values: use Shannon's entropy and some value calculations to point out which measurements have the highest potential ROI
- * EventMachine to drive agent clusters, as well as possibly other parts of the system
-
- On a bigger level, I still haven't implemented other major ideas:
-
- * System dynamics
- * Belief updating in Causal Graphs
- * Fathom as a Web service
-
- Dependencies
- ------------
-
- This project relies on the GNU Scientific Library and the ruby/gsl bindings for the GSL. It has only minimal extensions to external libraries:
+ Fathom has only minimal extensions to external libraries (a quick sketch follows the list):
 
  * Array responds to rand (so [1,2,3].rand returns a random value from that array)
  * OpenStruct exposes its underlying table, keys, and values
  * FasterCSV has a :strip header converter now
+ * String has the constantize method added to it from the ActiveSupport library
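In practice, those extensions look like this (the method names follow the bullets above; a sketch, assuming the extensions are loaded with the library):

    require 'fathom'

    [1, 2, 3].rand                    # => 1, 2, or 3, chosen at random

    o = OpenStruct.new(:a => 1, :b => 2)
    o.table                           # => {:a => 1, :b => 2}
    o.keys                            # => [:a, :b]
    o.values                          # => [1, 2]

    "Fathom::DataNode".constantize    # => Fathom::DataNode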
+
+ In the future, more optional dependencies will be introduced for parts of the library:
+
+ * EventMachine is one that I'm sure will be added.
+ * RDF.rb and related gems will be used for some of the KnowledgeBase
+ * SQLite will be available for some set operations
+ * One of the key/value data stores will be used for the KnowledgeBase (Riak, CouchDB, MongoDB, Redis, or similar)
 
- In the future, more dependencies will be introduced for parts of the library: EventMachine is one that I'm sure will be added. The goal of this project is to allow a reasonable number of dependencies to make the project performant and useful, but without making it a headache to setup or use with other projects.
+ It should be easy to avoid the parts of the library that use dependencies you don't want to have. The goals for dependencies are:
+
+ * Use the best tool available for the job
+ * Make it easy to avoid the parts of the library that use those dependencies
+
+ For example, the in-memory version of the Knowledge Base will remain available for quick and dirty analysis.
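Which means lookups like these keep working (the names come from the 0.2.x examples):

    Fathom.knowledge_base['First Quarter Sales']   # fetch a node by name
    Fathom.kb[:cola_sweetness_range]               # kb is the shorthand used in the old ABM example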
 
  Note on Patches/Pull Requests
  -----------------------------
@@ -303,6 +123,11 @@ Note on Patches/Pull Requests
  Copyright
  ---------
 
+ * The GSL is released under the [GNU General Public License](http://www.gnu.org/copyleft/gpl.html).
+ * Ruby/GSL is released under the [GNU General Public License](http://www.gnu.org/copyleft/gpl.html).
+ * FasterCSV is released under the [GPL Version 2](http://www.gnu.org/licenses/old-licenses/gpl-2.0.html) license.
+ * Ruby is released under [this license](http://www.ruby-lang.org/en/LICENSE.txt).
+
  Copyright (c) 2010 David Richards
 
  Permission is hereby granted, free of charge, to any person obtaining
data/TODO.md CHANGED
@@ -1,7 +1,7 @@
  TODO
  ====
 
- Reorganizing
+ Reorganizing (0.2.5)
  ------------
 
  I've just made some big refactoring steps regarding the organization of the system and the distributions. To make sure we're there:
@@ -10,7 +10,7 @@ I've just made some big refactoring steps regarding the organization of the syst
  * Finish the discrete ideas, adding size to the node and automatically using that for stats
  * Create the idea of a labeled, multinomial node
  * Add SQLite3 for in-memory set operations for a labeled, multinomial node
- * Add and remove finder methods on nodes for their parents and children
+ * Make sure we are not defining methods on all objects in a class when they should only be set for a single object (see the sketch below).
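The node accessors added in 0.3.0 already work this way by defining the method on a single object's singleton class; the distilled technique from node.rb:

    # Defining the method on the singleton class touches only this object,
    # not every instance of the class.
    def add_accessor_for_node(node)
      (class << self; self; end).module_eval do
        define_method(node.name_sym) { node }
      end
    end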
 
  Also, the general organization of the system could be broken down better:
 
@@ -24,16 +24,7 @@ Also, the general organization of the system could be broken down better:
  * apophenia
  * simulation
 
- MonteCarlo
- ----------
-
- This needs to get a few new features:
-
- * combine with ValueDescription into one node
- * generate nodes for the return values
- * consider a more general simulation framework, in case it needs to be extended, or to use some of the tools that will be added to the ABM stuff
-
- Belief Networks
+ Belief Networks (0.3)
  ---------------
 
  To get these delivered, I need to revisit the edge logic, to make sure it's easy to extend each edge with an object.
@@ -44,13 +35,13 @@ Then:
  * Network propagation
  * Network testing (polytree)
 
- Agent Based Modeling
+ Agent Based Modeling (0.3.1)
  --------------------
 
  * Add parameter-passing standards for callbacks
- * Add EventMachine and async capabilities (Inncluding the cluster idea)
+ * Add EventMachine and async capabilities (Including the cluster idea)
 
- Knowledge Base
+ Knowledge Base (0.4)
  --------------
 
  Probably around here I'll be able to start looking at a persistent knowledge base. I am not sure which way I'll go, but things I'm considering:
@@ -65,7 +56,7 @@ One of the key features needs to be search:
  * possibly a Xapian search index for full-text searching
  * still need a standard query language, depending on what I choose above
 
- Apophenia
+ Apophenia (0.5)
  ---------
 
  I'd like to get Apophenia integrated so that any data model generated there could be combined with the work done here. That means that most of the "hard" data crunching is using some fairly fast tools: C, SQLite3 in memory, Apophenia and the GSL.
@@ -83,7 +74,7 @@ You would go to Apophenia to:
 
  Fathom could feed the information to Apophenia data models. Given a fairly robust knowledge base, this makes a lot of sense.
 
- Import
+ Import (0.6)
  ------
 
  * More robust support for CSV and YAML
@@ -92,7 +83,16 @@ Import
  * Apophenia
  * Web Crawlers
 
- Publication
+ Value of Information (0.7)
+ --------------------------
+
+ * Shannon's Entropy
+ * Integration of nodes with the decisions they support
+ * Economic value of the decision
+ * Economic value of the measurement
+ * Guidance for areas where reduced uncertainty would have the highest ROI
+
+ Publication (1.0)
  -----------
 
  Turning Fathom into a better tool for publishing knowledge, there are a few major parts to add:
data/VERSION CHANGED
@@ -1 +1 @@
- 0.2.3
+ 0.3.0
data/lib/fathom.rb CHANGED
@@ -6,6 +6,8 @@
  $:.unshift(File.dirname(__FILE__))
  $:.unshift(File.expand_path(File.join(File.dirname(__FILE__), 'fathom')))
 
+ require 'rubygems'
+
  require "gsl"
  require 'options_hash'
@@ -26,11 +28,13 @@ module Fathom
    autoload :ValueAggregator, "value_aggregator"
    autoload :ValueMultiplier, "value_multiplier"
    autoload :MonteCarloSet, "monte_carlo_set"
+   autoload :MCNode, "mc_node"
    autoload :CausalGraph, "causal_graph"
    autoload :DataNode, "data_node"
    autoload :KnowledgeBase, "knowledge_base"
 
    autoload :Import, "import"
+   autoload :ImportNode, "import/import_node"
    autoload :YAMLImport, 'import/yaml_import'
    autoload :CSVImport, 'import/csv_import'
    autoload :RDFImport, 'import/rdf_import'
data/lib/fathom/import.rb CHANGED
@@ -42,23 +42,22 @@ class Fathom::Import
      end
    end
 
-   attr_reader :content, :options
+   attr_reader :content, :options, :import_node
 
    def initialize(opts={})
      @options = OptionsHash.new(opts)
      @content = @options[:content]
+     @import_node = ImportNode.new(opts)
    end
 
    def import
-     results = []
      import_methods.each do |method|
        klass, initialization_data = self.send(method.to_sym)
        initialization_data.each do |values|
-         node = extract_nodes(klass, values)
-         results << node if node
+         extract_nodes(klass, values)
        end
      end
-     results
+     self.import_node
    end
 
    protected
@@ -67,8 +66,10 @@ class Fathom::Import
    def extract_nodes(klass, values)
      begin
        node = klass.new(values)
-       Fathom.knowledge_base[node.name] = node
-       node
+       if node
+         self.import_node.add_child(node)
+         Fathom.knowledge_base[node.name] = node
+       end
      rescue
        nil
      end
data/lib/fathom/import/csv_import.rb CHANGED
@@ -55,7 +55,5 @@ module Fathom
  end
 
  if __FILE__ == $0
-   include Fathom
-   # TODO: Is there anything you want to do to run this file on its own?
-   # CSV.new
+   Fathom::CSVImport.import(:content => ARGV.first)
  end
data/lib/fathom/import/import_node.rb ADDED
@@ -0,0 +1,17 @@
+ require File.expand_path(File.join(File.dirname(__FILE__), '..', '..', 'fathom'))
+ class Fathom::ImportNode < Node
+
+   attr_reader :imported_at
+
+   def initialize(opts={})
+     super(opts)
+     @imported_at = Time.now
+   end
+
+ end
+
+ if __FILE__ == $0
+   include Fathom
+   # TODO: Is there anything you want to do to run this file on its own?
+   # ImportNode.new
+ end
data/lib/fathom/import/yaml_import.rb CHANGED
@@ -49,7 +49,5 @@ class Fathom::YAMLImport < Import
  end
 
  if __FILE__ == $0
-   include Fathom
-   # TODO: Is there anything you want to do to run this file on its own?
-   # YAMLImport.new
+   Fathom::YAMLImport.import(:content => ARGV.first)
  end
data/lib/fathom/mc_node.rb ADDED
@@ -0,0 +1,69 @@
+ require File.expand_path(File.join(File.dirname(__FILE__), '..', 'fathom'))
+ class Fathom::MCNode < Node
+
+   attr_reader :value_description, :samples_taken
+
+   def initialize(opts={}, &block)
+     super(opts)
+     @value_description = opts[:value_description]
+     @value_description ||= block if block_given?
+     raise ArgumentError, "Must provide a value_description from either a parameter or by passing in a block" unless
+       @value_description
+   end
+
+   def process(n=10_000)
+     @samples_taken, @samples = n, {}
+     @samples_taken.times do
+       result = value_description.call(self)
+       store(result)
+     end
+     assert_nodes
+   end
+
+   def reset!
+     @samples_taken, @samples = nil, {}
+     @samples_asserted = false
+   end
+
+   def fields
+     self.children.map {|c| c.name_sym}.compact
+   end
+
+   protected
+
+   def store(result)
+     result = assert_result_hash(result)
+     assert_samples(result)
+     result.each do |key, value|
+       @samples[key.to_sym] << value
+     end
+   end
+
+   def assert_samples(result)
+     return true if @samples_asserted
+     result.each do |k, v|
+       @samples[k.to_sym] ||= []
+     end
+     @samples_asserted = true
+   end
+
+   def assert_result_hash(result)
+     result.is_a?(Hash) ? result : {:result => result}
+   end
+
+   # Assumes the same value description for all samples taken
+   def assert_nodes
+     @samples.each do |key, values|
+       node = DataNode.new(:name => key, :values => values)
+       add_child(node)
+       # self.class.define_summary_method(key)
+     end
+   end
+
+ end
+
+ if __FILE__ == $0
+   include Fathom
+   # TODO: Is there anything you want to do to run this file on its own?
+   # MCNode.new
+ end
data/lib/fathom/monte_carlo_set.rb CHANGED
@@ -48,21 +48,42 @@ class Fathom::MonteCarloSet
      end
    end
 
+   def print_summary
+     print_hash(self.summary)
+   end
+
    protected
 
+   def print_hash(hash, indent=0)
+     hash.each do |key, value|
+       if value.is_a?(Hash)
+         puts "#{' ' * indent}#{key} => {"
+         print_hash(value, indent + 2)
+         puts "#{' ' * indent}}"
+       else
+         puts "#{' ' * indent}#{key} => #{value}"
+       end
+     end
+   end
+
    def summarize_field(field)
      raise "No fields are defined. Have you processed this model yet?" if fields.empty?
      raise ArgumentError, "#{field} is not a field in this set." unless fields.include?(field)
      vector = self.send(field)
      return vector unless vector.is_a?(GSL::Vector)
+     lb = lower_bound(:mean => vector.mean, :sd => vector.sd)
+     lb = vector.min if vector.min > lb
+     ub = upper_bound(:mean => vector.mean, :sd => vector.sd)
+     ub = vector.max if vector.max < ub
+
      {
        :coefficient_of_variation => (vector.sd / vector.mean),
        :max => vector.max,
        :mean => vector.mean,
        :min => vector.min,
        :sd => vector.sd,
-       :upper_bound => upper_bound(:mean => vector.mean, :sd => vector.sd),
-       :lower_bound => lower_bound(:mean => vector.mean, :sd => vector.sd)
+       :upper_bound => ub,
+       :lower_bound => lb
      }
    end
data/lib/fathom/node.rb CHANGED
@@ -1,6 +1,6 @@
  require File.expand_path(File.join(File.dirname(__FILE__), '..', 'fathom'))
  class Fathom::Node
-
+
    attr_reader :name, :distribution, :description, :values
 
    def initialize(opts={})
@@ -22,13 +22,17 @@ class Fathom::Node
 
    def add_parent(parent)
      self.parents << parent
+     self.add_accessor_for_node(parent)
      parent.register_child(self)
    end
 
    def register_child(child)
      raise "Cannot register a child if this node is not a parent already. Use add_parent to the other node or add_child to this node." unless
        child.parents.include?(self)
-     children << child unless children.include?(child)
+     unless children.include?(child)
+       self.add_accessor_for_node(child)
+       children << child
+     end
      true
    end
@@ -38,18 +42,32 @@ class Fathom::Node
    def add_child(child)
      self.children << child
+     self.add_accessor_for_node(child)
      child.register_parent(self)
    end
 
    def register_parent(parent)
      raise "Cannot register a parent if this node is not a child already. Use add_child to the other node or add_parent to this node." unless
        parent.children.include?(self)
-     parents << parent unless parents.include?(parent)
+     unless parents.include?(parent)
+       self.add_accessor_for_node(parent)
+       parents << parent
+     end
      true
    end
 
    protected
 
+   def add_accessor_for_node(node)
+     return false unless node.is_a?(Node) and node.name_sym
+     return false if self.respond_to?(node.name_sym)
+     (class << self; self; end).module_eval do
+       define_method node.name_sym do
+         node
+       end
+     end
+   end
+
    def assert_links(opts)
      found = opts[:parents]
      found ||= opts[:parent]
data/spec/fathom/import/csv_import_spec.rb CHANGED
@@ -22,15 +22,19 @@ describe CSVImport
      lambda{CSVImport.new(@opts)}.should_not raise_error
    end
 
+   it "should return the ImportNode as the result" do
+     @result.should be_a(ImportNode)
+   end
+
    it "should create as many data nodes as there are columns" do
-     @result.size.should eql(3)
-     @result.each {|dn| dn.should be_a(DataNode)}
+     @result.children.size.should eql(3)
+     @result.children.each {|dn| dn.should be_a(DataNode)}
    end
 
    it "should import the values from each column into each data node" do
-     @result[0].values.should eql([1,4,7])
-     @result[1].values.should eql([2,5,8])
-     @result[2].values.should eql([3,6,9])
+     @result.this.values.should eql([1,4,7])
+     @result.and.values.should eql([2,5,8])
+     @result.that.values.should eql([3,6,9])
    end
 
    it "should store the imported values in the knowledge base" do
data/spec/fathom/import/import_node_spec.rb ADDED
@@ -0,0 +1,10 @@
+ require File.expand_path(File.dirname(__FILE__) + '/../../spec_helper')
+
+ include Fathom
+
+ describe ImportNode do
+   it "should record the time of the import" do
+     i = ImportNode.new
+     i.imported_at.should be_close(Time.now, 0.01)
+   end
+ end
data/spec/fathom/import/yaml_import_spec.rb CHANGED
@@ -17,24 +17,26 @@ describe YAMLImport
      lambda{YAMLImport.new(@opts)}.should_not raise_error
    end
 
+   it "should create an ImportNode as a return value" do
+     @result.should be_an(ImportNode)
+   end
+
    it "should create PlausibleRange nodes for any hashes with at least a min and max key in it" do
-     @result.find {|r| r.name == "CO2 Emissions"}.should_not be_nil
+     @result.co2_emissions.should_not be_nil
    end
 
    it "should not create a PlausibleRange for entries missing min and max" do
-     @result.find {|r| r.name == "Invalid Hash"}.should be_nil
+     @result.should_not respond_to(:invalid_hash)
    end
 
    it "should be able to create a PlausibleRange with more complete information" do
-     more_complete_range = @result.find {|r| r.name == "More Complete Range"}
-     more_complete_range.ci.should eql(0.6)
-     more_complete_range.description.should eql('Some good description')
+     @result.more_complete_range.ci.should eql(0.6)
+     @result.more_complete_range.description.should eql('Some good description')
    end
 
    it "should create DataNodes for entries that have an array of information" do
-     data_node = @result.find {|r| r.name == 'CO2 Readings'}
-     data_node.should be_a(DataNode)
-     data_node.values.should eql([10,20,30])
+     @result.co2_readings.should be_a(DataNode)
+     @result.co2_readings.values.should eql([10,20,30])
    end
 
    it "should store the imported values in the knowledge base" do
data/spec/fathom/import_spec.rb CHANGED
@@ -23,4 +23,14 @@ describe Import
      Import.should be_respond_to(:import)
    end
 
+   it "should create an import node to attach its imports to in the knowledge base" do
+     @i.import_node.should be_a(ImportNode)
+   end
+
+   it "should pass its options to the import node for that node to record the parts it is interested in." do
+     i = Import.new(:name => "New Import", :content => @content, :description => "This gets passed along too.")
+     i.import_node.name.should eql("New Import")
+     i.import_node.description.should eql("This gets passed along too.")
+   end
+
  end
data/spec/fathom/mc_node_spec.rb ADDED
@@ -0,0 +1,66 @@
+ require File.expand_path(File.dirname(__FILE__) + '/../spec_helper')
+
+ include Fathom
+
+ describe MCNode do
+
+   before(:all) do
+     @fields = [:value_result]
+   end
+
+   before do
+     @vd = lambda{|n| {:value_result => 1}}
+     @mcn = MCNode.new(:value_description => @vd)
+   end
+
+   it "should be a type of Node" do
+     MCNode.ancestors.should be_include(Fathom::Node)
+   end
+
+   it "should use a value_description from the command-line arguments" do
+     mcn = MCNode.new(:value_description => @vd)
+     mcn.value_description.should eql(@vd)
+   end
+
+   it "should be able to take a block instead of a named lambda for the value description" do
+     mcn = MCNode.new {|n| {:value_result => 1}}
+     mcn.value_description.should be_a(Proc)
+   end
+
+   it "should require a value_description from either a parameter or a block passed in" do
+     lambda{MCNode.new}.should raise_error(/value_description/)
+   end
+
+   it "should process with the default number of runs at 10,000", :slow => true do
+     lambda{@mcn.process}.should_not raise_error
+     @mcn.samples_taken.should eql(10_000)
+   end
+
+   it "should call the value_description block each time it is processed" do
+     @vd.should_receive(:call).exactly(3).times.with(@mcn).and_return({:value_result => 1})
+     @mcn.process(3)
+   end
+
+   it "should define children nodes for all keys in the result set" do
+     @mcn.process(1)
+     @mcn.value_result.should be_a(Node)
+     @mcn.value_result.values.should eql([1])
+     @mcn.value_result.vector.should be_a(GSL::Vector)
+   end
+
+   it "should be resetable" do
+     @mcn.process(1)
+     @mcn.reset!
+     lambda{@mcn.process(1)}.should_not raise_error
+   end
+
+   it "should expose the fields from the samples" do
+     @mcn.process(1)
+     sort_array_of_symbols(@mcn.fields).should eql(@fields)
+   end
+
+
+ end
+ def sort_array_of_symbols(array)
+   array.map {|e| e.to_s}.sort.map {|e| e.to_sym}
+ end
data/spec/fathom/monte_carlo_set_spec.rb CHANGED
@@ -101,6 +101,47 @@ describe MonteCarloSet
      lambda{mcs.process(2)}.should_not raise_error
      lambda{mcs.summary}.should_not raise_error
    end
+
+   it "should set the lower bound in the summary to be no less than the minimum (when hard_lower_bound truncates the curve)" do
+     @q1_sales = PlausibleRange.new(:min => 0, :max => 2, :hard_lower_bound => 0, :name => "First Quarter Sales")
+     @q1_prices = PlausibleRange.new(:min => 1, :max => 1, :name => "First Quarter Prices")
+     @q1_sales_commissions = PlausibleRange.new(:min => 0.2, :max => 0.2, :name => "Sales Commission Rate")
+
+     @q1_gross_margins = ValueDescription.new(@q1_sales, @q1_prices, @q1_sales_commissions) do |random_sample|
+       revenue = (random_sample.first_quarter_sales * random_sample.first_quarter_prices)
+       commissions_paid = random_sample.sales_commission_rate * revenue
+       gross_margins = revenue - commissions_paid
+       {:revenue => revenue, :commissions_paid => commissions_paid, :gross_margins => gross_margins}
+     end
+     @mcs = MonteCarloSet.new(@q1_gross_margins)
+     @mcs.process(5)
+     # This is an environment where the lower bound would usually be below the minimum.
+     # So, the minimum adheres to the hard_lower_bound constraints in the plausible range (tested elsewhere)
+     # and now we're expecting the lower bound here to reflect an actual minimum, or a 5% confidence interval,
+     # whichever is higher.
+     (@mcs.summary[:revenue][:lower_bound] >= @mcs.revenue.min).should be_true
+   end
+
+   it "should set the upper bound in the summary to be no more than the maximum (when hard_upper_bound truncates the curve)" do
+     @q1_sales = PlausibleRange.new(:min => 0, :max => 2, :hard_upper_bound => 2, :name => "First Quarter Sales")
+     @q1_prices = PlausibleRange.new(:min => 1, :max => 1, :name => "First Quarter Prices")
+     @q1_sales_commissions = PlausibleRange.new(:min => 0.2, :max => 0.2, :name => "Sales Commission Rate")
+
+     @q1_gross_margins = ValueDescription.new(@q1_sales, @q1_prices, @q1_sales_commissions) do |random_sample|
+       revenue = (random_sample.first_quarter_sales * random_sample.first_quarter_prices)
+       commissions_paid = random_sample.sales_commission_rate * revenue
+       gross_margins = revenue - commissions_paid
+       {:revenue => revenue, :commissions_paid => commissions_paid, :gross_margins => gross_margins}
+     end
+     @mcs = MonteCarloSet.new(@q1_gross_margins)
+     @mcs.process(5)
+     # This is an environment where the upper bound would usually be above the maximum.
+     # So, the maximum adheres to the hard_upper_bound constraints in the plausible range (tested elsewhere)
+     # and now we're expecting the upper bound here to reflect an actual maximum, or a 95% confidence interval,
+     # whichever is lower.
+     (@mcs.summary[:revenue][:upper_bound] <= @mcs.revenue.max).should be_true
+   end
+
  end
 
  def sort_array_of_symbols(array)
data/spec/fathom/node_spec.rb CHANGED
@@ -125,5 +125,31 @@ describe Node
      n2 = Node.new
      lambda{n2.register_child(n1)}.should raise_error
    end
+
+   it "should define an accessor method for the child node when added" do
+     n1 = Node.new(:name => 'n1')
+     n2 = Node.new :name => 'n2', :child => n1
+     n2.n1.should eql(n1)
+     n1.should_not respond_to(:n1)
+   end
+
+   it "should define an accessor method for the parent node when added" do
+     n1 = Node.new(:name => 'n1')
+     n2 = Node.new :parent => n1
+     n2.n1.should eql(n1)
+     n1.should_not respond_to(:n1)
+   end
+
+   it "should have defined an accessor method for the added child to the parent" do
+     n1 = Node.new(:name => 'n1')
+     n2 = Node.new :name => 'n2', :child => n1
+     n1.n2.should eql(n2)
+   end
+
+   it "should have defined an accessor method for the added parent to the child" do
+     n1 = Node.new(:name => 'n1')
+     n2 = Node.new :name => 'n2', :parent => n1
+     n1.n2.should eql(n2)
+   end
 
  end
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: fathom
  version: !ruby/object:Gem::Version
- hash: 17
+ hash: 19
  prerelease: false
  segments:
  - 0
- - 2
  - 3
- version: 0.2.3
+ - 0
+ version: 0.3.0
  platform: ruby
  authors:
  - David
@@ -15,7 +15,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2010-11-10 00:00:00 -07:00
+ date: 2010-11-15 00:00:00 -07:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -79,9 +79,11 @@ files:
  - lib/fathom/ext/string.rb
  - lib/fathom/import.rb
  - lib/fathom/import/csv_import.rb
+ - lib/fathom/import/import_node.rb
  - lib/fathom/import/yaml_import.rb
  - lib/fathom/inverter.rb
  - lib/fathom/knowledge_base.rb
+ - lib/fathom/mc_node.rb
  - lib/fathom/monte_carlo_set.rb
  - lib/fathom/node.rb
  - lib/fathom/numeric_methods.rb
@@ -102,9 +104,11 @@ files:
  - spec/fathom/distributions/uniform_spec.rb
  - spec/fathom/enforced_name_spec.rb
  - spec/fathom/import/csv_import_spec.rb
+ - spec/fathom/import/import_node_spec.rb
  - spec/fathom/import/yaml_import_spec.rb
  - spec/fathom/import_spec.rb
  - spec/fathom/knowledge_base_spec.rb
+ - spec/fathom/mc_node_spec.rb
  - spec/fathom/monte_carlo_set_spec.rb
  - spec/fathom/node_spec.rb
  - spec/fathom/numeric_methods_spec.rb
@@ -161,9 +165,11 @@ test_files:
  - spec/fathom/distributions/uniform_spec.rb
  - spec/fathom/enforced_name_spec.rb
  - spec/fathom/import/csv_import_spec.rb
+ - spec/fathom/import/import_node_spec.rb
  - spec/fathom/import/yaml_import_spec.rb
  - spec/fathom/import_spec.rb
  - spec/fathom/knowledge_base_spec.rb
+ - spec/fathom/mc_node_spec.rb
  - spec/fathom/monte_carlo_set_spec.rb
  - spec/fathom/node_spec.rb
  - spec/fathom/numeric_methods_spec.rb