forest 0.1.1

data/.document ADDED
@@ -0,0 +1,5 @@
1
+ README.md
2
+ lib/**/*.rb
3
+ bin/*
4
+ features/**/*.feature
5
+ LICENSE
data/.gitignore ADDED
@@ -0,0 +1,24 @@
1
+ ## MAC OS
2
+ .DS_Store
3
+
4
+ ## TEXTMATE
5
+ *.tmproj
6
+ tmtags
7
+
8
+ ## EMACS
9
+ *~
10
+ \#*
11
+ .\#*
12
+
13
+ ## VIM
14
+ *.swp
15
+
16
+ ## PROJECT::GENERAL
17
+ coverage
18
+ rdoc
19
+ pkg
20
+
21
+ ## PROJECT::SPECIFIC
22
+
23
+
24
+ tmp/*
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source "http://rubygems.org"
2
+
3
+ gem "rubytree"
data/Gemfile.lock ADDED
@@ -0,0 +1,10 @@
1
+ GEM
2
+ remote: http://rubygems.org/
3
+ specs:
4
+ rubytree (0.8.1)
5
+
6
+ PLATFORMS
7
+ ruby
8
+
9
+ DEPENDENCIES
10
+ rubytree
data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2009 Makoto Inoue
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,118 @@
1
+ # Forest
2
+
3
+ ## Summary
4
+
5
+ A simple collection class to aggregate tree objects.
6
+ It takes [Adjacency List](http://sqlsummit.com/AdjacencyList.htm) as input, shows some stats and the top x biggest trees.
7
+
8
+ ## Why ?
9
+
10
+ Most database tables hold some hierarchical data (e.g. who's your boss, who invited you to join) in adjacency list format, often without anyone realising it. Aggregating information from these trees is [a bit difficult if the trees do not all have the same depth](http://dev.mysql.com/tech-resources/articles/hierarchical-data.html). Also, most trees have only 0 or 1 nodes attached, and these need to be filtered out before rendering. That's why I created a simple wrapper to extract only the trees (the top x biggest trees) that are interesting enough to render.
11
+
12
+ ## Usage
13
+
14
+ forest [filename]
15
+
16
+ ## Input file format
17
+
18
+ [child key], [parent key], [any content (optional)]
19
+
20
+ - If the second field (the parent key) is nil, NULL, or "", the row becomes the root of a tree.
21
+ - Any columns after the second are optional. If you specify them, they are shown in the tree diagram as additional information.
22
+ - Example SQL to generate the input CSV: "select id, parent_id, name from foo"
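Review note: the row rules above can be sketched in plain Ruby. This is a minimal sketch, not the gem's code path. It assumes the child key comes first and the parent key second, which is how `lib/forest.rb` reads `c[0]` and `c[1]`. The helper names `split_rows` and `content_of` are hypothetical, and the sketch uses `String#split` instead of the CSV library only to stay dependency-free.

```ruby
# Hypothetical helper: split parsed rows into root rows and child rows.
# A row is a root when its parent column (index 1) is nil, "NULL", or "".
def split_rows(rows)
  root_markers = [nil, "NULL", ""]
  rows.partition { |row| root_markers.include?(row[1]) }
end

# Hypothetical helper: join any extra columns into one content string.
def content_of(row)
  Array(row[2..-1]).join(",")
end

lines = ["1,,foo,foo2,foo3", "2,1", "3,1,bar", "7,"]
# A limit of -1 keeps trailing empty fields, so "7," becomes ["7", ""].
rows = lines.map { |line| line.split(",", -1) }

roots, children = split_rows(rows)
puts "roots:    #{roots.map(&:first).inspect}"
puts "children: #{children.map(&:first).inspect}"
puts "content of row 1: #{content_of(rows[0])}"
```

Rows `1` and `7` land in `roots` because their parent column is empty, and the extra columns of row `1` are joined into a single content string.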
23
+
24
+ ## Input file example
25
+
26
+ ### examples/input.csv
27
+
28
+ 1,,foo,foo2,foo3
29
+ 2,1
30
+ 13,2
31
+ 3,1,bar
32
+ 4,3
33
+ 5,3
34
+ 6,3
35
+ 10,9
36
+ 9,8
37
+ 8,7
38
+ 7,
39
+ 11,
40
+ 14,12
41
+ 12,
42
+ 16,15
43
+ 15,
44
+ 17,15
45
+ 18,15
46
+ 19,15
47
+ 20,15
48
+
49
+ This will form the following tree hierarchy.
50
+
51
+ 1 - 2 - 13
52
+ - 3 - 4
53
+ - 5
54
+ - 6
55
+ 7 - 8 - 9 - 10
56
+ 11
57
+ 12 - 14
58
+ 15 - 16
59
+ - 17
60
+ - 18
61
+ - 19
62
+ - 20
63
+
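Review note: to make the grouping above concrete, here is a small sketch that derives the size of each tree from the same rows. It is not how the gem builds its trees (the gem attaches children generation by generation with RubyTree). Instead it indexes children by parent key and walks each root with a stack. The name `tree_sizes` is hypothetical, and rows are assumed to be `[child, parent]` pairs with a nil parent marking a root.

```ruby
# Hypothetical helper: map each root key to the number of nodes in its tree.
def tree_sizes(rows)
  children_of = Hash.new { |hash, key| hash[key] = [] }
  roots = []
  rows.each do |child, parent|
    if parent.nil?
      roots << child
    else
      children_of[parent] << child
    end
  end

  sizes = {}
  roots.each do |root|
    count = 0
    stack = [root]
    until stack.empty?
      node = stack.pop
      count += 1
      stack.concat(children_of[node]) # visit this node's children later
    end
    sizes[root] = count
  end
  sizes
end

rows = [
  [1, nil], [2, 1], [13, 2], [3, 1], [4, 3], [5, 3], [6, 3],
  [10, 9], [9, 8], [8, 7], [7, nil], [11, nil], [14, 12], [12, nil],
  [16, 15], [15, nil], [17, 15], [18, 15], [19, 15], [20, 15]
]

sizes = tree_sizes(rows)
sizes.each { |root, size| puts "root #{root}: #{size} nodes" }
```

Rows whose parent never connects to a root are simply not counted here; the gem collects such rows separately.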
64
+ And here is the output.
65
+
66
+ forest examples/input.csv
67
+
68
+ Total 20: Average :4.0 Max size :7 Max height :4 Max width :5, Sandard Deviation 2.28035085019828
69
+ Top 3 trees
70
+ #1, sum 7, height 2
71
+ * 1 foo,foo2,foo3
72
+ |---+ 2
73
+ | +---> 13
74
+ +---+ 3 bar
75
+ |---> 4
76
+ |---> 5
77
+ +---> 6
78
+ Node Name: 2 Content: Parent: 1 Children: 1 Total Nodes: 2
79
+ Node Name: 3 Content: bar Parent: 1 Children: 3 Total Nodes: 4
80
+ #2, sum 6, height 1
81
+ * 15
82
+ |---> 16
83
+ |---> 17
84
+ |---> 18
85
+ |---> 19
86
+ +---> 20
87
+ Node Name: 16 Content: Parent: 15 Children: 0 Total Nodes: 1
88
+ Node Name: 17 Content: Parent: 15 Children: 0 Total Nodes: 1
89
+ Node Name: 18 Content: Parent: 15 Children: 0 Total Nodes: 1
90
+ Node Name: 19 Content: Parent: 15 Children: 0 Total Nodes: 1
91
+ Node Name: 20 Content: Parent: 15 Children: 0 Total Nodes: 1
92
+ #3, sum 4, height 3
93
+ * 7
94
+ +---+ 8
95
+ +---+ 9
96
+ +---> 10
97
+ Node Name: 8 Content: Parent: 7 Children: 1 Total Nodes: 3
98
+ #4, sum 2, height 1
99
+ * 12
100
+ +---> 14
101
+ Node Name: 14 Content: Parent: 12 Children: 0 Total Nodes: 1
102
+ #5, sum 1, height 0
103
+ * 11
104
+
105
+
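Review note: the summary line of the report can be checked by hand from the five tree sizes (7, 6, 4, 2, 1). The sketch below recomputes the total, the average, and the population standard deviation with a direct two-pass formula. The gem itself uses a running (single-pass) variance, so this is only a cross-check. The variable names are illustrative, and `Array#sum` assumes Ruby 2.4 or later.

```ruby
# Node counts of the five trees in the report above.
sizes = [7, 6, 4, 2, 1]

total = sizes.sum
average = total.to_f / sizes.size

# Population variance: mean of squared deviations from the average.
variance = sizes.sum { |s| (s - average)**2 } / sizes.size
sd = Math.sqrt(variance)

puts "Total #{total}: Average :#{average} Standard Deviation #{sd}"
```

The squared deviations are 9, 4, 0, 4 and 9, which sum to 26. Dividing by 5 gives a variance of 5.2, and its square root is roughly 2.28, in line with the figure in the report.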
106
+ ## Performance.
107
+
108
+ I tried parsing about 80,000 rows. It used to take about 20 minutes, but now it takes 40 seconds with 200 MB of memory. The result may vary depending on how deep each of your trees is.
109
+
110
+ ## TODO (or my wishlist)
111
+
112
+ - Get rid of "Node Name" output (Annoying)
113
+ - Add more aggregate functions
114
+ - Add more filtering functions (e.g. show trees which have a depth of more than 3)
115
+ - Create a conversion program to draw a Graphviz diagram
116
+ - Create a conversion program to draw a histogram in R
117
+ - Create a conversion program to [nested set model](http://en.wikipedia.org/wiki/Nested_set_model) or [materialized path](http://stackoverflow.com/questions/2797720/sorting-tree-with-a-materialized-path)
118
+ - Create an adapter to switch between rdbms, nosql, or file system to avoid storing everything in memory.
data/Rakefile ADDED
@@ -0,0 +1,54 @@
1
+ require 'rubygems'
2
+ require 'rake'
3
+
4
+ begin
5
+ require 'jeweler'
6
+ require 'bundler'
7
+ Jeweler::Tasks.new do |gem|
8
+ gem.name = "forest"
9
+ gem.summary = %Q{A simple collection class to aggregate tree objects}
10
+ gem.description = %Q{A simple collection class to aggregate tree objects.
11
+ It takes [Adjacency List](http://sqlsummit.com/AdjacencyList.htm) as input, shows some stats and the top x biggest trees.}
12
+ gem.homepage = "http://github.com/makoto/forest"
13
+ gem.authors = ["Makoto Inoue"]
14
+ gem.add_bundler_dependencies
15
+ # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
16
+ end
17
+ Jeweler::GemcutterTasks.new
18
+ rescue LoadError
19
+ puts "Jeweler (or a dependency) not available. Install it with: gem install jeweler"
20
+ end
21
+
22
+ require 'rake/testtask'
23
+ Rake::TestTask.new(:test) do |test|
24
+ test.libs << 'lib' << 'test'
25
+ test.pattern = 'test/**/test_*.rb'
26
+ test.verbose = true
27
+ end
28
+
29
+ begin
30
+ require 'rcov/rcovtask'
31
+ Rcov::RcovTask.new do |test|
32
+ test.libs << 'test'
33
+ test.pattern = 'test/**/test_*.rb'
34
+ test.verbose = true
35
+ end
36
+ rescue LoadError
37
+ task :rcov do
38
+ abort "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
39
+ end
40
+ end
41
+
42
+ task :test => :check_dependencies
43
+
44
+ task :default => :test
45
+
46
+ require 'rake/rdoctask'
47
+ Rake::RDocTask.new do |rdoc|
48
+ version = File.exist?('VERSION') ? File.read('VERSION') : ""
49
+
50
+ rdoc.rdoc_dir = 'rdoc'
51
+ rdoc.title = "forest #{version}"
52
+ rdoc.rdoc_files.include('README*')
53
+ rdoc.rdoc_files.include('lib/**/*.rb')
54
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.1
data/bin/forest ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+ $LOAD_PATH.unshift File.join(File.dirname(__FILE__), '..', 'lib')
3
+ require 'csv'
4
+ require 'forest'
5
+
6
+ data = []
7
+ file_name = ARGV[0]
8
+ CSV.foreach(file_name) do |row|
9
+ data << row
10
+ end
11
+
12
+ forest = Forest.new(data)
13
+ forest.print_report(100)
14
+ exit 0
data/examples/input.csv ADDED
@@ -0,0 +1,20 @@
1
+ 1,,foo,foo2,foo3
2
+ 2,1
3
+ 13,2
4
+ 3,1,bar
5
+ 4,3
6
+ 5,3
7
+ 6,3
8
+ 10,9
9
+ 9,8
10
+ 8,7
11
+ 7,
12
+ 11,
13
+ 14,12
14
+ 12,
15
+ 16,15
16
+ 15,
17
+ 17,15
18
+ 18,15
19
+ 19,15
20
+ 20,15
data/forest.gemspec ADDED
@@ -0,0 +1,58 @@
1
+ # Generated by jeweler
2
+ # DO NOT EDIT THIS FILE DIRECTLY
3
+ # Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
4
+ # -*- encoding: utf-8 -*-
5
+
6
+ Gem::Specification.new do |s|
7
+ s.name = %q{forest}
8
+ s.version = "0.1.1"
9
+
10
+ s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
+ s.authors = ["Makoto Inoue"]
12
+ s.date = %q{2010-11-02}
13
+ s.default_executable = %q{forest}
14
+ s.description = %q{A simple collection class to aggregate tree objects.
15
+ It takes [Adjacency List](http://sqlsummit.com/AdjacencyList.htm) as input, shows some stats and the top x biggest trees.}
16
+ s.executables = ["forest"]
17
+ s.extra_rdoc_files = [
18
+ "LICENSE",
19
+ "README.md"
20
+ ]
21
+ s.files = [
22
+ ".document",
23
+ ".gitignore",
24
+ "Gemfile",
25
+ "Gemfile.lock",
26
+ "LICENSE",
27
+ "README.md",
28
+ "Rakefile",
29
+ "VERSION",
30
+ "bin/forest",
31
+ "examples/input.csv",
32
+ "forest.gemspec",
33
+ "lib/forest.rb",
34
+ "test/test_forest.rb"
35
+ ]
36
+ s.homepage = %q{http://github.com/makoto/forest}
37
+ s.rdoc_options = ["--charset=UTF-8"]
38
+ s.require_paths = ["lib"]
39
+ s.rubygems_version = %q{1.3.7}
40
+ s.summary = %q{A simple collection class to aggregate tree objects}
41
+ s.test_files = [
42
+ "test/test_forest.rb"
43
+ ]
44
+
45
+ if s.respond_to? :specification_version then
46
+ current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
47
+ s.specification_version = 3
48
+
49
+ if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
50
+ s.add_runtime_dependency(%q<rubytree>, [">= 0"])
51
+ else
52
+ s.add_dependency(%q<rubytree>, [">= 0"])
53
+ end
54
+ else
55
+ s.add_dependency(%q<rubytree>, [">= 0"])
56
+ end
57
+ end
58
+
data/lib/forest.rb ADDED
@@ -0,0 +1,163 @@
1
+ #! /usr/bin/ruby
2
+
3
+ require 'rubygems'
4
+ require "bundler/setup"
5
+ require 'tree'
6
+
7
+ class Forest
8
+ attr_accessor :trees, :orphants
9
+
10
+ def initialize(data)
11
+ grouped_tree = data.group_by{|d| [nil, "NULL", ""].include?(d[1])}
12
+ @root = grouped_tree[true] || []
13
+ @children = grouped_tree[false] || []
14
+ add_root
15
+ add_children
16
+ end
17
+
18
+ def count
19
+ @trees.size
20
+ end
21
+
22
+ def sum
23
+ @trees.values.inject(0){|sum, current| sum + current.size}
24
+ end
25
+
26
+ def sd
27
+ standard_deviation(@trees.values.map{|v| v.size })
28
+ end
29
+
30
+ def avg
31
+ sum.to_f / @trees.size
32
+ end
33
+
34
+ def biggest_node
35
+ max_node(:size)
36
+ end
37
+
38
+ def heighest_node
39
+ max_node(:node_height)
40
+ end
41
+
42
+ def widest_node
43
+ max_node(:children_size)
44
+ end
45
+
46
+ def max_sum
47
+ biggest_node.size
48
+ end
49
+
50
+ def max_height
51
+ heighest_node.node_height + 1
52
+ end
53
+
54
+ def max_width
55
+ widest_node.children_size
56
+ end
57
+
58
+ def top(n)
59
+ @trees.values.sort {|x,y| y.size <=> x.size }[0..(n - 1)]
60
+ end
61
+
62
+ def print_report(num = 3)
63
+ puts "Total #{sum}: Average :#{avg} Max size :#{max_sum} Max height :#{max_height} Max width :#{max_width}, Sandard Deviation #{sd}"
64
+ puts "Top #{num} trees"
65
+ top(num).each_with_index {|value, index|
66
+ puts "##{index + 1}, sum #{value.size}, height #{value.node_height}"
67
+ puts value.print_tree
68
+ }
69
+ end
70
+
71
+ def add_root
72
+ @trees = @root.reduce({}) do |final, current|
73
+ key = current.first.to_s
74
+ final[key] = Tree::TreeNode.new(key, get_content(current))
75
+ final
76
+ end
77
+ end
78
+
79
+ def add_children
80
+ add_recursive(@trees, @children)
81
+ end
82
+
83
+ private
84
+ def max_node(obj)
85
+ @trees.values.max{|a, b| a.send(obj) <=> b.send(obj) }
86
+ end
87
+
88
+ def get_content(array)
89
+ Array(array[2..-1]).join(",")
90
+ end
91
+
92
+ def variance(population)
93
+ n = 0
94
+ mean = 0.0
95
+ s = 0.0
96
+ population.each { |x|
97
+ n = n + 1
98
+ delta = x - mean
99
+ mean = mean + (delta / n)
100
+ s = s + delta * (x - mean)
101
+ }
102
+ # if you want to calculate std deviation
103
+ # of a sample change this to "s / (n-1)"
104
+ return s / n
105
+ end
106
+
107
+ # calculate the standard deviation of a population
108
+ # accepts: an array, the population
109
+ # returns: the standard deviation
110
+ def standard_deviation(population)
111
+ Math.sqrt(variance(population))
112
+ end
113
+
114
+ def add_recursive(ancestors, decendants, previous = 0)
115
+ remaining = []
116
+ current_generation = {}
117
+ counter = decendants.size
118
+ decendants.map do |c|
119
+ # p "#{counter}: #{c.inspect}" if counter % 10 == 0
120
+ child = c[0]
121
+ parent = c[1]
122
+ if parent_obj = ancestors[parent.to_s]
123
+ parent_obj << Tree::TreeNode.new(child.to_s, get_content(c))
124
+ current_generation[child.to_s] = parent_obj[child.to_s]
125
+ else
126
+ remaining << c
127
+ end
128
+ counter = counter - 1
129
+ end
130
+ # p "remaining.size: #{remaining.size} previous: #{previous}"
131
+ if remaining == [] || remaining.size == previous
132
+ @orphants = remaining
133
+ else
134
+ # p remaining
135
+ add_recursive(current_generation , remaining, remaining.size)
136
+ end
137
+ end
138
+ end
139
+
140
+ # Monkey patch
141
+ class Tree::TreeNode
142
+ include Enumerable
143
+
144
+ def children_size
145
+ children.size
146
+ end
147
+
148
+ def print_tree(level = 0)
149
+ if is_root?
150
+ print "*"
151
+ else
152
+ print "|" unless parent.is_last_sibling?
153
+ print(' ' * (level - 1) * 4)
154
+ print(is_last_sibling? ? "+" : "|")
155
+ print "---"
156
+ print(has_children? ? "+" : ">")
157
+ end
158
+
159
+ puts " #{name} #{content}"
160
+
161
+ children { |child| child.print_tree(level + 1)}
162
+ end
163
+ end
data/test/test_forest.rb ADDED
@@ -0,0 +1,84 @@
1
+ require "test/unit"
2
+
3
+ require "forest"
4
+
5
+ class TestForest < Test::Unit::TestCase
6
+ def setup
7
+ # 1 - 2 - 13
8
+ # - 3 - 4
9
+ # - 5
10
+ # - 6
11
+ # 7 - 8 - 9 - 10
12
+ # 11
13
+ # 12 - 14
14
+ # 15 - 16
15
+ # - 17
16
+ # - 18
17
+ # - 19
18
+ # - 20
19
+
20
+ @data = [
21
+ [1, nil, "foo", "foo2", "foo3"] ,
22
+ [2, 1] ,
23
+ [13, 2] ,
24
+ [3, 1, "bar"] ,
25
+ [4, 3] ,
26
+ [5, 3] ,
27
+ [6, 3] ,
28
+ [10, 9] ,
29
+ [9, 8] ,
30
+ [8, 7] ,
31
+ [7, nil] ,
32
+ [11, nil],
33
+ [14, 12] ,
34
+ [12, nil],
35
+ [16, 15] ,
36
+ [15, nil],
37
+ [17, 15] ,
38
+ [18, 15] ,
39
+ [19, 15] ,
40
+ [20, 15]
41
+ ]
42
+
43
+ @forest = Forest.new(@data)
44
+ end
45
+
46
+ def test_count
47
+ assert_equal(5, @forest.count)
48
+ end
49
+
50
+ def test_sum
51
+ assert_equal(20, @forest.sum)
52
+ end
53
+
54
+ def test_avg
55
+ assert_equal(4, @forest.avg)
56
+ end
57
+
58
+ def test_max_sum
59
+ assert_equal(7, @forest.max_sum)
60
+ end
61
+
62
+ def test_max_height
63
+ assert_equal(4, @forest.max_height)
64
+ end
65
+
66
+ def test_max_width
67
+ assert_equal(5, @forest.max_width)
68
+ end
69
+
70
+ def test_top
71
+ assert_equal(@forest.max_sum, @forest.top(1).first.size)
72
+ end
73
+
74
+ def test_non_matching_leaf
75
+ non_matching_leaf = [
76
+ [1, nil] ,
77
+ [3, 2] ,
78
+ ]
79
+ forest = Forest.new(non_matching_leaf)
80
+ assert_equal(1, forest.count)
81
+ assert_equal(1, forest.orphants.size)
82
+ end
83
+ end
84
+
metadata ADDED
@@ -0,0 +1,95 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: forest
3
+ version: !ruby/object:Gem::Version
4
+ hash: 25
5
+ prerelease: false
6
+ segments:
7
+ - 0
8
+ - 1
9
+ - 1
10
+ version: 0.1.1
11
+ platform: ruby
12
+ authors:
13
+ - Makoto Inoue
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2010-11-02 00:00:00 +00:00
19
+ default_executable: forest
20
+ dependencies:
21
+ - !ruby/object:Gem::Dependency
22
+ requirement: &id001 !ruby/object:Gem::Requirement
23
+ none: false
24
+ requirements:
25
+ - - ">="
26
+ - !ruby/object:Gem::Version
27
+ hash: 3
28
+ segments:
29
+ - 0
30
+ version: "0"
31
+ type: :runtime
32
+ name: rubytree
33
+ prerelease: false
34
+ version_requirements: *id001
35
+ description: |-
36
+ A simple collection class to aggregate tree objects.
37
+ It takes [Adjacency List](http://sqlsummit.com/AdjacencyList.htm) as input, shows some stats and the top x biggest trees.
38
+ email:
39
+ executables:
40
+ - forest
41
+ extensions: []
42
+
43
+ extra_rdoc_files:
44
+ - LICENSE
45
+ - README.md
46
+ files:
47
+ - .document
48
+ - .gitignore
49
+ - Gemfile
50
+ - Gemfile.lock
51
+ - LICENSE
52
+ - README.md
53
+ - Rakefile
54
+ - VERSION
55
+ - bin/forest
56
+ - examples/input.csv
57
+ - forest.gemspec
58
+ - lib/forest.rb
59
+ - test/test_forest.rb
60
+ has_rdoc: true
61
+ homepage: http://github.com/makoto/forest
62
+ licenses: []
63
+
64
+ post_install_message:
65
+ rdoc_options:
66
+ - --charset=UTF-8
67
+ require_paths:
68
+ - lib
69
+ required_ruby_version: !ruby/object:Gem::Requirement
70
+ none: false
71
+ requirements:
72
+ - - ">="
73
+ - !ruby/object:Gem::Version
74
+ hash: 3
75
+ segments:
76
+ - 0
77
+ version: "0"
78
+ required_rubygems_version: !ruby/object:Gem::Requirement
79
+ none: false
80
+ requirements:
81
+ - - ">="
82
+ - !ruby/object:Gem::Version
83
+ hash: 3
84
+ segments:
85
+ - 0
86
+ version: "0"
87
+ requirements: []
88
+
89
+ rubyforge_project:
90
+ rubygems_version: 1.3.7
91
+ signing_key:
92
+ specification_version: 3
93
+ summary: A simple collection class to aggregate tree objects
94
+ test_files:
95
+ - test/test_forest.rb