goldmine 1.1.4 → 1.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: c686bfb600cadea453033f457a751ce8538b00af
-   data.tar.gz: 1216c9ddd41197c6f25c1017039b0d2428a20105
+   metadata.gz: eac055065f84b7550241a15c980bf74e0fc892a6
+   data.tar.gz: 02d1831f66469c098299f819d0e3698959bb725f
  SHA512:
-   metadata.gz: 3baf5482a83183114999ed3d705ff8c97b6bc5b0cb9a3ae02e64f0eb9af378ea8fc4a8e39e5003f6da9eb8fae275407bb541f6eb51880e886205a72843f7532e
-   data.tar.gz: acf22665e1acf226ef31274021da2ceb406ef3a268555abb7f597fccc3b08ba7290f81ef9789b684073455b4f9c316a0d889208375c240da57e34600c0a978fd
+   metadata.gz: c4a8519848f4776ac73febc4850919c5449f302471a482673c8e7e47427237759e52444759544596ceb778642ebf0ce2bd866364868cee04fe7a38dbfae443cb
+   data.tar.gz: e1243ab291afcafa019b6b287f98a20b5054f080fc31764322cffe07b604c59d0b23893da822c27401c8727c3b9f64fabe65d9e1acb167f57a5ed82b814abfec
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     goldmine (1.1.4)
+     goldmine (1.2.0)

  GEM
    remote: https://rubygems.org/
data/README.md CHANGED
@@ -21,6 +21,7 @@ Think of __Goldmine__ as `Enumerable#group_by` on steroids.
  ---

  The [demo project](http://hopsoft.github.io/goldmine/) demonstrates some of Goldmine's uses.
+ `TODO: update the demo project`

  ---

@@ -109,7 +110,7 @@ list.pivot { |record| record[:favorite_colors] }
  }
  ```

- # Stacked pivots
+ ## Stacked pivots

  ```ruby
  list = [
@@ -142,31 +143,72 @@ end
  }
  ```

- # Returning pivots in tabular format
+ ## Rollups

- This feature is useful when you need to do things like export to CSV or build user interfaces.
+ Sometimes it's useful to roll pivots into a summary.

  ```ruby
- # using the stacked pivot example above
- mined.to_a
+ list = [1,2,3,4,5,6,7,8,9]
+ list = Goldmine::ArrayMiner.new(list)
+ pivoted = list.pivot(:less_than_5) { |i| i < 5 }.pivot(:even) { |i| i % 2 == 0 }
+ pivoted.rollup { |values| values.size }
+ # result:
+ {
+   { :less_than_5 => true, :even => false} => 2,
+   { :less_than_5 => true, :even => true} => 2,
+   { :less_than_5 => false, :even => false} => 3,
+   { :less_than_5 => false, :even => true} => 2
+ }
+ ```
+
+ ## Tabular data
+
+ Tabular data provides a more streamlined summary view of a pivot.
+
+ ```ruby
+ list = [1,2,3,4,5,6,7,8,9]
+ list = Goldmine::ArrayMiner.new(list)
+ pivoted = list.pivot(:less_than_5) { |i| i < 5 }.pivot(:even) { |i| i % 2 == 0 }
+ pivoted.to_tabular
  # result:
  [
-   ["Name has an 'e'", ">= 21 years old", "Percent of Total", "Count"],
-   [false, true, 0.4, 2],
-   [true, true, 0.4, 2],
-   [true, false, 0.2, 1]
+   ["less_than_5", "even", "percent", "count"],
+   [true, false, 0.22, 2],
+   [true, true, 0.22, 2],
+   [false, false, 0.33, 3],
+   [false, true, 0.22, 2]
  ]
  ```

- The first entry is the header row.
- Subsequent entries are data rows.
- The last value in each data row indicates the number of matches.
+ ## CSV table

- Need to sort the rows? Just pass a `sort_by` block.
+ CSV tables are a formalized version of tabular data.
+ They simplify the complexity of working with tabular data.

  ```ruby
- # sort on "total" i.e. 3rd value in the row
- mined.to_a do |row|
-   row[2]
+ list = [1,2,3,4,5,6,7,8,9]
+ list = Goldmine::ArrayMiner.new(list)
+ pivoted = list.pivot(:less_than_5) { |i| i < 5 }.pivot(:even) { |i| i % 2 == 0 }
+ csv = pivoted.to_csv
+
+ csv.headers # => ["less_than_5", "even", "percent", "count"]
+
+ csv.each do |row|
+   puts row["less_than_5"]
+   puts row["even"]
  end
+
+ csv.to_csv
+ # result:
+ "less_than_5,even,percent,count\ntrue,false,0.22,2\ntrue,true,0.22,2\nfalse,false,0.33,3\nfalse,true,0.22,2\n"
  ```
+
+ ## Summary
+
+ Goldmine allows you to combine the power of pivots, rollups, tabular data,
+ & csv to construct deep insights into your data with minimal effort.
+
+ One of our common use cases is to query a database using ActiveRecord,
+ pivot the results, convert to csv, sort, pivot again,
+ then rollup the results to create data visualizations in the form of charts & graphs.
+
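
For readers who want to check the README's rollup numbers without installing the gem, the same grouping can be approximated in plain Ruby. This is only a minimal sketch: `rollup_sketch` and `key_fn` are hypothetical names that do not appear in Goldmine, and the grouping relies on `Enumerable#group_by` rather than on the gem's pivot implementation.

```ruby
# Hypothetical helper, not part of Goldmine: group items by a composite key,
# then reduce ("roll up") each group with the caller's block.
def rollup_sketch(items, key_fn)
  groups = items.group_by { |item| key_fn.call(item) }
  groups.each_with_object({}) do |(key, values), memo|
    memo[key] = yield(values)
  end
end

# Same predicates as the README example: less than 5, and even.
key_fn = ->(i) { { less_than_5: i < 5, even: i.even? } }
counts = rollup_sketch((1..9).to_a, key_fn) { |values| values.size }

# Each composite key maps to the size of its group,
# e.g. counts[{ less_than_5: false, even: false }] is 3 (the items 5, 7, 9).
puts counts.inspect
```

The block passed to the helper plays the role of the block given to `rollup` in the README, so swapping `values.size` for `values.sum` would produce per-group totals instead of counts.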
@@ -3,9 +3,11 @@ require "array_miner"
  require "hash_miner"

  module Goldmine
-   def self.miner(object)
-     return ArrayMiner.new(object) if object.is_a?(Array)
-     return HashMiner.new(object) if object.is_a?(Hash)
-     nil
+   class << self
+     def miner(object)
+       return ArrayMiner.new(object) if object.is_a?(Array)
+       return HashMiner.new(object) if object.is_a?(Hash)
+       nil
+     end
    end
  end
@@ -4,8 +4,9 @@ module Goldmine
    class ArrayMiner < SimpleDelegator
      attr_reader :source_data

-     def initialize(array=[])
-       super @source_data = array
+     def initialize(array=[], source_data: [])
+       @source_data = source_data
+       super array
      end

      # Pivots the Array into a Hash of mined data.
10
11
 
11
12
  # Pivots the Array into a Hash of mined data.
@@ -47,7 +48,7 @@ module Goldmine
      # @yield [Object] Yields once for each item in the Array
      # @return [Hash] The pivoted Hash of data.
      def pivot(name=nil, &block)
-       reduce(HashMiner.new(source_data: source_data)) do |memo, item|
+       reduce(HashMiner.new(source_data: self)) do |memo, item|
          value = yield(item)

          if value.is_a?(Array)
@@ -1,11 +1,12 @@
  require "delegate"
+ require "csv"

  module Goldmine
    class HashMiner < SimpleDelegator
      attr_reader :source_data

-     def initialize(hash={}, source_data: nil)
-       @source_data = source_data || hash
+     def initialize(hash={}, source_data: [])
+       @source_data = source_data
        super hash
      end

@@ -28,8 +29,8 @@ module Goldmine
      #
      # @note This method should not be called directly. Call Array#pivot instead.
      #
-     # @param [String] name The named of the pivot.
-     # @yield [Object] Yields once for each item in the Array
+     # @param name [String] The named of the pivot.
+     # @yield [Object] Yields once for each item in the Array.
      # @return [Hash] The pivoted Hash of data.
      def pivot(name=nil, &block)
        return self unless goldmine
@@ -51,10 +52,51 @@ module Goldmine
        end
      end

+     # Returns a new "rolled up" Hash based on the return value of the yield.
+     #
+     # @yield [Object] Yields once for each pivoted group.
+     # @return [Hash] The rollup Hash of data.
+     def rollup
+       each_with_object({}) do |pair, memo|
+         memo[pair.first] = yield(pair.last)
+       end
+     end
+
+     # Returns a tabular representation of the pivot.
+     # Useful for building CSVs & data visualizations.
+     #
+     # @param percent_column_name [String] The name of the percent column (percent of total)
+     # @param count_column_name [String] The name of the count column (number of objects)
+     # @return [Array] The tabular representation of the data.
+     def to_tabular(percent_column_name: "percent", count_column_name: "count")
+       [].tap do |rows|
+         rows << tabular_header_from_key(first.first) + [percent_column_name, count_column_name]
+         rolled = rollup { |row| row.size }
+         rolled.each do |key, value|
+           tabular_row_from_key(key).tap do |row|
+             rows << row + [calculate_percentage(value, source_data.size), value]
+           end
+         end
+       end
+     end
+
+     # Returns an in memory CSV table representation of the pivot.
+     # Useful for working with data & building data visualizations.
+     #
+     # @param percent_column_name [String] The name of the percent column (percent of total)
+     # @param count_column_name [String] The name of the count column (number of objects)
+     # @return [CSV::Table] The CSV representation of the data.
+     def to_csv(percent_column_name: "percent", count_column_name: "count")
+       tabular = to_tabular(percent_column_name: percent_column_name, count_column_name: count_column_name)
+       header = tabular.shift
+       rows = tabular.map { |row| CSV::Row.new(header, row) }
+       CSV::Table.new rows
+     end
+
      # Assigns a key/value pair to the Hash.
-     # @param [String] name The name of a pivot (can be null).
-     # @param [Object] key The key to use.
-     # @param [Object] value The value to assign
+     # @param name [String] The name of a pivot (can be null).
+     # @param key [Object] The key to use.
+     # @param value [Object] The value to assign
      # @return [Object] The result of the assignment.
      def assign_mined(name, key, value)
        goldmine_key = goldmine_key(name, key)
@@ -63,35 +105,31 @@ module Goldmine
      end

      # Creates a key for a pivot-name/key combo.
-     # @param [String] name The name of a pivot (can be null).
-     # @param [Object] key The key to use.
+     # @param name [String] The name of a pivot (can be null).
+     # @param key [Object] The key to use.
      # @return [Object] The constructed key.
      def goldmine_key(name, key)
        goldmine_key = { name => key } if name
        goldmine_key ||= key
      end

-     # Returns the pivot keys.
-     # @return [Array]
-     def pivoted_keys
-       first.first.keys
+     private
+
+     def calculate_percentage(count, total)
+       return 0.0 unless total > 0
+       sprintf("%.2f", count / total.to_f).to_f
      end

-     # Returns pivoted data as a tabular Array that can be used to build CSVs or user interfaces.
-     # @return [Array] Tabular pivot data
-     # @yield [Array] sort_by block for sorting the Array
-     def to_a(&block)
-       rows = map do |pair|
-         [].tap do |row|
-           row.concat pair.first.values
-           row << sprintf("%.2f", (pair.last.size / source_data.size.to_f)).to_f
-           row << pair.last.size
-         end
-       end
-       rows = rows.sort_by(&block) if block_given?
-       header = [pivoted_keys.map(&:to_s), "Percent of Total", "Count"].flatten
-       rows.insert 0, header
-       rows
+     def tabular_header_from_key(key)
+       return key.keys.map(&:to_s) if key.is_a?(Hash)
+       key = [key] unless key.is_a?(Array)
+       (0..key.size-1).map { |i| "column#{i}" }
+     end
+
+     def tabular_row_from_key(key)
+       return key.dup if key.is_a?(Array)
+       return [key] unless key.is_a?(Hash)
+       key.values.dup
      end

    end
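
The new `to_tabular` builds a header row from the pivot key and then appends a rounded percent and a count to each data row. The sketch below restates that shape in plain Ruby so the arithmetic can be checked in isolation. It is not the gem's code: `percentage_sketch` and `tabular_sketch` are hypothetical names, and the sketch assumes a non-empty rollup whose keys are all Hashes (the named-pivot case).

```ruby
# Hypothetical stand-in for the private percentage helper in the diff:
# a two-decimal fraction of the total, or 0.0 when there is nothing to divide by.
def percentage_sketch(count, total)
  return 0.0 unless total > 0
  format("%.2f", count / total.to_f).to_f
end

# Hypothetical stand-in for the tabular conversion. Assumes `rolled_counts`
# is a non-empty Hash of { pivot_key_hash => count }.
def tabular_sketch(rolled_counts, total)
  header = rolled_counts.keys.first.keys.map(&:to_s) + ["percent", "count"]
  rows = rolled_counts.map do |key, count|
    key.values + [percentage_sketch(count, total), count]
  end
  [header] + rows
end

rolled = {
  { less_than_5: true,  even: false } => 2,
  { less_than_5: false, even: false } => 3
}
table = tabular_sketch(rolled, 9)
# table[0] is the header row; table[2] is [false, false, 0.33, 3] because 3 / 9.0 rounds to 0.33.
puts table.inspect
```

Note that the percent is computed against the size of the original list, not against the sum of the counts shown, which is why the diff threads `source_data` from the `ArrayMiner` into the `HashMiner`.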
@@ -1,3 +1,3 @@
  module Goldmine
-   VERSION = "1.1.4"
+   VERSION = "1.2.0"
  end
@@ -30,6 +30,44 @@ class TestGoldmine < PryTest::Test
      assert data == expected
    end

+   test "simple pivot rollup" do
+     list = [1,2,3,4,5,6,7,8,9]
+     list = Goldmine::ArrayMiner.new(list)
+     data = list.pivot { |i| i < 5 }
+     rolled = data.rollup { |row| row.size }
+
+     expected = {
+       true => 4,
+       false => 5
+     }
+
+     assert rolled == expected
+   end
+
+   test "simple pivot to_tabular" do
+     list = [1,2,3,4,5,6,7,8,9]
+     list = Goldmine::ArrayMiner.new(list)
+     data = list.pivot { |i| i < 5 }
+
+     expected = [
+       ["column0", "percent", "count"],
+       [true, 0.44, 4],
+       [false, 0.56, 5]
+     ]
+
+     assert data.to_tabular == expected
+   end
+
+   test "simple pivot to_csv" do
+     list = [1,2,3,4,5,6,7,8,9]
+     list = Goldmine::ArrayMiner.new(list)
+     data = list.pivot { |i| i < 5 }
+     csv = data.to_csv
+
+     assert csv.headers == ["column0", "percent", "count"]
+     assert csv.to_a == [["column0", "percent", "count"], [true, 0.44, 4], [false, 0.56, 5]]
+   end
+
    test "named pivot" do
      list = [1,2,3,4,5,6,7,8,9]
      list = Goldmine::ArrayMiner.new(list)
@@ -43,56 +81,32 @@ class TestGoldmine < PryTest::Test
      assert data == expected
    end

-   test "pivoted_keys" do
+   test "named pivot rollup" do
      list = [1,2,3,4,5,6,7,8,9]
      list = Goldmine::ArrayMiner.new(list)
      data = list.pivot("less than 5") { |i| i < 5 }
-     expected = ["less than 5"]
-     assert data.pivoted_keys == expected
-   end
-
-   test "to_a tabular data" do
-     list = [
-       { :name => "Sally", :age => 21 },
-       { :name => "John", :age => 28 },
-       { :name => "Stephen", :age => 37 },
-       { :name => "Emily", :age => 32 },
-       { :name => "Joe", :age => 18 }
-     ]
-     list = Goldmine::ArrayMiner.new(list)
-     mined = list.pivot("Name has an 'e'") do |record|
-       !!record[:name].match(/e/i)
-     end
-     mined = mined.pivot(">= 21 years old") do |record|
-       record[:age] >= 21
-     end
+     rolled = data.rollup { |row| row.size }

-     expected = [["Name has an 'e'", ">= 21 years old", "Percent of Total", "Count"], [true, false, 0.2, 1], [false, true, 0.4, 2], [true, true, 0.4, 2]]
-
-     # block is sort_by
-     tabular_data = mined.to_a do |row|
-       [row[2], row[0] ? 1 : 0, row[1] ? 1 : 0]
-     end
+     expected = {
+       { "less than 5" => true } => 4,
+       { "less than 5" => false } => 5
+     }

-     assert tabular_data == expected
+     assert rolled == expected
    end

-   test "source_data" do
-     list = [
-       { :name => "Sally", :age => 21 },
-       { :name => "John", :age => 28 },
-       { :name => "Stephen", :age => 37 },
-       { :name => "Emily", :age => 32 },
-       { :name => "Joe", :age => 18 }
-     ]
+   test "named pivot to_tabular" do
+     list = [1,2,3,4,5,6,7,8,9]
      list = Goldmine::ArrayMiner.new(list)
-     mined = list.pivot("Name has an 'e'") do |record|
-       !!record[:name].match(/e/i)
-     end
-     mined = mined.pivot(">= 21 years old") do |record|
-       record[:age] >= 21
-     end
-     assert mined.source_data == list
+     data = list.pivot("less than 5") { |i| i < 5 }
+
+     expected = [
+       ["less than 5", "percent", "count"],
+       [true, 0.44, 4],
+       [false, 0.56, 5]
+     ]
+
+     assert data.to_tabular == expected
    end

    test "pivot of list values" do
@@ -164,6 +178,38 @@ class TestGoldmine < PryTest::Test
      assert data == expected
    end

+   test "chained pivots rollup" do
+     list = [1,2,3,4,5,6,7,8,9]
+     list = Goldmine::ArrayMiner.new(list)
+     data = list.pivot { |i| i < 5 }.pivot { |i| i % 2 == 0 }
+     rolled = data.rollup { |row| row.size }
+
+     expected = {
+       [true, false] => 2,
+       [true, true] => 2,
+       [false, false] => 3,
+       [false, true] => 2
+     }
+
+     assert rolled == expected
+   end
+
+   test "chained pivots to_tabular" do
+     list = [1,2,3,4,5,6,7,8,9]
+     list = Goldmine::ArrayMiner.new(list)
+     data = list.pivot { |i| i < 5 }.pivot { |i| i % 2 == 0 }
+
+     expected = [
+       ["column0", "column1", "percent", "count"],
+       [true, false, 0.22, 2],
+       [true, true, 0.22, 2],
+       [false, false, 0.33, 3],
+       [false, true, 0.22, 2]
+     ]
+
+     assert data.to_tabular == expected
+   end
+
    test "deep chained pivots" do
      list = [1,2,3,4,5,6,7,8,9]
      list = Goldmine::ArrayMiner.new(list)
@@ -207,7 +253,6 @@ class TestGoldmine < PryTest::Test
      }

      assert data == expected
-     assert data.source_data == list
    end

    test "named chained pivots" do
@@ -225,4 +270,53 @@ class TestGoldmine < PryTest::Test
      assert data == expected
    end

+   test "named chained pivots rollup" do
+     list = [1,2,3,4,5,6,7,8,9]
+     list = Goldmine::ArrayMiner.new(list)
+     data = list.pivot("less than 5") { |i| i < 5 }.pivot("divisible by 2") { |i| i % 2 == 0 }
+     rolled = data.rollup { |row| row.size }
+
+     expected = {
+       { "less than 5" => true, "divisible by 2" => false } => 2,
+       { "less than 5" => true, "divisible by 2" => true } => 2,
+       { "less than 5" => false, "divisible by 2" => false } => 3,
+       { "less than 5" => false, "divisible by 2" => true } => 2
+     }
+
+     assert rolled == expected
+   end
+
+   test "named chained pivots to tabular" do
+     list = [1,2,3,4,5,6,7,8,9]
+     list = Goldmine::ArrayMiner.new(list)
+     data = list.pivot("less than 5") { |i| i < 5 }.pivot("divisible by 2") { |i| i % 2 == 0 }
+
+     expected = [
+       ["less than 5", "divisible by 2", "percent", "count"],
+       [true, false, 0.22, 2],
+       [true, true, 0.22, 2],
+       [false, false, 0.33, 3],
+       [false, true, 0.22, 2]
+     ]
+
+     assert data.to_tabular == expected
+   end
+
+   test "named chained pivots to csv" do
+     list = [1,2,3,4,5,6,7,8,9]
+     list = Goldmine::ArrayMiner.new(list)
+     data = list.pivot("less than 5") { |i| i < 5 }.pivot("divisible by 2") { |i| i % 2 == 0 }
+     csv = data.to_csv
+
+     assert csv.to_a == data.to_tabular
+
+     expected = ["less than 5", "divisible by 2", "percent", "count"]
+     assert csv.headers == expected
+
+     row = csv.first
+     assert row["less than 5"] == true
+     assert row["divisible by 2"] == false
+     assert row["percent"] == 0.22
+     assert row ["count"] == 2
+   end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: goldmine
  version: !ruby/object:Gem::Version
-   version: 1.1.4
+   version: 1.2.0
  platform: ruby
  authors:
  - Nathan Hopkins
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-05-28 00:00:00.000000000 Z
+ date: 2015-06-01 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rake