activewarehouse-etl 0.9.0 → 0.9.1

data/CHANGELOG CHANGED
@@ -165,7 +165,7 @@
  0.8.4 - May 24, 2007
  * Added fix for backslash in file writer
 
- 0.9.0 -
+ 0.9.0 - August 9, 2007
  * Added support for batch processing through .ebf files. These files are
  essentially control files that apply settings to an entire ETL process.
  * Implemented support for screen blocks. These blocks can be used to test
@@ -175,4 +175,24 @@
  connection information in the control files.
  * Implemented temp table support throughout.
  * DateDimensionBuilder now included in ActiveWarehouse ETL directly.
- * Time calculations for fiscal year now included in ActiveWarehouse ETL.
+ * Time calculations for fiscal year now included in ActiveWarehouse ETL.
+
+ 0.9.1 -
+ * SQLResolver now uses ETL::Engine.table so it may utilize temp tables. (aeden)
+ * Added Thibaut Barrère's encode processor.
+ * Added MockSource and MockDestination test helpers (thbar)
+ * Added the block processor. Can call a block once (pre/post processor)
+ or once for each row (after_read/before_write row processor) (thbar)
+ * Changed temp table to use new AdapterExtension copy_table method (aeden)
+ * Added bin/etl.cmd windows batch - just add the bin folder to your PATH
+ and it will let you call etl on an unpacked/pistoned version of AW-ETL (thbar)
+ * Upgraded to support Rails 2.1. No longer compatible with older versions of Rails.
+ * Added ETL::Builder::TimeDimensionBuilder
+ * Added :default option to ForeignKeyLookupTransform that will be used if no
+ foreign key is found.
+ * Added :cache option to ForeignKeyLookupTransform that will preload the FK
+ mappings if the underlying resolver supports it. Currently supported by
+ SQLResolver.
+ * A Class extending ETL::Transform::Transform may now be passed as a transformer.
+ For example, in the control file you would define the transform as:
+ transform :a_field, MyTransform, {:option1 => 'option1'}.
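The last changelog entry says a class extending ETL::Transform::Transform may now be passed as a transformer. A minimal sketch of what such a class might look like follows. The base class here is a stand-in defined only so the example runs without the gem: its constructor arguments (control, name, configuration) and the transform(name, value, row) hook are assumptions about the real base class, and the :suffix option is hypothetical.

```ruby
# Stand-in for the gem's base class so this sketch runs on its own.
# ASSUMPTION: the real ETL::Transform::Transform takes (control, name, configuration)
# and expects subclasses to implement transform(name, value, row).
module ETL
  module Transform
    class Transform
      attr_reader :control, :name, :configuration

      def initialize(control, name, configuration = {})
        @control = control
        @name = name
        @configuration = configuration
      end
    end
  end
end

# Hypothetical custom transform: upcases the value and appends an optional suffix.
class MyTransform < ETL::Transform::Transform
  def transform(name, value, row)
    "#{value.to_s.upcase}#{configuration[:suffix]}"
  end
end

# In a control file this would be wired up as in the changelog entry:
#   transform :a_field, MyTransform, {:option1 => 'option1'}
# Here the class is exercised directly, with nil standing in for the control object.
t = MyTransform.new(nil, :a_field, :suffix => '!')
puts t.transform(:a_field, 'abc', {})
```

Passing the class itself, rather than a symbol, presumably lets the engine instantiate it with the options hash from the control file; that reading is inferred from the changelog wording only.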
data/README CHANGED
@@ -1,5 +1,14 @@
  Ruby Extract-Transform-Load (ETL) tool.
 
+ == Requirements
+
+ * Ruby 1.8.5 or higher
+ * Rubygems
+
+ == Online Documentation
+
+ Available at http://activewarehouse.rubyforge.org/docs/activewarehouse-etl.html
+
  == Features
 
  Current supported features:
@@ -67,6 +76,9 @@ Command line options:
  == Control File Examples
  Control file examples can be found in the examples directory.
 
+ == Running Tests
+ The tests require Shoulda 1.x.
+
  == Feedback
  This is a work in progress. Comments should be made on the
  activewarehouse-discuss mailing list at the moment. Contributions are always
data/Rakefile CHANGED
@@ -7,16 +7,13 @@ require 'rake/contrib/rubyforgepublisher'
 
  require File.join(File.dirname(__FILE__), 'lib/etl', 'version')
 
- PKG_BUILD = ENV['PKG_BUILD'] ? '.' + ENV['PKG_BUILD'] : ''
- PKG_NAME = 'activewarehouse-etl'
- PKG_VERSION = ETL::VERSION::STRING + PKG_BUILD
- PKG_FILE_NAME = "#{PKG_NAME}-#{PKG_VERSION}"
- PKG_DESTINATION = ENV["PKG_DESTINATION"] || "../#{PKG_NAME}"
-
- RELEASE_NAME = "REL #{PKG_VERSION}"
-
- RUBY_FORGE_PROJECT = "activewarehouse"
- RUBY_FORGE_USER = "aeden"
+ module AWETL
+ PKG_BUILD = ENV['PKG_BUILD'] ? '.' + ENV['PKG_BUILD'] : ''
+ PKG_NAME = 'activewarehouse-etl'
+ PKG_VERSION = ETL::VERSION::STRING + PKG_BUILD
+ PKG_FILE_NAME = "#{PKG_NAME}-#{PKG_VERSION}"
+ PKG_DESTINATION = ENV["PKG_DESTINATION"] || "../#{PKG_NAME}"
+ end
 
  desc 'Default: run unit tests.'
  task :default => :test
@@ -45,54 +42,62 @@ namespace :rcov do
  mkdir 'coverage' unless File.exist?('coverage')
  rcov = "rcov --aggregate coverage.data --text-summary -Ilib"
  system("#{rcov} test/*_test.rb")
- system("open coverage/index.html") if PLATFORM['darwin']
+ # system("open coverage/index.html") if PLATFORM['darwin']
  end
  end
 
- PKG_FILES = FileList[
- 'CHANGELOG',
- 'LICENSE',
- 'README',
- 'TODO',
- 'Rakefile',
- 'bin/**/*',
- 'doc/**/*',
- 'lib/**/*',
- 'examples/**/*',
- ] - [ 'test' ]
-
- spec = Gem::Specification.new do |s|
- s.name = 'activewarehouse-etl'
- s.version = PKG_VERSION
- s.summary = "Pure Ruby ETL package."
- s.description = <<-EOF
- ActiveWarehouse ETL is a pure Ruby Extract-Transform-Load application for loading data into a database.
- EOF
-
- s.add_dependency('rake', '>= 0.7.1')
- s.add_dependency('activesupport', '>= 1.3.1')
- s.add_dependency('activerecord', '>= 1.14.4')
- s.add_dependency('fastercsv', '>= 1.2.0')
- s.add_dependency('adapter_extensions', '>= 0.1.0')
-
- s.rdoc_options << '--exclude' << '.'
- s.has_rdoc = false
-
- s.files = PKG_FILES.to_a.delete_if {|f| f.include?('.svn')}
- s.require_path = 'lib'
-
- s.bindir = "bin" # Use these for applications.
- s.executables = ['etl']
- s.default_executable = "etl"
-
- s.author = "Anthony Eden"
- s.email = "anthonyeden@gmail.com"
- s.homepage = "http://activewarehouse.rubyforge.org/etl"
- s.rubyforge_project = "activewarehouse"
+ # Gem Spec
+
+ module AWETL
+ def self.package_files(package_prefix)
+ FileList[
+ "#{package_prefix}CHANGELOG",
+ "#{package_prefix}LICENSE",
+ "#{package_prefix}README",
+ "#{package_prefix}TODO",
+ "#{package_prefix}Rakefile",
+ "#{package_prefix}bin/**/*",
+ "#{package_prefix}doc/**/*",
+ "#{package_prefix}lib/**/*",
+ "#{package_prefix}examples/**/*",
+ ] - [ "#{package_prefix}test" ]
+ end
+
+ def self.spec(package_prefix = '')
+ Gem::Specification.new do |s|
+ s.name = 'activewarehouse-etl'
+ s.version = AWETL::PKG_VERSION
+ s.summary = "Pure Ruby ETL package."
+ s.description = <<-EOF
+ ActiveWarehouse ETL is a pure Ruby Extract-Transform-Load application for loading data into a database.
+ EOF
+
+ s.add_dependency('rake', '>= 0.7.1')
+ s.add_dependency('activesupport', '>= 1.3.1')
+ s.add_dependency('activerecord', '>= 1.14.4')
+ s.add_dependency('fastercsv', '>= 1.2.0')
+ s.add_dependency('adapter_extensions', '>= 0.1.0')
+
+ s.rdoc_options << '--exclude' << '.'
+ s.has_rdoc = false
+
+ s.files = package_files(package_prefix).to_a.delete_if {|f| f.include?('.svn')}
+ s.require_path = 'lib'
+
+ s.bindir = "#{package_prefix}bin" # Use these for applications.
+ s.executables = ['etl']
+ s.default_executable = "etl"
+
+ s.author = "Anthony Eden"
+ s.email = "anthonyeden@gmail.com"
+ s.homepage = "http://activewarehouse.rubyforge.org/etl"
+ s.rubyforge_project = "activewarehouse"
+ end
+ end
  end
 
- Rake::GemPackageTask.new(spec) do |pkg|
- pkg.gem_spec = spec
+ Rake::GemPackageTask.new(AWETL.spec) do |pkg|
+ pkg.gem_spec = AWETL.spec
  pkg.need_tar = true
  pkg.need_zip = true
  end
@@ -112,10 +117,10 @@ task :lines do
  codelines += 1
  end
  puts "L: #{sprintf("%4d", lines)}, LOC #{sprintf("%4d", codelines)} | #{file_name}"
-
+
  total_lines += lines
  total_codelines += codelines
-
+
  lines, codelines = 0, 0
  end
 
@@ -127,7 +132,7 @@ task :release => [ :package ] do
  `rubyforge login`
 
  for ext in %w( gem tgz zip )
- release_command = "rubyforge add_release activewarehouse #{PKG_NAME} 'REL #{PKG_VERSION}' pkg/#{PKG_NAME}-#{PKG_VERSION}.#{ext}"
+ release_command = "rubyforge add_release activewarehouse #{AWETL::PKG_NAME} 'REL #{AWETL::PKG_VERSION}' pkg/#{AWETL::PKG_NAME}-#{AWETL::PKG_VERSION}.#{ext}"
  puts release_command
  system(release_command)
  end
@@ -143,6 +148,6 @@ task :reinstall => [:package] do
  windows = RUBY_PLATFORM =~ /mswin/
  sudo = windows ? '' : 'sudo'
  gem = windows ? 'gem.bat' : 'gem'
- `#{sudo} #{gem} uninstall -x -i #{PKG_NAME}`
- `#{sudo} #{gem} install pkg/#{PKG_NAME}-#{PKG_VERSION}`
- end
+ `#{sudo} #{gem} uninstall #{AWETL::PKG_NAME} -x`
+ `#{sudo} #{gem} install pkg/#{AWETL::PKG_NAME}-#{AWETL::PKG_VERSION}`
+ end
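The Rakefile changes above move the packaging constants into a module and turn the file list into a method that takes a package_prefix. A rough sketch of that prefixing idea, using plain strings instead of Rake's FileList so it runs without Rake, might look like this. The module name PackagingSketch and the example prefix are hypothetical.

```ruby
# Hypothetical stand-alone illustration of the package_prefix idea.
# Plain glob pattern strings are used here; the Rakefile itself builds a Rake FileList.
module PackagingSketch
  PATTERNS = %w(CHANGELOG LICENSE README TODO Rakefile bin/**/* doc/**/* lib/**/* examples/**/*)

  # Prepends the prefix to every pattern, mirroring the "#{package_prefix}..." strings
  # in the diff. An empty prefix leaves the patterns unchanged.
  def self.package_patterns(package_prefix = '')
    PATTERNS.map { |pattern| "#{package_prefix}#{pattern}" }
  end
end

puts PackagingSketch.package_patterns('vendor/etl/').first
puts PackagingSketch.package_patterns.last
```

Keeping the constants inside a module also avoids defining top-level names such as PKG_NAME, which could otherwise clash when several Rakefiles are loaded into one process.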
data/bin/etl CHANGED
File without changes
@@ -0,0 +1,8 @@
+ @echo off
+
+ rem The purpose of this Windows script is to let you use the etl command line with a non-gem version of AW-ETL (eg: unpacked gem, pistoned trunk).
+ rem Just add the current folder on top of your PATH variable to use it instead of the etl command provided with the gem release.
+
+ rem %~dp0 returns the absolute path where the current script is. We just append 'etl' to it, and forward all the arguments with %*
+
+ ruby "%~dp0etl" %*
@@ -3,4 +3,14 @@ etl_execution:
  username: root
  host: localhost
  database: etl_execution
- encoding: utf8
+ encoding: utf8
+ datawarehouse:
+ adapter: mysql
+ username: root
+ host: localhost
+ database: datawarehouse_development
+ operational:
+ adapter: mysql
+ username: root
+ host: localhost
+ database: operational_production
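The hunk above adds two named connections, datawarehouse and operational, next to etl_execution. As a rough illustration of how such a file can be read from Ruby, the sketch below parses an inline copy with the standard YAML library. The two-space nesting under each connection name is an assumption (indentation is not visible in this diff view), and only the keys shown in the hunk are reproduced.

```ruby
require 'yaml'

# Connection names and settings copied from the hunk above.
# ASSUMPTION: settings are nested under each connection name, as in a typical database.yml.
config = YAML.load(<<-YML)
etl_execution:
  username: root
  host: localhost
  database: etl_execution
  encoding: utf8
datawarehouse:
  adapter: mysql
  username: root
  host: localhost
  database: datawarehouse_development
operational:
  adapter: mysql
  username: root
  host: localhost
  database: operational_production
YML

# Each top-level key is a connection name; its value is a hash of settings.
config.each do |name, settings|
  puts "#{name}: #{settings['database']} on #{settings['host']}"
end
```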
data/lib/etl.rb CHANGED
@@ -31,29 +31,15 @@ require 'erb'
 
  require 'rubygems'
 
- unless Kernel.respond_to?(:gem)
- Kernel.send :alias_method, :gem, :require_gem
+ unless defined?(REXML::VERSION)
+ require 'rexml/rexml'
+ REXML::VERSION = REXML::Version
  end
 
- unless defined?(ActiveSupport)
- gem 'activesupport'
- require 'active_support'
- end
-
- unless defined?(ActiveRecord)
- gem 'activerecord'
- require 'active_record'
- end
-
- unless defined?(AdapterExtensions)
- gem 'adapter_extensions'
- require 'adapter_extensions'
- end
-
- unless defined?(FasterCSV)
- gem 'fastercsv'
- require 'faster_csv'
- end
+ require 'active_support'
+ require 'active_record'
+ require 'adapter_extensions'
+ require 'faster_csv'
 
  $:.unshift(File.dirname(__FILE__))
 
@@ -79,6 +65,8 @@ module ETL #:nodoc:
  end
  class DefinitionError < ControlError #:nodoc:
  end
+ class ConfigurationError < ControlError #:nodoc:
+ end
  class MismatchError < ETLError #:nodoc:
  end
  class ResolverError < ETLError #:nodoc:
@@ -1 +1,2 @@
- require 'etl/builder/date_dimension_builder'
+ require 'etl/builder/date_dimension_builder'
+ require 'etl/builder/time_dimension_builder'
@@ -11,6 +11,9 @@ module ETL #:nodoc:
 
  # Define any holiday indicators
  attr_accessor :holiday_indicators
+
+ # Add offset month for fiscal year
+ attr_accessor :fiscal_year_offset_month
 
  # Define the weekday indicators. The default array begins on Sunday and goes to Saturday.
  cattr_accessor :weekday_indicators
@@ -20,64 +23,74 @@ module ETL #:nodoc:
  #
  # * <tt>start_date</tt>: The start date. Defaults to 5 years ago from today.
  # * <tt>end_date</tt>: The end date. Defaults to now.
- def initialize(start_date=Time.now.years_ago(5), end_date=Time.now)
- @start_date = start_date
- @end_date = end_date
+ def initialize(start_date=Time.now.years_ago(5), end_date=Time.now, fiscal_year_offset_month=10)
+ @start_date = start_date.to_date
+ @end_date = end_date.to_date
+ @fiscal_year_offset_month = fiscal_year_offset_month.to_i
  @holiday_indicators = []
  end
 
- # Returns an array of hashes representing records in the dimension. The values for each record are
- # accessed by name.
+ # Returns an array of hashes representing records in the dimension.
  def build(options={})
- records = []
- date = start_date.to_time
- while date <= end_date.to_time
- record = {}
- record[:date] = date.strftime("%m/%d/%Y")
- record[:full_date_description] = date.strftime("%B %d,%Y")
- record[:day_of_week] = date.strftime("%A")
- #record[:day_number_in_epoch] = date.to_i / 24
- #record[:week_number_in_epoch] = date.to_i / (24 * 7)
- #record[:month_number_in_epoch] = date.to_i / (24 * 7 * 30)
- record[:day_number_in_calendar_month] = date.day
- record[:day_number_in_calendar_year] = date.yday
- record[:day_number_in_fiscal_month] = date.day # should this be different from CY?
- record[:day_number_in_fiscal_year] = date.fiscal_year_yday
- #record[:last_day_in_week_indicator] =
- #record[:last_day_in_month_indicator] =
- #record[:calendar_week_ending_date] =
- record[:calendar_week] = "Week #{date.week}"
- record[:calendar_week_number_in_year] = date.week
- record[:calendar_month_name] = date.strftime("%B")
- record[:calendar_month_number_in_year] = date.month
- record[:calendar_year_month] = date.strftime("%Y-%m")
- record[:calendar_quarter] = "Q#{date.quarter}"
- record[:calendar_quarter_number_in_year] = date.quarter
- record[:calendar_year_quarter] = "#{date.strftime('%Y')}-#{record[:calendar_quarter]}"
- #record[:calendar_half_year] =
- record[:calendar_year] = "#{date.year}"
- record[:fiscal_week] = "FY Week #{date.fiscal_year_week}"
- record[:fiscal_week_number_in_year] = date.fiscal_year_week
- record[:fiscal_month] = date.fiscal_year_month
- record[:fiscal_month_number_in_year] = date.fiscal_year_month
- record[:fiscal_year_month] = "FY#{date.fiscal_year}-" + date.fiscal_year_month.to_s.rjust(2, '0')
- record[:fiscal_quarter] = "FY Q#{date.fiscal_year_quarter}"
- record[:fiscal_year_quarter] = "FY#{date.fiscal_year}-Q#{date.fiscal_year_quarter}"
- record[:fiscal_year_quarter_number] = date.fiscal_year_quarter
- #record[:fiscal_half_year] =
- record[:fiscal_year] = "FY#{date.fiscal_year}"
- record[:fiscal_year_number] = date.fiscal_year
- record[:holiday_indicator] = holiday_indicators.include?(date) ? 'Holiday' : 'Nonholiday'
- record[:weekday_indicator] = weekday_indicators[date.wday]
- record[:selling_season] = 'None'
- record[:major_event] = 'None'
- record[:sql_date_stamp] = date
-
- records << record
- date = date.tomorrow
- end
- records
+ (start_date..end_date).map { |date| record_from_date(date) }
+ end
+
+ private
+
+ # Returns a hash representing a record in the dimension. The values for each record are
+ # accessed by name.
+ def record_from_date(date)
+ time = date.to_time # need methods only available in Time
+ record = {}
+ record[:date] = time.strftime("%m/%d/%Y")
+ record[:full_date_description] = time.strftime("%B %d,%Y")
+ record[:day_of_week] = time.strftime("%A")
+ record[:day_in_week] = record[:day_of_week] # alias
+ #record[:day_number_in_epoch] = time.to_i / 24
+ #record[:week_number_in_epoch] = time.to_i / (24 * 7)
+ #record[:month_number_in_epoch] = time.to_i / (24 * 7 * 30)
+ record[:day_number_in_calendar_month] = time.day
+ record[:day_number_in_calendar_year] = time.yday
+ record[:day_number_in_fiscal_month] = time.day # should this be different from CY?
+ record[:day_number_in_fiscal_year] = time.fiscal_year_yday(fiscal_year_offset_month)
+ #record[:last_day_in_week_indicator] =
+ #record[:last_day_in_month_indicator] =
+ #record[:calendar_week_ending_date] =
+ record[:calendar_week] = "Week #{time.week}"
+ record[:calendar_week_number] = time.week
+ record[:calendar_week_number_in_year] = time.week # DEPRECATED
+ record[:calendar_month_name] = time.strftime("%B")
+ record[:calendar_month_number_in_year] = time.month # DEPRECATED
+ record[:calendar_month_number] = time.month
+ record[:calendar_year_month] = time.strftime("%Y-%m")
+ record[:calendar_quarter] = "Q#{time.quarter}"
+ record[:calendar_quarter_number] = time.quarter
+ record[:calendar_quarter_number_in_year] = time.quarter # DEPRECATED
+ record[:calendar_year_quarter] = "#{time.strftime('%Y')}-#{record[:calendar_quarter]}"
+ #record[:calendar_half_year] =
+ record[:calendar_year] = "#{time.year}"
+ record[:fiscal_week] = "FY Week #{time.fiscal_year_week(fiscal_year_offset_month)}"
+ record[:fiscal_week_number_in_year] = time.fiscal_year_week(fiscal_year_offset_month) # DEPRECATED
+ record[:fiscal_week_number] = time.fiscal_year_week(fiscal_year_offset_month)
+ record[:fiscal_month] = time.fiscal_year_month(fiscal_year_offset_month)
+ record[:fiscal_month_number] = time.fiscal_year_month(fiscal_year_offset_month)
+ record[:fiscal_month_number_in_year] = time.fiscal_year_month(fiscal_year_offset_month) # DEPRECATED
+ record[:fiscal_year_month] = "FY#{time.fiscal_year(fiscal_year_offset_month)}-" + time.fiscal_year_month(fiscal_year_offset_month).to_s.rjust(2, '0')
+ record[:fiscal_quarter] = "FY Q#{time.fiscal_year_quarter(fiscal_year_offset_month)}"
+ record[:fiscal_year_quarter] = "FY#{time.fiscal_year(fiscal_year_offset_month)}-Q#{time.fiscal_year_quarter(fiscal_year_offset_month)}"
+ record[:fiscal_quarter_number] = time.fiscal_year_quarter(fiscal_year_offset_month) # DEPRECATED
+ record[:fiscal_year_quarter_number] = time.fiscal_year_quarter(fiscal_year_offset_month)
+ #record[:fiscal_half_year] =
+ record[:fiscal_year] = "FY#{time.fiscal_year(fiscal_year_offset_month)}"
+ record[:fiscal_year_number] = time.fiscal_year(fiscal_year_offset_month)
+ record[:holiday_indicator] = holiday_indicators.include?(date) ? 'Holiday' : 'Nonholiday'
+ record[:weekday_indicator] = weekday_indicators[time.wday]
+ record[:selling_season] = 'None'
+ record[:major_event] = 'None'
+ record[:sql_date_stamp] = date
+
+ record
  end
  end
  end
- end
+ end
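The rewritten build method replaces the manual while loop with a map over a range of dates. The sketch below shows that idiom on its own with Ruby's standard Date class. simple_record_from_date is a hypothetical, cut-down counterpart of record_from_date: it keeps only fields that need no ActiveSupport or ETL extensions, so the fiscal, week and quarter fields are left out.

```ruby
require 'date'

# Hypothetical cut-down version of record_from_date: only fields that need
# nothing beyond Ruby's standard Date class.
def simple_record_from_date(date)
  {
    :date => date.strftime("%m/%d/%Y"),
    :day_of_week => date.strftime("%A"),
    :day_number_in_calendar_month => date.day,
    :day_number_in_calendar_year => date.yday,
    :calendar_year_month => date.strftime("%Y-%m"),
    :sql_date_stamp => date
  }
end

start_date = Date.new(2007, 8, 9)
end_date = Date.new(2007, 8, 11)

# A Range of Date objects steps one day at a time, so map visits each day once,
# including both endpoints.
records = (start_date..end_date).map { |date| simple_record_from_date(date) }
records.each { |r| puts "#{r[:date]} #{r[:day_of_week]}" }
```

Because the range is built from Date values, the to_date conversion in the new initialize matters: a range of Time values cannot be iterated day by day in the same way.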
@@ -0,0 +1,31 @@
+ module ETL #:nodoc:
+ module Builder #:nodoc:
+ # Builder that creates a simple time dimension.
+ class TimeDimensionBuilder
+ def initialize
+ # Returns an array of hashes representing records in the dimension. The values for each record are
+ # accessed by name.
+ def build(options={})
+ records = []
+ 0.upto(23) do |t_hour|
+ 0.upto(59) do |t_minute|
+ 0.upto(59) do |t_second|
+ t_hour_string = t_hour.to_s.rjust(2, '0')
+ t_minute_string = t_minute.to_s.rjust(2, '0')
+ t_second_string = t_second.to_s.rjust(2, '0')
+ record = {}
+ record[:hour] = t_hour
+ record[:minute] = t_minute
+ record[:second] = t_second
+ record[:minute_description] = "#{t_hour_string}:#{t_minute_string}"
+ record[:full_description] = "#{t_hour_string}:#{t_minute_string}:#{t_second_string}"
+ records << record
+ end
+ end
+ end
+ records
+ end
+ end
+ end
+ end
+ end
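The new builder produces one record per second of the day. As the file reads in this diff, no end closes initialize before def build, so build appears to be defined inside initialize. The sketch below sidesteps that by using a plain method with the same nested loops. build_time_dimension is a hypothetical helper name, not part of the gem.

```ruby
# Standalone sketch of the same nested loops; build_time_dimension is a
# hypothetical helper, not part of the gem.
def build_time_dimension
  records = []
  0.upto(23) do |hour|
    0.upto(59) do |minute|
      0.upto(59) do |second|
        # Zero-pad each component to two digits, as the builder does with rjust.
        hh = hour.to_s.rjust(2, '0')
        mm = minute.to_s.rjust(2, '0')
        ss = second.to_s.rjust(2, '0')
        records << {
          :hour => hour,
          :minute => minute,
          :second => second,
          :minute_description => "#{hh}:#{mm}",
          :full_description => "#{hh}:#{mm}:#{ss}"
        }
      end
    end
  end
  records
end

records = build_time_dimension
puts records.size
puts records.last[:full_description]
```

The result holds 24 * 60 * 60 = 86400 hashes, which is small enough to build in memory before loading it into a dimension table.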