activewarehouse-etl 0.9.0 → 0.9.1

data/CHANGELOG CHANGED
@@ -165,7 +165,7 @@
  0.8.4 - May 24, 2007
  * Added fix for backslash in file writer

- 0.9.0 -
+ 0.9.0 - August 9, 2007
  * Added support for batch processing through .ebf files. These files are
    essentially control files that apply settings to an entire ETL process.
  * Implemented support for screen blocks. These blocks can be used to test
@@ -175,4 +175,24 @@
  connection information in the control files.
  * Implemented temp table support throughout.
  * DateDimensionBuilder now included in ActiveWarehouse ETL directly.
- * Time calculations for fiscal year now included in ActiveWarehouse ETL.
+ * Time calculations for fiscal year now included in ActiveWarehouse ETL.
+
+ 0.9.1 -
+ * SQLResolver now uses ETL::Engine.table so it may utilize temp tables. (aeden)
+ * Added Thibaut Barrère's encode processor.
+ * Added MockSource and MockDestination test helpers (thbar)
+ * Added the block processor. Can call a block once (pre/post processor)
+   or once for each row (after_read/before_write row processor) (thbar)
+ * Changed temp table to use new AdapterExtension copy_table method (aeden)
+ * Added bin/etl.cmd windows batch - just add the bin folder to your PATH
+   and it will let you call etl on an unpacked/pistoned version of AW-ETL (thbar)
+ * Upgraded to support Rails 2.1. No longer compatible with older versions of Rails.
+ * Added ETL::Builder::TimeDimensionBuilder
+ * Added :default option to ForeignKeyLookupTransform that will be used if no
+   foreign key is found.
+ * Added :cache option to ForeignKeyLookupTransform that will preload the FK
+   mappings if the underlying resolver supports it. Currently supported by
+   SQLResolver.
+ * A Class extending ETL::Transform::Transform may now be passed as a transformer.
+   For example, in the control file you would define the transform as:
+   transform :a_field, MyTransform, {:option1 => 'option1'}.
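
The last changelog entry above can be illustrated with a small sketch. `MyTransform` and `:option1` come from the changelog; the `initialize(control, name, configuration)` and `transform(name, value, row)` signatures, and the stand-in base class, are assumptions made here so the example runs without the gem. Check them against `ETL::Transform::Transform` in the installed version before relying on them.

```ruby
# Stand-in for ETL::Transform::Transform so the sketch runs without the gem.
# The constructor and #transform signatures are assumptions, not confirmed by this diff.
module ETL
  module Transform
    class Transform
      attr_reader :control, :name, :configuration

      def initialize(control, name, configuration = {})
        @control = control
        @name = name
        @configuration = configuration
      end
    end
  end
end

# Hypothetical transform: appends the configured :option1 value to the field value.
class MyTransform < ETL::Transform::Transform
  def transform(name, value, row)
    "#{value}-#{configuration[:option1]}"
  end
end

# In a control file this would be declared as:
#   transform :a_field, MyTransform, {:option1 => 'option1'}
t = MyTransform.new(nil, :a_field, {:option1 => 'option1'})
result = t.transform(:a_field, 'abc', {:a_field => 'abc'})
```

Passing the class rather than a symbol presumably lets a project keep its transforms in its own namespace instead of registering them inside the gem.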
data/README CHANGED
@@ -1,5 +1,14 @@
  Ruby Extract-Transform-Load (ETL) tool.

+ == Requirements
+
+ * Ruby 1.8.5 or higher
+ * Rubygems
+
+ == Online Documentation
+
+ Available at http://activewarehouse.rubyforge.org/docs/activewarehouse-etl.html
+
  == Features

  Current supported features:
@@ -67,6 +76,9 @@ Command line options:
  == Control File Examples
  Control file examples can be found in the examples directory.

+ == Running Tests
+ The tests require Shoulda 1.x.
+
  == Feedback
  This is a work in progress. Comments should be made on the
  activewarehouse-discuss mailing list at the moment. Contributions are always
data/Rakefile CHANGED
@@ -7,16 +7,13 @@ require 'rake/contrib/rubyforgepublisher'

  require File.join(File.dirname(__FILE__), 'lib/etl', 'version')

- PKG_BUILD = ENV['PKG_BUILD'] ? '.' + ENV['PKG_BUILD'] : ''
- PKG_NAME = 'activewarehouse-etl'
- PKG_VERSION = ETL::VERSION::STRING + PKG_BUILD
- PKG_FILE_NAME = "#{PKG_NAME}-#{PKG_VERSION}"
- PKG_DESTINATION = ENV["PKG_DESTINATION"] || "../#{PKG_NAME}"
-
- RELEASE_NAME = "REL #{PKG_VERSION}"
-
- RUBY_FORGE_PROJECT = "activewarehouse"
- RUBY_FORGE_USER = "aeden"
+ module AWETL
+   PKG_BUILD = ENV['PKG_BUILD'] ? '.' + ENV['PKG_BUILD'] : ''
+   PKG_NAME = 'activewarehouse-etl'
+   PKG_VERSION = ETL::VERSION::STRING + PKG_BUILD
+   PKG_FILE_NAME = "#{PKG_NAME}-#{PKG_VERSION}"
+   PKG_DESTINATION = ENV["PKG_DESTINATION"] || "../#{PKG_NAME}"
+ end

  desc 'Default: run unit tests.'
  task :default => :test
@@ -45,54 +42,62 @@ namespace :rcov do
      mkdir 'coverage' unless File.exist?('coverage')
      rcov = "rcov --aggregate coverage.data --text-summary -Ilib"
      system("#{rcov} test/*_test.rb")
-     system("open coverage/index.html") if PLATFORM['darwin']
+     # system("open coverage/index.html") if PLATFORM['darwin']
    end
  end

- PKG_FILES = FileList[
-   'CHANGELOG',
-   'LICENSE',
-   'README',
-   'TODO',
-   'Rakefile',
-   'bin/**/*',
-   'doc/**/*',
-   'lib/**/*',
-   'examples/**/*',
- ] - [ 'test' ]
-
- spec = Gem::Specification.new do |s|
-   s.name = 'activewarehouse-etl'
-   s.version = PKG_VERSION
-   s.summary = "Pure Ruby ETL package."
-   s.description = <<-EOF
-   ActiveWarehouse ETL is a pure Ruby Extract-Transform-Load application for loading data into a database.
-   EOF
-
-   s.add_dependency('rake', '>= 0.7.1')
-   s.add_dependency('activesupport', '>= 1.3.1')
-   s.add_dependency('activerecord', '>= 1.14.4')
-   s.add_dependency('fastercsv', '>= 1.2.0')
-   s.add_dependency('adapter_extensions', '>= 0.1.0')
-
-   s.rdoc_options << '--exclude' << '.'
-   s.has_rdoc = false
-
-   s.files = PKG_FILES.to_a.delete_if {|f| f.include?('.svn')}
-   s.require_path = 'lib'
-
-   s.bindir = "bin" # Use these for applications.
-   s.executables = ['etl']
-   s.default_executable = "etl"
-
-   s.author = "Anthony Eden"
-   s.email = "anthonyeden@gmail.com"
-   s.homepage = "http://activewarehouse.rubyforge.org/etl"
-   s.rubyforge_project = "activewarehouse"
+ # Gem Spec
+
+ module AWETL
+   def self.package_files(package_prefix)
+     FileList[
+       "#{package_prefix}CHANGELOG",
+       "#{package_prefix}LICENSE",
+       "#{package_prefix}README",
+       "#{package_prefix}TODO",
+       "#{package_prefix}Rakefile",
+       "#{package_prefix}bin/**/*",
+       "#{package_prefix}doc/**/*",
+       "#{package_prefix}lib/**/*",
+       "#{package_prefix}examples/**/*",
+     ] - [ "#{package_prefix}test" ]
+   end
+
+   def self.spec(package_prefix = '')
+     Gem::Specification.new do |s|
+       s.name = 'activewarehouse-etl'
+       s.version = AWETL::PKG_VERSION
+       s.summary = "Pure Ruby ETL package."
+       s.description = <<-EOF
+       ActiveWarehouse ETL is a pure Ruby Extract-Transform-Load application for loading data into a database.
+       EOF
+
+       s.add_dependency('rake', '>= 0.7.1')
+       s.add_dependency('activesupport', '>= 1.3.1')
+       s.add_dependency('activerecord', '>= 1.14.4')
+       s.add_dependency('fastercsv', '>= 1.2.0')
+       s.add_dependency('adapter_extensions', '>= 0.1.0')
+
+       s.rdoc_options << '--exclude' << '.'
+       s.has_rdoc = false
+
+       s.files = package_files(package_prefix).to_a.delete_if {|f| f.include?('.svn')}
+       s.require_path = 'lib'
+
+       s.bindir = "#{package_prefix}bin" # Use these for applications.
+       s.executables = ['etl']
+       s.default_executable = "etl"
+
+       s.author = "Anthony Eden"
+       s.email = "anthonyeden@gmail.com"
+       s.homepage = "http://activewarehouse.rubyforge.org/etl"
+       s.rubyforge_project = "activewarehouse"
+     end
+   end
  end

- Rake::GemPackageTask.new(spec) do |pkg|
-   pkg.gem_spec = spec
+ Rake::GemPackageTask.new(AWETL.spec) do |pkg|
+   pkg.gem_spec = AWETL.spec
    pkg.need_tar = true
    pkg.need_zip = true
  end
@@ -112,10 +117,10 @@ task :lines do
        codelines += 1
      end
      puts "L: #{sprintf("%4d", lines)}, LOC #{sprintf("%4d", codelines)} | #{file_name}"
-
+
      total_lines += lines
      total_codelines += codelines
-
+
      lines, codelines = 0, 0
    end

@@ -127,7 +132,7 @@ task :release => [ :package ] do
    `rubyforge login`

    for ext in %w( gem tgz zip )
-     release_command = "rubyforge add_release activewarehouse #{PKG_NAME} 'REL #{PKG_VERSION}' pkg/#{PKG_NAME}-#{PKG_VERSION}.#{ext}"
+     release_command = "rubyforge add_release activewarehouse #{AWETL::PKG_NAME} 'REL #{AWETL::PKG_VERSION}' pkg/#{AWETL::PKG_NAME}-#{AWETL::PKG_VERSION}.#{ext}"
      puts release_command
      system(release_command)
    end
@@ -143,6 +148,6 @@ task :reinstall => [:package] do
    windows = RUBY_PLATFORM =~ /mswin/
    sudo = windows ? '' : 'sudo'
    gem = windows ? 'gem.bat' : 'gem'
-   `#{sudo} #{gem} uninstall -x -i #{PKG_NAME}`
-   `#{sudo} #{gem} install pkg/#{PKG_NAME}-#{PKG_VERSION}`
- end
+   `#{sudo} #{gem} uninstall #{AWETL::PKG_NAME} -x`
+   `#{sudo} #{gem} install pkg/#{AWETL::PKG_NAME}-#{AWETL::PKG_VERSION}`
+ end
data/bin/etl CHANGED
File without changes
@@ -0,0 +1,8 @@
+ @echo off
+
+ rem The purpose of this Windows script is to let you use the etl command line with a non-gem version of AW-ETL (eg: unpacked gem, pistoned trunk).
+ rem Just add the current folder on top of your PATH variable to use it instead of the etl command provided with the gem release.
+
+ rem %~dp0 returns the absolute path where the current script is. We just append 'etl' to it, and forward all the arguments with %*
+
+ ruby "%~dp0etl" %*
@@ -3,4 +3,14 @@ etl_execution:
    username: root
    host: localhost
    database: etl_execution
-   encoding: utf8
+   encoding: utf8
+ datawarehouse:
+   adapter: mysql
+   username: root
+   host: localhost
+   database: datawarehouse_development
+ operational:
+   adapter: mysql
+   username: root
+   host: localhost
+   database: operational_production
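
The two connection entries added above are ordinary YAML mappings keyed by connection name. As a minimal sketch, assuming the entries are nested under their names as in the existing `etl_execution` entry, they can be read with Ruby's standard YAML library. The configuration is inlined here for illustration; how AW-ETL itself locates and loads this file is not shown in this diff.

```ruby
require 'yaml'

# Inline copy of the two added entries; in practice this would come from the file.
config = YAML.load(<<-YML)
datawarehouse:
  adapter: mysql
  username: root
  host: localhost
  database: datawarehouse_development
operational:
  adapter: mysql
  username: root
  host: localhost
  database: operational_production
YML

names = config.keys.sort                      # connection names defined in the file
dw_db = config['datawarehouse']['database']   # database behind the :datawarehouse name
```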
data/lib/etl.rb CHANGED
@@ -31,29 +31,15 @@ require 'erb'

  require 'rubygems'

- unless Kernel.respond_to?(:gem)
-   Kernel.send :alias_method, :gem, :require_gem
+ unless defined?(REXML::VERSION)
+   require 'rexml/rexml'
+   REXML::VERSION = REXML::Version
  end

- unless defined?(ActiveSupport)
-   gem 'activesupport'
-   require 'active_support'
- end
-
- unless defined?(ActiveRecord)
-   gem 'activerecord'
-   require 'active_record'
- end
-
- unless defined?(AdapterExtensions)
-   gem 'adapter_extensions'
-   require 'adapter_extensions'
- end
-
- unless defined?(FasterCSV)
-   gem 'fastercsv'
-   require 'faster_csv'
- end
+ require 'active_support'
+ require 'active_record'
+ require 'adapter_extensions'
+ require 'faster_csv'

  $:.unshift(File.dirname(__FILE__))

@@ -79,6 +65,8 @@ module ETL #:nodoc:
    end
    class DefinitionError < ControlError #:nodoc:
    end
+   class ConfigurationError < ControlError #:nodoc:
+   end
    class MismatchError < ETLError #:nodoc:
    end
    class ResolverError < ETLError #:nodoc:
@@ -1 +1,2 @@
- require 'etl/builder/date_dimension_builder'
+ require 'etl/builder/date_dimension_builder'
+ require 'etl/builder/time_dimension_builder'
@@ -11,6 +11,9 @@ module ETL #:nodoc:

        # Define any holiday indicators
        attr_accessor :holiday_indicators
+
+       # Add offset month for fiscal year
+       attr_accessor :fiscal_year_offset_month

        # Define the weekday indicators. The default array begins on Sunday and goes to Saturday.
        cattr_accessor :weekday_indicators
@@ -20,64 +23,74 @@ module ETL #:nodoc:
        #
        # * <tt>start_date</tt>: The start date. Defaults to 5 years ago from today.
        # * <tt>end_date</tt>: The end date. Defaults to now.
-       def initialize(start_date=Time.now.years_ago(5), end_date=Time.now)
-         @start_date = start_date
-         @end_date = end_date
+       def initialize(start_date=Time.now.years_ago(5), end_date=Time.now, fiscal_year_offset_month=10)
+         @start_date = start_date.to_date
+         @end_date = end_date.to_date
+         @fiscal_year_offset_month = fiscal_year_offset_month.to_i
          @holiday_indicators = []
        end

-       # Returns an array of hashes representing records in the dimension. The values for each record are
-       # accessed by name.
+       # Returns an array of hashes representing records in the dimension.
        def build(options={})
-         records = []
-         date = start_date.to_time
-         while date <= end_date.to_time
-           record = {}
-           record[:date] = date.strftime("%m/%d/%Y")
-           record[:full_date_description] = date.strftime("%B %d,%Y")
-           record[:day_of_week] = date.strftime("%A")
-           #record[:day_number_in_epoch] = date.to_i / 24
-           #record[:week_number_in_epoch] = date.to_i / (24 * 7)
-           #record[:month_number_in_epoch] = date.to_i / (24 * 7 * 30)
-           record[:day_number_in_calendar_month] = date.day
-           record[:day_number_in_calendar_year] = date.yday
-           record[:day_number_in_fiscal_month] = date.day # should this be different from CY?
-           record[:day_number_in_fiscal_year] = date.fiscal_year_yday
-           #record[:last_day_in_week_indicator] =
-           #record[:last_day_in_month_indicator] =
-           #record[:calendar_week_ending_date] =
-           record[:calendar_week] = "Week #{date.week}"
-           record[:calendar_week_number_in_year] = date.week
-           record[:calendar_month_name] = date.strftime("%B")
-           record[:calendar_month_number_in_year] = date.month
-           record[:calendar_year_month] = date.strftime("%Y-%m")
-           record[:calendar_quarter] = "Q#{date.quarter}"
-           record[:calendar_quarter_number_in_year] = date.quarter
-           record[:calendar_year_quarter] = "#{date.strftime('%Y')}-#{record[:calendar_quarter]}"
-           #record[:calendar_half_year] =
-           record[:calendar_year] = "#{date.year}"
-           record[:fiscal_week] = "FY Week #{date.fiscal_year_week}"
-           record[:fiscal_week_number_in_year] = date.fiscal_year_week
-           record[:fiscal_month] = date.fiscal_year_month
-           record[:fiscal_month_number_in_year] = date.fiscal_year_month
-           record[:fiscal_year_month] = "FY#{date.fiscal_year}-" + date.fiscal_year_month.to_s.rjust(2, '0')
-           record[:fiscal_quarter] = "FY Q#{date.fiscal_year_quarter}"
-           record[:fiscal_year_quarter] = "FY#{date.fiscal_year}-Q#{date.fiscal_year_quarter}"
-           record[:fiscal_year_quarter_number] = date.fiscal_year_quarter
-           #record[:fiscal_half_year] =
-           record[:fiscal_year] = "FY#{date.fiscal_year}"
-           record[:fiscal_year_number] = date.fiscal_year
-           record[:holiday_indicator] = holiday_indicators.include?(date) ? 'Holiday' : 'Nonholiday'
-           record[:weekday_indicator] = weekday_indicators[date.wday]
-           record[:selling_season] = 'None'
-           record[:major_event] = 'None'
-           record[:sql_date_stamp] = date
-
-           records << record
-           date = date.tomorrow
-         end
-         records
+         (start_date..end_date).map { |date| record_from_date(date) }
+       end
+
+       private
+
+       # Returns a hash representing a record in the dimension. The values for each record are
+       # accessed by name.
+       def record_from_date(date)
+         time = date.to_time # need methods only available in Time
+         record = {}
+         record[:date] = time.strftime("%m/%d/%Y")
+         record[:full_date_description] = time.strftime("%B %d,%Y")
+         record[:day_of_week] = time.strftime("%A")
+         record[:day_in_week] = record[:day_of_week] # alias
+         #record[:day_number_in_epoch] = time.to_i / 24
+         #record[:week_number_in_epoch] = time.to_i / (24 * 7)
+         #record[:month_number_in_epoch] = time.to_i / (24 * 7 * 30)
+         record[:day_number_in_calendar_month] = time.day
+         record[:day_number_in_calendar_year] = time.yday
+         record[:day_number_in_fiscal_month] = time.day # should this be different from CY?
+         record[:day_number_in_fiscal_year] = time.fiscal_year_yday(fiscal_year_offset_month)
+         #record[:last_day_in_week_indicator] =
+         #record[:last_day_in_month_indicator] =
+         #record[:calendar_week_ending_date] =
+         record[:calendar_week] = "Week #{time.week}"
+         record[:calendar_week_number] = time.week
+         record[:calendar_week_number_in_year] = time.week # DEPRECATED
+         record[:calendar_month_name] = time.strftime("%B")
+         record[:calendar_month_number_in_year] = time.month # DEPRECATED
+         record[:calendar_month_number] = time.month
+         record[:calendar_year_month] = time.strftime("%Y-%m")
+         record[:calendar_quarter] = "Q#{time.quarter}"
+         record[:calendar_quarter_number] = time.quarter
+         record[:calendar_quarter_number_in_year] = time.quarter # DEPRECATED
+         record[:calendar_year_quarter] = "#{time.strftime('%Y')}-#{record[:calendar_quarter]}"
+         #record[:calendar_half_year] =
+         record[:calendar_year] = "#{time.year}"
+         record[:fiscal_week] = "FY Week #{time.fiscal_year_week(fiscal_year_offset_month)}"
+         record[:fiscal_week_number_in_year] = time.fiscal_year_week(fiscal_year_offset_month) # DEPRECATED
+         record[:fiscal_week_number] = time.fiscal_year_week(fiscal_year_offset_month)
+         record[:fiscal_month] = time.fiscal_year_month(fiscal_year_offset_month)
+         record[:fiscal_month_number] = time.fiscal_year_month(fiscal_year_offset_month)
+         record[:fiscal_month_number_in_year] = time.fiscal_year_month(fiscal_year_offset_month) # DEPRECATED
+         record[:fiscal_year_month] = "FY#{time.fiscal_year(fiscal_year_offset_month)}-" + time.fiscal_year_month(fiscal_year_offset_month).to_s.rjust(2, '0')
+         record[:fiscal_quarter] = "FY Q#{time.fiscal_year_quarter(fiscal_year_offset_month)}"
+         record[:fiscal_year_quarter] = "FY#{time.fiscal_year(fiscal_year_offset_month)}-Q#{time.fiscal_year_quarter(fiscal_year_offset_month)}"
+         record[:fiscal_quarter_number] = time.fiscal_year_quarter(fiscal_year_offset_month) # DEPRECATED
+         record[:fiscal_year_quarter_number] = time.fiscal_year_quarter(fiscal_year_offset_month)
+         #record[:fiscal_half_year] =
+         record[:fiscal_year] = "FY#{time.fiscal_year(fiscal_year_offset_month)}"
+         record[:fiscal_year_number] = time.fiscal_year(fiscal_year_offset_month)
+         record[:holiday_indicator] = holiday_indicators.include?(date) ? 'Holiday' : 'Nonholiday'
+         record[:weekday_indicator] = weekday_indicators[time.wday]
+         record[:selling_season] = 'None'
+         record[:major_event] = 'None'
+         record[:sql_date_stamp] = date
+
+         record
        end
      end
    end
- end
+ end
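
The hunk above threads `fiscal_year_offset_month` (default 10) through every fiscal calculation. The diff does not show how the gem's `Time#fiscal_year` and `Time#fiscal_year_month` extensions compute their values, so the following is only a sketch of one common convention: the fiscal year starts in the offset month and is named after the calendar year in which it ends. `fiscal_year_for` and `fiscal_month_for` are hypothetical names, not the gem's API.

```ruby
require 'date'

# Hypothetical helpers; the gem's own Time#fiscal_year* methods may differ.
# offset_month is the calendar month in which the fiscal year starts.
def fiscal_year_for(date, offset_month = 10)
  # Assumption: a fiscal year is named after the calendar year in which it ends.
  return date.year if offset_month == 1
  date.month >= offset_month ? date.year + 1 : date.year
end

def fiscal_month_for(date, offset_month = 10)
  # Shift the calendar month so the offset month becomes fiscal month 1.
  shifted = date.month - (offset_month - 1)
  shifted += 12 if shifted <= 0
  shifted
end

sept = Date.new(2007, 9, 30)
oct  = Date.new(2007, 10, 1)

# Same label format as record[:fiscal_year_month] in the diff.
label = "FY#{fiscal_year_for(oct)}-" + fiscal_month_for(oct).to_s.rjust(2, '0')
puts label   # FY2008-01
```

Under this convention September 30, 2007 falls in fiscal month 12 of FY2007, and the following day opens FY2008.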
@@ -0,0 +1,31 @@
+ module ETL #:nodoc:
+   module Builder #:nodoc:
+     # Builder that creates a simple time dimension.
+     class TimeDimensionBuilder
+       def initialize
+         # Returns an array of hashes representing records in the dimension. The values for each record are
+         # accessed by name.
+         def build(options={})
+           records = []
+           0.upto(23) do |t_hour|
+             0.upto(59) do |t_minute|
+               0.upto(59) do |t_second|
+                 t_hour_string = t_hour.to_s.rjust(2, '0')
+                 t_minute_string = t_minute.to_s.rjust(2, '0')
+                 t_second_string = t_second.to_s.rjust(2, '0')
+                 record = {}
+                 record[:hour] = t_hour
+                 record[:minute] = t_minute
+                 record[:second] = t_second
+                 record[:minute_description] = "#{t_hour_string}:#{t_minute_string}"
+                 record[:full_description] = "#{t_hour_string}:#{t_minute_string}:#{t_second_string}"
+                 records << record
+               end
+             end
+           end
+           records
+         end
+       end
+     end
+   end
+ end
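
Note that in the added file `build` is defined inside `initialize`. Ruby defines a nested `def` on the enclosing class when the outer method runs, so `TimeDimensionBuilder.new.build` still works once an instance has been constructed. The record layout itself can be sketched without the gem as follows; `build_time_records` is a hypothetical standalone name used only for this illustration.

```ruby
# Standalone sketch of the record layout; build_time_records is a hypothetical
# name, not part of the gem.
def build_time_records
  records = []
  0.upto(23) do |hour|
    0.upto(59) do |minute|
      0.upto(59) do |second|
        # Zero-pad each component to two digits, as the builder does with rjust.
        hh, mm, ss = [hour, minute, second].map { |n| n.to_s.rjust(2, '0') }
        records << {
          :hour => hour,
          :minute => minute,
          :second => second,
          :minute_description => "#{hh}:#{mm}",
          :full_description => "#{hh}:#{mm}:#{ss}"
        }
      end
    end
  end
  records
end

records = build_time_records
puts records.size                      # 86400 (24 * 60 * 60)
puts records.last[:full_description]   # 23:59:59
```

One row per second of the day keeps the dimension small and fixed in size, so it only needs to be loaded once.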