sp-squealer 1.0

@@ -0,0 +1,3 @@
+ pkg/
+ tags
+ tmp/
data/.rvmrc ADDED
@@ -0,0 +1,6 @@
+ # These are the rubies that squealer has been tested on. Uncomment the one you want to use right now.
+ # 1.8.7-p249 segfaults postgres.
+ #
+ rvm use 1.8.7-p174@squealer
+ # rvm use 1.9.1@squealer #-p378
+ # rvm use 1.9.2@squealer #-preview3
data/.watchr ADDED
@@ -0,0 +1,27 @@
+ # ~/.vim/ftdetect/watchr.vim
+ #
+ # This should have only the following line in it:
+ #
+ #   autocmd BufNewFile,BufRead *.watchr setf ruby
+ #
+ # This will enable vim to recognize this file as ruby code should you wish to
+ # edit it.
+ def run(cmd)
+   puts cmd
+   system cmd
+ end
+
+ def spec(file)
+   run "spec -O spec/spec.opts #{file}"
+ end
+
+ # Patterns are single-quoted so that `\.` reaches the regex as an escaped dot.
+ watch('spec/.*/*_spec\.rb') do |match|
+   p match[0]
+   spec(match[0])
+ end
+
+ watch('lib/(.*/.*)\.rb') do |match|
+   p match[1]
+   spec("spec/#{match[1]}_spec.rb")
+ end
+
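The second `watch` rule maps a changed library file to its spec. The following is a minimal sketch of that mapping in plain Ruby, outside watchr; `LIB_PATTERN` and `spec_for` are hypothetical names introduced only for illustration, while the pattern and the spec path template are taken from the file above.

```ruby
# Hypothetical helper mirroring the lib -> spec rule of the .watchr file.
LIB_PATTERN = %r{lib/(.*/.*)\.rb}

def spec_for(path)
  match = LIB_PATTERN.match(path)
  # The capture requires at least one directory level below lib/.
  match && "spec/#{match[1]}_spec.rb"
end

puts spec_for("lib/squealer/database.rb")
puts spec_for("lib/squealer.rb").inspect
```

Because the capture group needs a slash, a file directly under `lib/` has no matching spec under this rule.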
data/LICENSE ADDED
@@ -0,0 +1,20 @@
+ Copyright (c) 2010 Joshua A. Graham and authors
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,83 @@
+ # Squealer
+
+ ## Usage
+ See lib/example_squeal.rb for the example squeal.
+
+ To run standalone, simply make your data squeal thusly:
+
+ `ruby example_squeal.rb`
+
+ where the squeal script includes a `require 'squealer'`.
+
+ ## Rationale
+ * For some reason cranky old data guys think no theory other than the relational one exists for modelling data
+ * Josh Graham is crankier and in many cases older (although much better looking) than your cranky old DBA, so he remembers when RDBMS were not prolific, and one had to construct queries that explicitly traversed the network or hierarchical databases of the time (or even the indexed file systems). CODASYL, can you spell it?
+ * Although many business problems are best expressed in terms of a spreadsheet (a tuple space), and despite the somewhat disturbing fact that the majority of the world's critical commercial systems hinge on Excel spreadsheets, not every problem is best modelled this way
+ * MongoDB (along with a growing number of other noSQL == "not only SQL" databases) provides an alternate mechanism to store data in a way that naturally reflects the real-world problem. Simpler application code, higher performance and straightforward scalability are natural benefits of modelling in a way that most closely reflects reality
+ * At the inaugural QCon San Francisco, in a discussion with Martin Fowler and Ola Bini, Josh postulated that ORMs had it the wrong way around: that the application should be persisting its data in a manner natural to it, and that external systems (like reporting and decision support systems - or even numbskull integration at the persistence layer) should bear the cost of mapping. The huge efforts put into noSQL engines like MongoDB, neo4j, Redis, Hadoop, CouchDB, Memcached, et cetera, have brought a rise in their popularity. With this increased and broader usage come people who are looking for tools to make these data stores more accessible. The application is no longer bearing the cost of mapping - it's now time for the ancillary and external systems to pick up the bill!
+ * squealer provides a simple, declarative language for mapping values from trees into relations. It is inherently flexible by being an internal Ruby DSL, so any imperative traversal or mapping logic can be expressed
+ * It can be used either in bulk operations on many documents (e.g. a periodic batch job) or executed for one document asynchronously as part of an after_save method (e.g. via a Resque job). It is possible that more work may be done on this event-driven approach, perhaps in the form of a squealerd, to reduce latency.
+ * For more on rationale, see my [blog post](http://grahamis.com/blog/2010/06/10/squealer-intro/ "Why squealer") and another from [Debasish Ghosh](http://debasishg.blogspot.com/2010/06/phase-shift-for-orm.html "A Phase Shift for the ORM")
+
+ ## Release Notes
+ ### v2.2
+ * Adds support for PostgreSQL database as an export target
+ * Uses EXPORT_DBMS environment variable to specify database adapter. `EXPORT_DBMS=mysql` or `EXPORT_DBMS=postgres`. MySQL is default if not specified
+ * Switched to using DataMapper's DataObjects SQL wrapper
+ * Removed the need for some typecasting and export schema restrictions (e.g. true/false maps to whatever is idiomatic for the specified DBMS)
+ * NB: The pg gem for PostgreSQL segfaults on Ruby 1.8.7-p249 so we've reverted to supporting up to 1.8.7-p174
+
+ ### v2.1
+ * Ruby 1.8.6 back-compatibility added. Using `eval "", binding, __FILE__, __LINE__` instead of `binding.eval`
+ * Target SQL script using backtick-quoted (MySQL) identifiers to avoid column-name / keyword conflict
+ * Automatically typecast Ruby `Boolean` (to integer), `Symbol` (to string), `Array` (to comma-separated string)
+ * Improved handling and reporting of Target SQL errors
+ * Schaefer's Special "skewer" Script to reflect on Mongoid models and generate an initial squeal script and SQL schema DDL script. This tool is intended to build the _initial_ scripts only. It is extremely useful to get you going, but do think about the needs of the consumer of the export database, and adjust the scripts to suit. [How do you make something squeal? You skewer it!]
+
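The v2.1 typecasting rules (boolean to integer, symbol to string, array to comma-separated string) can be illustrated with a small sketch. The `typecast` method below is a hypothetical helper written for this example; it is not Squealer's own implementation, which is not shown here.

```ruby
# Hypothetical helper illustrating the v2.1 typecasting rules.
def typecast(value)
  case value
  when true   then 1               # Boolean true  -> integer 1
  when false  then 0               # Boolean false -> integer 0
  when Symbol then value.to_s      # Symbol        -> string
  when Array  then value.join(",") # Array         -> comma-separated string
  else value                       # everything else passes through unchanged
  end
end

p typecast(true)
p typecast(:admin)
p typecast(%w[red green blue])
```

As the v2.2 notes say, some of this casting later became unnecessary once the DataObjects wrapper handled idiomatic types per DBMS.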
+ ### v2.0
+ * `Object#import` now wraps a MongoDB cursor to provide counters and timings. Only `each` is supported for now; however, `source` takes optional conditions.
+ * Progress bar and summary.
+
+ ### v1.2.1
+ * `Object#import` syntax has changed. Now `import.source(collection).each` rather than `import.collection(collection).find({}).each`. `source` returns a MongoDB cursor like `find` does. See lib/example_squeal.rb for options.
+
+ ### v1.2
+ * `Object#target` verifies there is a variable in scope with the same name as the `table_name` being targeted; it must be a `Hash` and must have an `_id` key
+ * Block to `Object#assign` not required, infers value from source scope
+ * A block returning `nil` now uses `nil` as the value to `Object#assign`, rather than inferring value from source scope
+
+ ## Warning
+ Squealer is for _standalone_ operation. DO NOT use it directly from within your Ruby application. To make the DSL easy to use, we alter some core types:
+
+ * `Hash#method_missing` - You prefer dot notation. JSON uses dot notation. You are importing from a data store which represents collections as arrays of hashmaps. Dot notation for navigating those collections is convenient. If you use a field name that happens to be a method on Hash you will have to use index notation. (e.g. `kitten.toys` is good, however `kitten.freeze` is not good. Use `kitten['freeze']` instead.)
+ * `NilClass#each` - As you are importing from schemaless repositories, you may be trying to iterate on fields that contain embedded collections. If a specific parent does not contain one of those child collections, the driver returns `nil` as the value for that field. Having `NilClass#each` return a `[]` for a nil is convenient, semantically correct in this context, and removes the need for many `nil` checks in the block you provide to `Object#assign`
+ * `Object` - `#import`, `#export`, `#target`, and `#assign` "keywords" are provided for convenience
+ * Remember that all temporal data (date, time, datetime, timestamp, whatever) is converted to a full UTC date and time. This means that if you want to use the simple assign expression (with no block), the target column must be defined as a SQL type that can automatically accept a full date and time. If you just want to store the date or time portion, or do any other manipulation, you must use a block to convert the source value.
+
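The dot-notation and nil-iteration behaviour described in the warning above can be sketched without touching core classes. This is an illustration only: `DotHash` and `each_child` are hypothetical names, and the sketch deliberately uses a subclass and a helper rather than the `Hash#method_missing` and `NilClass#each` patches that Squealer itself applies.

```ruby
# Hypothetical stand-in for the Hash#method_missing patch. A subclass keeps
# the behaviour local instead of altering every Hash in the process.
class DotHash < Hash
  def method_missing(name, *args)
    key = name.to_s
    return self[key] if args.empty? && key?(key)
    super
  end

  def respond_to_missing?(name, include_private = false)
    key?(name.to_s) || super
  end
end

# Hypothetical helper standing in for NilClass#each: a missing child
# collection is treated as an empty one, so no nil check is needed.
def each_child(collection, &block)
  (collection || []).each(&block)
end

kitten = DotHash.new
kitten["toys"] = ["yarn", "mouse"]
kitten["freeze"] = "ice"

names = []
each_child(kitten.toys) { |toy| names << toy }     # dot notation reads the "toys" key
each_child(kitten["naps"]) { |nap| names << nap }  # absent key: nothing to iterate
p names
p kitten["freeze"]  # index notation is needed because Hash#freeze already exists
```

The last line shows why the README asks for `kitten['freeze']`: a real method on `Hash` always wins over `method_missing`.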
+ ## It is a data mapper, it doesn't use one.
+ Squealer doesn't use your application classes. It doesn't use your ActiveRecord models. It doesn't use mongoid (as awesome as that is), mongodoc, or mongomapper. It's an ETL tool. It could even be called an HRM (Hashmap-Relational-Mapper), but only in hushed tones in the corner booths of dark pubs. It directly uses the Ruby driver for MongoDB and, since v2.2, the DataObjects drivers for MySQL and PostgreSQL.
+
+ ## Databases supported
+ For now, this is specifically for importing _MongoDB_ documents and exporting to either _MySQL_ or _PostgreSQL_.
+
+ ## Notes
+ Tested on Ruby 1.8.7(-p174) and Ruby 1.9.1(-p378)
+
+ The target SQL database _must_ have no foreign keys (because it can't rely on the primary key values and referential integrity is the responsibility of the source data store or the application that uses it).
+
+ The target SQL database must use a primary key of `CHAR(24)`. For now, we've assumed that column name is `id`. Each record's `id` value will get the source document `_id` value. There are some plans to make this more flexible. If you are actively requiring this, let Josh know.
+
+ It is assumed the target data will be quite denormalized - particularly that the hierarchy keys for embedded documents are flattened. This means that a document from `office.room.box` will be exported to a record containing the `id` for `office`, the `id` for `room` and the `id` for `box`.
+
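The flattening of hierarchy keys can be shown with plain hashes. This is a sketch under assumptions: the sample document, the `rooms` and `boxes` field names, and the `box_rows` helper are all hypothetical; only the `office.room.box` nesting and the resulting key columns come from the note above.

```ruby
# Hypothetical sample document with embedded rooms and boxes.
office = {
  "_id" => "office-1",
  "rooms" => [
    { "_id" => "room-1", "boxes" => [{ "_id" => "box-1" }, { "_id" => "box-2" }] },
    { "_id" => "room-2", "boxes" => nil } # a room with no embedded boxes
  ]
}

# Hypothetical helper: one flat row per box, carrying every ancestor id.
def box_rows(office)
  rows = []
  (office["rooms"] || []).each do |room|
    (room["boxes"] || []).each do |box|
      rows << {
        "id"        => box["_id"],
        "office_id" => office["_id"],
        "room_id"   => room["_id"]
      }
    end
  end
  rows
end

rows = box_rows(office)
rows.each { |row| p row }
```

Each exported row is self-describing, which is why the target schema needs no foreign keys to reconstruct the hierarchy.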
+ It is assumed no indexes are present in the target database table (performance drag). You may want to create indexes for pulling data out of the database Squealer exports to. Run a SQL DDL script on your MySQL database after squealing to add the indexes. You should drop the indexes before squealing again.
+
+ The target row is inserted, or updated if present. When MySQL is the export DBMS, we are using its non-standard `INSERT ... ON DUPLICATE KEY UPDATE` extended syntax to achieve this. For PostgreSQL, we use an UPDATE followed by an INSERT. Doing update-or-insert allows an idempotent event-driven update of exported data (e.g. through redis queues) as well as a bulk batch process.
+
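The two update-or-insert strategies can be sketched as SQL text builders. These helpers are hypothetical and only assemble statements with `?` placeholders; they are not Squealer's internal code. In particular, running the PostgreSQL INSERT only when the UPDATE affects no rows is an assumption of this sketch, since the note above does not say how the two statements are combined.

```ruby
# Hypothetical builder for the MySQL single-statement upsert.
def mysql_upsert(table, columns)
  names   = (["id"] + columns).map { |c| "`#{c}`" }.join(", ")
  marks   = (["?"] * (columns.size + 1)).join(", ")
  updates = columns.map { |c| "`#{c}` = VALUES(`#{c}`)" }.join(", ")
  "INSERT INTO `#{table}` (#{names}) VALUES (#{marks}) ON DUPLICATE KEY UPDATE #{updates}"
end

# Hypothetical builder for the two-statement approach. The caller is assumed
# to run :insert only when :update reports zero affected rows.
def update_then_insert(table, columns)
  sets  = columns.map { |c| %("#{c}" = ?) }.join(", ")
  names = (["id"] + columns).map { |c| %("#{c}") }.join(", ")
  marks = (["?"] * (columns.size + 1)).join(", ")
  {
    :update => %(UPDATE "#{table}" SET #{sets} WHERE "id" = ?),
    :insert => %(INSERT INTO "#{table}" (#{names}) VALUES (#{marks}))
  }
end

puts mysql_upsert("user", ["name"])
puts update_then_insert("user", ["name"]).values
```

Either way, replaying the same document yields the same row, which is what makes the export idempotent.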
+ ## Copyright
+
+ Copyright © 2010 Joshua A Graham and authors.
+
+ ## License
+
+ See [LICENSE](blob/master/LICENSE "License").
+
@@ -0,0 +1,44 @@
+ # Add your own tasks in files placed in lib/tasks ending in .rake,
+ # for example lib/tasks/capistrano.rake, and they will automatically be available to Rake.
+
+ require 'rake'
+ require 'spec/rake/spectask'
+ require 'rake/rdoctask'
+ require 'metric_fu'
+
+ begin
+   require 'jeweler'
+   Jeweler::Tasks.new do |gemspec|
+     gemspec.name = "squealer"
+     gemspec.summary = "Export document-oriented database to RDBMS"
+     gemspec.description = "A Ruby DSL for exporting MongoDB to MySQL or PostgreSQL. You don't need to install both, just one. Use EXPORT_DBMS=[mysql|postgres] environment variable to specify the appropriate adapter."
+     gemspec.email = "joshua.graham@grahamis.com"
+     gemspec.homepage = "http://github.com/delitescere/squealer/"
+     gemspec.authors = ["Josh Graham", "Durran Jordan", "Matt Yoho", "Bernerd Schaefer"]
+
+     gemspec.default_executable = "skewer"
+     gemspec.executables = ["skewer"]
+
+     # import DBMS
+     gemspec.add_dependency('mongo', '>= 0.18.3')
+     gemspec.add_dependency('bson_ext', '>= 1.0.1')
+
+     # export DBMS
+     gemspec.add_dependency('data_objects', '>= 0.10.2')
+     gemspec.add_dependency('do_mysql', '>= 0.10.2')
+     gemspec.add_dependency('do_postgres', '>= 0.10.2')
+
+     gemspec.add_development_dependency('rspec', '>= 1.3.0')
+   end
+   Jeweler::GemcutterTasks.new
+ rescue LoadError
+   puts "Jeweler not available. Install it with: gem install jeweler"
+ end
+
+ desc "Run all specs in spec directory (excluding plugin specs)"
+ Spec::Rake::SpecTask.new(:spec) do |t|
+   t.spec_opts = ['--options', "\"#{File.dirname(__FILE__)}/spec/spec.opts\""]
+   t.spec_files = FileList['spec/**/*/*_spec.rb']
+ end
+
+ task :default => [:spec]
data/VERSION ADDED
@@ -0,0 +1 @@
+ 2.2.2
@@ -0,0 +1,161 @@
+ #!/usr/bin/env ruby
+
+ model = ARGV[0]
+ abort "usage: skewer ModelName" unless model
+
+ require 'config/boot'
+ require 'config/environment'
+
+ model = Object.const_get(model)
+
+ unless defined?(Mongoid) && Mongoid::Document > model
+   abort "#{model} must be a Mongoid::Document"
+ end
+
+ write_schema = true
+ write_squeal = true
+
+ schema_filename = "#{model.name.underscore}_schema.sql"
+ squeal_filename = "#{model.name.underscore}_squeal.rb"
+
+ schema_exists = File.exist?(schema_filename)
+ squeal_exists = File.exist?(squeal_filename)
+
+ # Only ask about a file that is actually present.
+ if schema_exists
+   $stdout.print "#{schema_filename} already exists, overwrite? [Y/n] "
+   write_schema = $stdin.gets.chomp != "n"
+ end
+
+ if squeal_exists
+   $stdout.print "#{squeal_filename} already exists, overwrite? [Y/n] "
+   write_squeal = $stdin.gets.chomp != "n"
+ end
+
+ fields = model.fields.values.sort_by(&:name)
+ associations = model.associations
+
+ # SQL #
+ # Builds the DDL for one table: a CHAR(24) primary key plus one column per field.
+ def create_table(model, parent = nil)
+   fields = model.fields.values.sort_by(&:name)
+   columns = []
+   columns << "`#{parent.name.underscore}_id` CHAR(24)" if parent
+
+   fields.each do |field|
+     mysql_type = case field.type.name
+                  when "Boolean"
+                    "BOOLEAN"
+                  when "Time"
+                    "TIMESTAMP NULL DEFAULT NULL"
+                  when "Date"
+                    "DATE"
+                  when "Float"
+                    "FLOAT"
+                  when "Integer"
+                    "INT"
+                  else
+                    "TEXT"
+                  end
+     # MySQL identifiers are limited to 64 characters.
+     columns << "`#{field.name[0..63]}` #{mysql_type}"
+   end
+
+   table_name = if parent
+                  "#{parent.name.underscore}_#{model.name.underscore}"
+                else
+                  "#{model.name.underscore}"
+                end
+
+   table_sql = []
+   table_sql << "DROP TABLE IF EXISTS `#{table_name}`;"
+   table_sql << "CREATE TABLE `#{table_name}` (`id` CHAR(24) PRIMARY KEY);"
+   columns.each do |column|
+     table_sql << "ALTER TABLE `#{table_name}` ADD COLUMN #{column};"
+   end
+
+   table_sql.join("\n") + "\n"
+ end
+
+ # SQUEAL #
+ # Returns the squeal script text and the list of schema scripts for the model
+ # and, recursively, for its has_many / has_one associations.
+ def create_squeal(model, indent=false, parents = [])
+   fields = model.fields.values.sort_by(&:name)
+
+   parent = parents.last
+   table_name = if parent
+                  "#{parent.name.underscore}_#{model.name.underscore}"
+                else
+                  "#{model.name.underscore}"
+                end
+
+   squeal = if parent
+              "#{parent.name.underscore}.#{model.name.tableize}.each do |#{model.name.underscore}|\n" \
+              "  #{table_name} = #{model.name.underscore}"
+            else
+              "import.source(\"#{model.name.tableize}\").each do |#{model.name.underscore}|"
+            end
+
+   schemas = [create_table(model, parent)]
+
+   squeal << <<-EOS
+
+   target(:#{table_name}) do
+   EOS
+   if parent
+     squeal << <<-EOS
+     assign(:#{parent.name.underscore}_id)
+     EOS
+   end
+
+   fields.each do |field|
+     field_name = field.name
+     case
+     when %w(type target).include?(field.name)
+       # These names clash with existing methods, so use index notation.
+       value = " { #{table_name}['#{field.name}'] }"
+     when field.name.size > 64
+       field_name = field.name[0..63]
+       value = " { #{table_name}.#{field.name} }"
+     when field.name =~ /(.*)_id$/
+       value = " { #{table_name}.#{$1} }"
+     end
+     squeal << <<-EOS
+     assign(:#{field_name})#{value}
+     EOS
+   end
+
+   model.associations.values.each do |association|
+     begin
+       if [Mongoid::Associations::HasMany, Mongoid::Associations::HasOne].include?(association.association)
+         unless parents.include?(association.klass)
+           ruby, sql = create_squeal(association.klass, true, parents | [model])
+           squeal << "\n" + ruby + "\n"
+           schemas |= sql
+         end
+       end
+     rescue NameError
+       # Skip associations whose class cannot be resolved.
+     end
+   end
+
+   squeal << <<-EOS
+   end
+ end # #{table_name}
+   EOS
+   squeal.gsub!(/^/, "  ") if indent
+   return squeal, schemas
+ end
+
+ squeal, schema = create_squeal(model)
+
+ if write_squeal
+   File.open(squeal_filename, "w") do |file|
+     file.write <<-EOS
+ require 'squealer'
+
+ import('localhost', 27017, 'development') # <--- Change this as needed
+ export('mysql', 'localhost', 'root', '', 'export') # <--- Change this as needed
+
+     EOS
+     file.write(squeal)
+   end
+ end
+
+ if write_schema
+   File.open(schema_filename, "w") do |file|
+     file.write(schema.join("\n"))
+   end
+ end
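The type mapping and DDL layout used by `create_table` can be tried without a Rails or Mongoid environment. The following is a minimal sketch, assuming a plain `Struct` in place of a Mongoid field; `Field`, `SQL_TYPES` and `table_ddl` are hypothetical names for this example, while the SQL types and statement shapes are copied from the script above.

```ruby
# Hypothetical stand-in for a Mongoid field: a name and a type name.
Field = Struct.new(:name, :type_name)

# Same Ruby type -> MySQL type mapping as create_table; anything else is TEXT.
SQL_TYPES = {
  "Boolean" => "BOOLEAN",
  "Time"    => "TIMESTAMP NULL DEFAULT NULL",
  "Date"    => "DATE",
  "Float"   => "FLOAT",
  "Integer" => "INT"
}

def table_ddl(table_name, fields)
  sql = [
    "DROP TABLE IF EXISTS `#{table_name}`;",
    "CREATE TABLE `#{table_name}` (`id` CHAR(24) PRIMARY KEY);"
  ]
  fields.sort_by { |field| field.name }.each do |field|
    type = SQL_TYPES.fetch(field.type_name, "TEXT")
    # Column names are truncated to 64 characters, as in the script.
    sql << "ALTER TABLE `#{table_name}` ADD COLUMN `#{field.name[0..63]}` #{type};"
  end
  sql.join("\n") + "\n"
end

ddl = table_ddl("user", [Field.new("name", "String"), Field.new("active", "Boolean")])
puts ddl
```

Emitting one `ALTER TABLE` per column keeps the generated script easy to hand-edit, which matches the tool's stated purpose of producing only an initial draft.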
@@ -0,0 +1,59 @@
+ require 'squealer'
+
+ # connect to the source mongodb database
+ import 'localhost', 27017, 'development'
+
+ # connect to the target mysql database
+ export 'mysql', 'localhost', 'root', '', 'reporting_export'
+
+ import.source('users').each do |user|
+   target(:user) do
+     assign(:name) { "#{user.last_name.upcase}, #{user.first_name}" }
+     assign(:dob) { user.date_of_birth }
+     assign(:gender) #or# assign(:gender) { user.gender }
+
+     # You can normalize the export...
+     # home_address and work_address are a formatted string like: "661 W Lake St, Suite 3NE, Chicago IL, 60611, USA"
+     addresses = []
+     addresses << atomize_address(user.home_address) # atomize_address is some custom method of yours
+     addresses << atomize_address(user.work_address)
+     addresses.each do |address|
+       target(:address) do
+         assign(:street)
+         assign(:city)
+         assign(:state)
+         assign(:zip)
+       end
+     end
+
+     # You can denormalize the export...
+     # user.home_address = { street: '661 W Lake St', city: 'Chicago', state: 'IL' }
+     assign(:home_address) { flatten_address(user.home_address) } # flatten_address is some custom method of yours
+     assign(:work_address) { flatten_address(user.work_address) }
+
+     user.activities.each do |activity|
+       target(:activity) do
+         assign(:user_id) #or# assign(:user_id) { user._id }
+         assign(:name)
+         assign(:due_date)
+       end
+
+       activity.tasks.each do |task|
+         target(:task) do
+           assign(:user_id)
+           assign(:activity_id)
+           assign(:due_date)
+         end
+       end #activity.tasks
+     end #user.activities
+   end
+ end #import.source('users')
+
+ # Here we use a procedural "join" on related collections to update a target...
+ import.source('organizations', {'disabled_date' => {'$exists' => true}}).each do |organization|
+   import.source('users', {'organization_id' => organization._id}).each do |user|
+     target(:user) do
+       assign(:disabled) { true }
+     end
+   end
+ end
@@ -0,0 +1,6 @@
+ require 'squealer/hash'
+ require 'squealer/object'
+
+ require 'squealer/database'
+ require 'squealer/progress_bar'
+ require 'squealer/target'
@@ -0,0 +1,87 @@
+ require 'singleton'
+
+ require 'mongo'
+ # data_objects required under export_to dependent on adapter requested in EXPORT_DBMS
+
+ module Squealer
+   class Database
+     include Singleton
+
+     # Connects to the source MongoDB database.
+     def import_from(host, port, name)
+       @import_dbc = Mongo::Connection.new(host, port, :slave_ok => true).db(name)
+       @import_connection = Connection.new(@import_dbc)
+     end
+
+     # Connects to the target SQL database through the DataObjects adapter.
+     def export_to(adapter, host, username, password, name)
+       require "do_#{adapter}"
+
+       @export_do.release if @export_do
+       creds = ""
+       creds << username if username
+       creds << ":#{password}" if password
+       at_host = ""
+       at_host << "#{creds}@" unless creds.empty?
+       at_host << host
+       @export_do = DataObjects::Connection.new("#{adapter}://#{at_host}/#{name}")
+     end
+
+     def import
+       @import_connection
+     end
+
+     def export
+       @export_do
+     end
+
+     # True when the export connection is MySQL, which supports a single-statement upsert.
+     def upsertable?
+       defined?(DataObjects::Mysql) && @export_do.is_a?(DataObjects::Mysql::Connection)
+     end
+
+     class Connection
+       attr_reader :collections
+
+       def initialize(dbc)
+         @dbc = dbc
+         @collections = {}
+       end
+
+       def source(collection, conditions = {}, &block)
+         source = Source.new(@dbc, collection)
+         @collections[collection] = source
+         source.source(conditions, &block)
+       end
+
+       def eval(string)
+         @dbc.eval(string)
+       end
+     end # Connection
+
+     class Source
+       attr_reader :counts, :cursor
+
+       def initialize(dbc, collection)
+         @counts = {:exported => 0, :imported => 0}
+         @collection = dbc.collection(collection)
+       end
+
+       # A block, if given, receives the collection and must return a cursor;
+       # otherwise the conditions are passed to find.
+       def source(conditions)
+         @cursor = block_given? ? yield(@collection) : @collection.find(conditions)
+         @counts[:total] = cursor.count
+         @progress_bar = Squealer::ProgressBar.new(@counts[:total])
+         self
+       end
+
+       # Yields each document, counting imports before and exports after the block runs.
+       def each
+         @progress_bar.start if @progress_bar
+         @cursor.each do |row|
+           @counts[:imported] += 1
+           yield row
+           @progress_bar.tick if @progress_bar
+           @counts[:exported] += 1
+         end
+         @progress_bar.finish if @progress_bar
+       end
+     end # Source
+
+   end
+ end
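The connection string assembled in `export_to` can be checked in isolation. This is a sketch only: `export_uri` is a hypothetical helper that reproduces the credential handling above without opening a connection, and it builds the string without mutating literals.

```ruby
# Hypothetical helper mirroring how export_to builds its connection URI.
def export_uri(adapter, host, username, password, name)
  # nil parts are dropped; an empty password still contributes the ":" separator.
  creds   = [username, password && ":#{password}"].compact.join
  at_host = creds.empty? ? host : "#{creds}@#{host}"
  "#{adapter}://#{at_host}/#{name}"
end

puts export_uri("mysql", "localhost", "root", "", "reporting_export")
puts export_uri("postgres", "localhost", nil, nil, "reporting_export")
```

Note that the values are interpolated as-is, so a password containing characters such as `@` or `/` would need escaping before use; the original method does not do this either.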