crhym3-imexport 0.1.0 → 0.1.1

Files changed (4)
  1. data/README.rdoc +216 -1
  2. data/Rakefile +1 -1
  3. data/imexport.gemspec +2 -2
  4. metadata +2 -2
data/README.rdoc CHANGED
@@ -1,4 +1,219 @@
  == Description
 
- TODO
+ This library imports data from a text file (created by
+   mysql -E -e "SELECT something FROM somewhere, somewhere_else ..." > data.dump
+ only, for the moment) into your Rails app, which means it works only with your
+ ActiveRecord models.
+
+ I had this little problem. There was an old web app written in Java and a new
+ Rails app. Although both used MySQL, the "old" and "new" DBs were running on
+ separate servers and had quite different data structures. Still,
+ I wanted to keep some pieces of data synchronized quite frequently, at least
+ for a not-so-short transition period.
+
+ I had a couple of options to consider:
+
+ * A simple "INSERT INTO new_DB (...) SELECT original_data FROM old_DB" or something
+   similar. Cons: I couldn't do it in one SQL statement as the data structures
+   were too different; I'd have to take care of attributes such as +updated_at+
+   and +created_at+ manually, let alone model validations.
+
+ * Create models for the old data structures in the new Rails app and do something
+   like this in a rake task:
+
+     OldModel.all.each do |old|
+       NewModel.create :attr1 => old.attr1, :attr2 => old...
+     end
+
+   Cons: I didn't want to clutter the new Rails app with a bunch of models
+   I'd never use except for synchronization.
+
+ * Dump +original_data+ into CSV format and then use +CSV+ or +FasterCSV+.
+   Actually, this was the option I chose at first. The problem
+   here was that my data was sometimes really messy: lots of copy&paste
+   from software like MS Word, etc. +FasterCSV+ was throwing Malformed exceptions
+   too often and +CSV+ sometimes wasn't able to recognize the end of a row / beginning
+   of a new row. It wasn't their fault; it was the poor quality of my data. So, I decided
+   to write this little gem.
+
+ == How it's different from CSV and FasterCSV
+
+ First off, this library isn't meant to replace either of them. It works with
+ a different text format (not CSV) and does more than just file parsing.
+
+ Consider this snippet created by
+   mysql -E -e "SELECT title AS COLUMN_title, speaker AS COLUMN_speaker, abstract AS COLUMN_abstract FROM Seminars" > seminars.txt:
+
+   *************************** 7. row ***************************
+   COLUMN_title: Conditional XPath = Codd Complete XPath
+   COLUMN_speaker: John Smith
+   COLUMN_abstract: This paper positively solves the following problem: Is there a natural
+   expansion of XPath 1.0 in which every first order query over
+   XML document tree models is expressible?
+   We give two necessary and sufficient conditions on XPath like
+
+ This library creates a new model object, recognizes each +COLUMN_attr+ and
+ tries to set the matching attribute of that object, like model.title = COLUMN_title,
+ model.speaker = COLUMN_speaker and model.abstract = COLUMN_abstract.
+
+ It then runs model validations (model.valid?) and does either model.save or
+ model.update_attributes(attrs_hash).
+
+ == Usage
+
+ Say you have a model called +Seminar+ with the following attributes:
+
+   create_table "seminars", :force => true do |t|
+     t.string "title"
+     t.text "abstract"
+     t.datetime "date"
+     t.text "notes"
+     t.datetime "created_at"
+     t.datetime "updated_at"
+     t.boolean "published"
+   end
+
+ Consider a snippet of a DB text dump similar to the previous example.
+ Let's just add a few more columns:
+
+   *************************** 7. row ***************************
+   COLUMN_title: Conditional XPath = Codd Complete XPath
+   COLUMN_date_time: 2004-11-30T15:30:00
+   COLUMN_publish: 1
+   COLUMN_abstract: This paper positively solves the following problem: Is there a natural
+   expansion of XPath 1.0 in which every first order query over
+   XML document tree models is expressible?
+
+ Define a rake task in your Rails app and require +imexport+, e.g.
+
+   namespace :db do
+     namespace :import do
+       task :seminars => :environment do
+         require 'imexport'
+       end
+     end
+   end
+
+ Now, define the columns-to-model-attributes map:
+
+   COLUMNS_TO_MODEL_MAP = {
+     'date_time' => { :date => Proc.new do |datetime|
+       # YYYY-MM-DDTHH:MM:SS
+       DateTime.strptime(datetime, '%FT%T')
+     end },
+     'publish' => { :published => Proc.new do |val|
+       val.to_i == 1
+     end }
+   }
+
+ As you may have noticed, we didn't define a mapping for +title+ and +abstract+ as they
+ are simple strings and don't need any special conversion. Plus, their column names
+ are the same as the model attributes.
+
+ Lastly, let's do the sync:
+
+   ImExport::import(ENV['FROM_FILE'], {
+     :class_name => 'Seminar',
+     :find_by => 'title',
+     :db_columns_prefix => 'COLUMN_',
+     :map => COLUMNS_TO_MODEL_MAP})
+
+ You would run the task in this way:
+
+   rake db:import:seminars FROM_FILE=/path/to/seminars.txt
+
+ and your +seminars+ table is synchronized.
+
+ So, the complete rake task would look like this:
+
+   namespace :db do
+     namespace :import do
+       task :seminars => :environment do
+         require 'imexport'
+
+         COLUMNS_TO_MODEL_MAP = {
+           'date_time' => { :date => Proc.new do |datetime|
+             # YYYY-MM-DDTHH:MM:SS
+             DateTime.strptime(datetime, '%FT%T')
+           end },
+           'publish' => { :published => Proc.new do |val|
+             val.to_i == 1
+           end }
+         }
+
+         ImExport::import(ENV['FROM_FILE'], {
+           :class_name => 'Seminar',
+           :find_by => 'title',
+           :db_columns_prefix => 'COLUMN_',
+           :map => COLUMNS_TO_MODEL_MAP})
+       end
+     end
+   end
+
+ Also, you can pass a block to ImExport::import. In that case you'll have to
+ call model.save or model.update_attributes(...) yourself:
+
+   ImExport::import(ENV['FROM_FILE'], {
+     :class_name => 'Seminar',
+     :find_by => 'title',
+     :db_columns_prefix => 'COLUMN_',
+     :map => COLUMNS_TO_MODEL_MAP}) do |seminar|
+
+     # do something with seminar object here, e.g.
+     # seminar.save
+     puts "---> #{seminar.inspect}"
+   end
+
+ === Options for ImExport::import
+
+
+ +class_name+::
+   "String" or :symbol. The ActiveRecord model defined in your Rails app.
+
+ +find_by+::
+   "String" or :symbol.
+   This is how ImExport decides whether it should do model.save or
+   model.update_attributes(...). In the previous example it would do
+   seminar.save if Seminar.find_by_title(...) returns nil, or
+   seminar.update_attributes(...) otherwise.
+
+ +db_columns_prefix+::
+   "String".
+   Column name prefix that is stripped when looking for the corresponding
+   model attribute name. Again, in the previous example, +COLUMN_title+
+   actually means the +title+ attribute of the +Seminar+ model.
+
+ +map+::
+   Hash.
+   Tells ImExport how to map columns to their corresponding model
+   attributes. Don't add +db_columns_prefix+ to the column names here;
+   it has already been stripped.
+
+   Also, you don't have to define a mapping for attributes that have the
+   same names as columns in the text file to be parsed; they will be recognized
+   and set automatically.
+
+   Every item in this Hash can be defined in one of the following ways:
+
+     'column_name' => :symbol
+   _Behavior_: model.symbol = value_of_column_name
+
+     'column_name' => { :symbol => Proc.new { |column_value| ... } }
+   _Behavior_: model.symbol = result_of_Proc_call where the Proc's only argument is
+   the column value.
+
+     'column_name' => Proc.new { |column_value, model_object| ... }
+   _Behavior_: the Proc is called with two arguments, the column value and the object-to-be-saved itself.
+   This is the only case where your code should take care of updating
+   the model's attribute(s), since ImExport can't guess the attribute name.
+
+ == How to install
+
+   sudo gem install crhym3-imexport
+
+ == License
+
+ Copyright (c) 2009 Alex Vagin, released under the MIT license.
+
+ mailto:alex@digns.com
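
For illustration only: the vertical `mysql -E` format shown in the README changes above can be read into one hash per row with a few lines of Ruby. This is a minimal sketch, not imexport's own parser (which is not part of this diff). The helper name `parse_vertical_dump` is hypothetical, and the sketch assumes every column name carries the `COLUMN_` prefix, so that any unprefixed line can be treated as the continuation of a multi-line value.

```ruby
# Hypothetical helper for illustration; not imexport's actual parser.
# Matches the row separator that `mysql -E` prints before each record.
ROW_SEPARATOR = /\A\*+ \d+\. row \*+\s*\z/

# Returns an array of hashes, one per row, keyed by column name
# with the prefix removed (e.g. "COLUMN_title" becomes "title").
def parse_vertical_dump(text, prefix = 'COLUMN_')
  column_line = /\A\s*#{Regexp.escape(prefix)}(\w+): ?(.*)\z/
  rows = []
  current = nil # name of the column currently being read

  text.each_line do |raw|
    line = raw.chomp
    if line =~ ROW_SEPARATOR
      rows << {}
      current = nil
    elsif rows.empty?
      next # ignore anything before the first row separator
    elsif (m = column_line.match(line))
      current = m[1]
      rows.last[current] = m[2]
    elsif current
      # No prefix: assume the line continues the previous column's value.
      rows.last[current] = rows.last[current] + "\n" + line
    end
  end
  rows
end

SAMPLE_DUMP = <<~DUMP
  *************************** 7. row ***************************
  COLUMN_title: Conditional XPath = Codd Complete XPath
  COLUMN_speaker: John Smith
  COLUMN_abstract: This paper positively solves the following problem: Is there a natural
  expansion of XPath 1.0 in which every first order query over
  XML document tree models is expressible?
DUMP

parse_vertical_dump(SAMPLE_DUMP).each do |row|
  puts row['title']
  puts row['abstract'].lines.count
end
```

The prefix is what makes the messy data tractable in this sketch: a pasted paragraph containing colons or line breaks is still attached to the right column, because only lines starting with the prefix open a new column.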
 
data/Rakefile CHANGED
@@ -2,7 +2,7 @@ require 'rubygems'
  require 'rake'
  require 'echoe'
 
- Echoe.new('imexport', '0.1.0') do |p|
+ Echoe.new('imexport', '0.1.1') do |p|
    p.description = "Simple import from a text file generated by mysql -E ..."
    p.url = "http://github.com/crhym3/imexport"
    p.author = "alex"
data/imexport.gemspec CHANGED
@@ -2,11 +2,11 @@
 
  Gem::Specification.new do |s|
    s.name = %q{imexport}
-   s.version = "0.1.0"
+   s.version = "0.1.1"
 
    s.required_rubygems_version = Gem::Requirement.new(">= 1.2") if s.respond_to? :required_rubygems_version=
    s.authors = ["alex"]
-   s.date = %q{2009-04-21}
+   s.date = %q{2009-04-22}
    s.description = %q{Simple import from a text file generated by mysql -E ...}
    s.email = %q{alex@digns.com}
    s.extra_rdoc_files = ["CHANGELOG", "lib/imexport.rb", "README.rdoc"]
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: crhym3-imexport
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.1.1
  platform: ruby
  authors:
  - alex
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2009-04-21 00:00:00 -07:00
+ date: 2009-04-22 00:00:00 -07:00
  default_executable:
  dependencies: []
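
As a rough illustration of the three `map` entry forms documented in the README changes above (a plain symbol, a hash of symbol to Proc, and a bare two-argument Proc), the sketch below applies such a map to one parsed row. Everything here is an assumption made for the example: `apply_map`, the `Seminar` struct standing in for an ActiveRecord model, and the `remark` and `speaker` columns are hypothetical, and imexport's real implementation may differ.

```ruby
require 'date'

# Stand-in for an ActiveRecord model; hypothetical, for illustration only.
Seminar = Struct.new(:title, :date, :published, :notes, :speaker_slug)

# Applies each column of +row+ to +model+ according to +map+.
# Columns without a map entry are copied only if the model has an
# attribute of the same name; other columns are skipped.
def apply_map(model, row, map)
  row.each do |column, value|
    rule = map[column]
    case rule
    when nil
      model[column] = value if model.members.include?(column.to_sym)
    when Symbol
      model[rule] = value                   # 'column' => :attr
    when Hash
      attr, converter = rule.first          # 'column' => { :attr => Proc }
      model[attr] = converter.call(value)
    when Proc
      rule.call(value, model)               # 'column' => Proc, sets attrs itself
    end
  end
  model
end

SAMPLE_MAP = {
  'date_time' => { :date => Proc.new { |s| DateTime.strptime(s, '%FT%T') } },
  'publish'   => { :published => Proc.new { |v| v.to_i == 1 } },
  'remark'    => :notes,
  'speaker'   => Proc.new { |v, m| m.speaker_slug = v.downcase.tr(' ', '-') }
}

SAMPLE_ROW = {
  'title'     => 'Conditional XPath = Codd Complete XPath',
  'date_time' => '2004-11-30T15:30:00',
  'publish'   => '1',
  'remark'    => 'moved to room 2',
  'speaker'   => 'John Smith'
}

seminar = apply_map(Seminar.new, SAMPLE_ROW, SAMPLE_MAP)
puts seminar.inspect
```

Only the bare-Proc form receives the model object, which matches the README's note that it is the one case where the caller's code has to assign the attribute itself.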