crhym3-imexport 0.1.0 → 0.1.1

Files changed (4)
  1. data/README.rdoc +216 -1
  2. data/Rakefile +1 -1
  3. data/imexport.gemspec +2 -2
  4. metadata +2 -2
data/README.rdoc CHANGED
@@ -1,4 +1,219 @@
1
1
  == Description
2
2
 
3
- TODO
3
+ This library imports data from a text file (created by
4
+ mysql -E -e "SELECT something FROM somewhere, somewhere_else ..." > data.dump
5
+ only for the moment) into your rails app, meaning it only works with your
6
+ ActiveRecord models.
7
+
8
+ I had this little problem. There was an old web app written in java and a new
9
+ rails app. Although both were using mysql, "old" and "new" DBs were running on
10
+ separate servers and had quite different data structures. Still,
11
+ I wanted to keep some pieces of data synchronized quite frequently, at least
12
+ for a not-so-short transition period.
13
+
14
+ I had a couple of options to consider:
15
+
16
+ * Simple "INSERT INTO new_DB (...) SELECT original_data FROM old_DB" or similar
17
+ to that. Cons: I couldn't do it in one SQL statement as the data structures
18
+ were too different; I'd have to take care of attributes such as +updated_at+
19
+ and +created_at+ manually, let alone model validations.
20
+
21
+ * Create models for old data structures in the new rails app and do something
22
+ like this in a rake task:
23
+
24
+ OldModel.all.each do |old|
25
+ NewModel.create :attr1 => old.attr1, :attr2 => old...
26
+ end
27
+
28
+ Cons: I didn't want to mess up the new rails app with a bunch of models
29
+ I'd never use except for synchronization.
30
+
31
+ * Dump +original_data+ into a CSV format and then use +CSV+ or +FasterCSV+.
32
+ Actually this was the choice I opted for from the beginning. The problem
33
+ here was that my data was sometimes really messed up: lots of copy&paste
34
+ from software like MS Word, etc. +FasterCSV+ was throwing Malformed exceptions
35
+ too often and +CSV+ sometimes wasn't able to recognize end of row / beginning
36
+ of a new row. It wasn't their fault; my data was of bad quality. So, I decided
37
+ to write this little gem.
38
+
39
+ == How it's different from CSV and FasterCSV
40
+
41
+ First off, this library isn't meant to replace either of them. It works with
42
+ different text formats (not CSV) and doesn't do just file parsing.
43
+
44
+ Consider this snippet created by
45
+ mysql -E -e "SELECT title AS COLUMN_title, speaker AS COLUMN_speaker, abstract AS COLUMN_abstract FROM Seminars" > seminars.txt:
46
+
47
+ *************************** 7. row ***************************
48
+ COLUMN_title: Conditional XPath = Codd Complete XPath
49
+ COLUMN_speaker: John Smith
50
+ COLUMN_abstract: This paper positively solves the following problem: Is there a natural
51
+ expansion of XPath 1.0 in which every first order query over
52
+ XML document tree models is expressible?
53
+ We give two necessary and sufficient conditions on XPath like
54
+
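To make the vertical format above concrete, here is a minimal sketch of how such a dump could be split into one hash per row. It is not ImExport's actual parser: +parse_vertical_dump+ and its +prefix+ argument are hypothetical names, and the sketch assumes every column was aliased with a common prefix so that continuation lines of a multi-line value can be told apart from the start of a new column.

```ruby
# Hypothetical helper, not part of the imexport gem.
# Matches separator lines such as "******** 7. row ********".
ROW_SEPARATOR = /\A\*+ \d+\. row \*+\s*\z/

def parse_vertical_dump(text, prefix = 'COLUMN_')
  # A new column starts with optional padding, the prefix, a name and a colon.
  column_start = /\A\s*#{Regexp.escape(prefix)}(\w+):\s?(.*)\z/
  rows = []
  current_key = nil

  text.each_line do |line|
    line = line.chomp
    if line =~ ROW_SEPARATOR
      rows << {}                    # begin a new record
      current_key = nil
    elsif rows.empty?
      next                          # ignore anything before the first row
    elsif (m = column_start.match(line))
      current_key = m[1]            # column name with the prefix removed
      rows.last[current_key] = m[2]
    elsif current_key
      # Any other line continues the previous column's value.
      rows.last[current_key] << "\n" << line
    end
  end
  rows
end
```

One limitation of this sketch: a continuation line that itself begins with the prefix followed by a name and a colon would be mistaken for a new column.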
55
+ This library creates a new model object, recognizes each +COLUMN_attr+ and
56
+ tries to set the matching attribute of that object, e.g. model.title = COLUMN_title,
57
+ model.speaker = COLUMN_speaker and model.abstract = COLUMN_abstract.
58
+
59
+ It then runs model validations (model.valid?) and does either model.save or
60
+ model.update_attributes(attrs_hash).
61
+
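The save-or-update step described above can be pictured roughly as follows. This is only a sketch of the idea, not the gem's code: +FakeSeminar+ is an in-memory stand-in for an ActiveRecord model so that the example runs without Rails, and +save_or_update+ is a hypothetical helper name. The lookup attribute plays the role of the +find_by+ option covered under Usage.

```ruby
# In-memory stand-in for an ActiveRecord model (assumption, for illustration only).
class FakeSeminar
  STORE = []
  attr_accessor :title, :abstract

  # Mimics the dynamic finder Seminar.find_by_title(...).
  def self.find_by_title(title)
    STORE.find { |s| s.title == title }
  end

  def initialize(attrs = {})
    assign(attrs)
  end

  # A single toy validation: the title must be present.
  def valid?
    !title.to_s.empty?
  end

  def save
    return false unless valid?
    STORE << self unless STORE.include?(self)
    true
  end

  def update_attributes(attrs)
    assign(attrs)
    save
  end

  private

  def assign(attrs)
    attrs.each { |name, value| send("#{name}=", value) }
  end
end

# Hypothetical helper: update the record found by the given attribute,
# or create a new one when the finder returns nil.
def save_or_update(model_class, find_by, attrs)
  existing = model_class.send("find_by_#{find_by}", attrs[find_by.to_sym])
  if existing
    existing.update_attributes(attrs)
    existing
  else
    record = model_class.new(attrs)
    record.save
    record
  end
end
```

Calling +save_or_update+ twice with the same title leaves a single stored record whose other attributes come from the second call.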
62
+ == Usage
63
+
64
+ Say, you have a model called +Seminar+ with the following attributes:
65
+
66
+ create_table "seminars", :force => true do |t|
67
+ t.string "title"
68
+ t.text "abstract"
69
+ t.datetime "date"
70
+ t.text "notes"
71
+ t.datetime "created_at"
72
+ t.datetime "updated_at"
73
+ t.boolean "published"
74
+ end
75
+
76
+ Consider a snippet of a DB text dump similar to the previous example.
77
+ Let's just add a few more columns:
78
+
79
+ *************************** 7. row ***************************
80
+ COLUMN_title: Conditional XPath = Codd Complete XPath
81
+ COLUMN_date_time: 2004-11-30T15:30:00
82
+ COLUMN_publish: 1
83
+ COLUMN_abstract: This paper positively solves the following problem: Is there a natural
84
+ expansion of XPath 1.0 in which every first order query over
85
+ XML document tree models is expressible?
86
+
87
+ Define a rake task in your rails app and require +imexport+, e.g.
88
+
89
+ namespace :db do
90
+ namespace :import do
91
+ task :seminars => :environment do
92
+ require 'imexport'
93
+ end
94
+ end
95
+ end
96
+
97
+ Now, define the columns-to-model-attributes mapping:
98
+
99
+ COLUMNS_TO_MODEL_MAP = {
100
+ 'date_time' => { :date => Proc.new do |datetime|
101
+ # YYYY-MM-DDTHH:MM:SS
102
+ DateTime.strptime(datetime, '%FT%T')
103
+ end },
104
+ 'publish' => { :published => Proc.new do |val|
105
+ val.to_i == 1
106
+ end }
107
+ }
108
+
109
+ As you may have noticed, we didn't define a mapping for +title+ and +abstract+ as they
110
+ are simple strings and don't need any special conversion. Plus, column names
111
+ are the same as model attributes.
112
+
113
+ Lastly, let's do the sync:
114
+
115
+ ImExport::import(ENV['FROM_FILE'], {
116
+ :class_name => 'Seminar',
117
+ :find_by => 'title',
118
+ :db_columns_prefix => 'COLUMN_',
119
+ :map => COLUMNS_TO_MODEL_MAP})
120
+
121
+ You would run the task in this way:
122
+
123
+ rake db:import:seminars FROM_FILE=/path/to/seminars.txt
124
+
125
+ and your +seminars+ table is synchronized.
126
+
127
+ So, the complete rake task would look like this:
128
+
129
+ namespace :db do
130
+ namespace :import do
131
+ task :seminars => :environment do
132
+ require 'imexport'
133
+
134
+ COLUMNS_TO_MODEL_MAP = {
135
+ 'date_time' => { :date => Proc.new do |datetime|
136
+ # YYYY-MM-DDTHH:MM:SS
137
+ DateTime.strptime(datetime, '%FT%T')
138
+ end },
139
+ 'publish' => { :published => Proc.new do |val|
140
+ val.to_i == 1
141
+ end }
142
+ }
143
+
144
+ ImExport::import(ENV['FROM_FILE'], {
145
+ :class_name => 'Seminar',
146
+ :find_by => 'title',
147
+ :db_columns_prefix => 'COLUMN_',
148
+ :map => COLUMNS_TO_MODEL_MAP})
149
+ end
150
+ end
151
+ end
152
+
153
+ Also, you can pass a block to ImExport::import. In that case you'll have to
154
+ call model.save or model.update_attributes(...) yourself:
155
+
156
+ ImExport::import(ENV['FROM_FILE'], {
157
+ :class_name => 'Seminar',
158
+ :find_by => 'title',
159
+ :db_columns_prefix => 'COLUMN_',
160
+ :map => COLUMNS_TO_MODEL_MAP}) do |seminar|
161
+
162
+ # do something with seminar object here, e.g.
163
+ # seminar.save
164
+ puts "---> #{seminar.inspect}"
165
+ end
166
+
167
+ === Options for ImExport::import
168
+
169
+
170
+ +class_name+::
171
+ "String" or :symbol. ActiveRecord model defined in your rails app.
172
+
173
+ +find_by+::
174
+ "String" or :symbol.
175
+ This is how ImExport will recognize whether it should do model.save or
176
+ model.update_attributes(...). Considering the previous example, it would do
177
+ seminar.save if Seminar.find_by_title(...) returns nil or
178
+ seminar.update_attributes(...) otherwise.
179
+
180
+ +db_columns_prefix+::
181
+ "String".
182
+ Column name prefix that should be skipped while looking for the corresponding
183
+ model attribute name. Again, considering the previous example, +COLUMN_title+
184
+ actually means the +title+ attribute of the +Seminar+ model.
185
+
186
+ +map+::
187
+ Hash.
188
+ Tells ImExport how to map column attributes with their corresponding model
189
+ attributes. Don't add +db_columns_prefix+ to the column names here;
190
+ the prefix has already been stripped.
191
+
192
+ Also, you don't really have to define mapping for attributes that have the
193
+ same names as columns in the text file to be parsed, they will be recognized
194
+ and set automatically.
195
+
196
+ Every item in this Hash can be defined in one of the following ways:
197
+
198
+ 'column_name' => :symbol
199
+ _Behavior_: model.symbol = value_of_column_name
200
+
201
+ 'column_name' => { :symbol => Proc.new { |column_value| ... } }
202
+ _Behavior_: model.symbol = result_of_Proc_call where Proc's only argument is
203
+ the column value.
204
+
205
+ 'column_name' => Proc.new { |column_value, model_object| ... }
206
+ _Behavior_: Proc called with two arguments, column value and object-to-be-saved itself.
207
+ This is the only case where your code should take care of updating
208
+ model's attribute(s) since ImExport can't guess the attribute name.
209
+
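The three entry forms of +map+ listed above, plus the default case with no entry, could be dispatched along these lines. This is a sketch under assumptions rather than ImExport's real internals: +apply_mapping+, +SeminarStub+ and +EXAMPLE_MAP+ are hypothetical names, and the +remarks+ and +speaker+ columns are invented for the example.

```ruby
# Plain Struct standing in for a model object (assumption, for illustration only).
SeminarStub = Struct.new(:title, :date, :published, :notes)

# Hypothetical helper: set one column value on the model according to the map.
def apply_mapping(model, column, value, map = {})
  rule = map[column]
  case rule
  when nil    # no entry: the attribute has the same name as the column
    model.send("#{column}=", value)
  when Symbol # 'column_name' => :symbol
    model.send("#{rule}=", value)
  when Hash   # 'column_name' => { :symbol => Proc }
    rule.each { |attr, converter| model.send("#{attr}=", converter.call(value)) }
  when Proc   # 'column_name' => Proc receiving the value and the model
    rule.call(value, model)
  end
  model
end

EXAMPLE_MAP = {
  'publish' => { :published => Proc.new { |val| val.to_i == 1 } },
  'remarks' => :notes,
  'speaker' => Proc.new { |val, model| model.title = "#{model.title} (#{val})" }
}

seminar = SeminarStub.new
apply_mapping(seminar, 'title', 'Conditional XPath', EXAMPLE_MAP)
apply_mapping(seminar, 'publish', '1', EXAMPLE_MAP)
apply_mapping(seminar, 'remarks', 'Room 101', EXAMPLE_MAP)
apply_mapping(seminar, 'speaker', 'John Smith', EXAMPLE_MAP)
```

Note that only the last form leaves the attribute assignment to the Proc itself, which matches the behavior described for a bare Proc entry.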
210
+ == How to install
211
+
212
+ sudo gem install crhym3-imexport
213
+
214
+ == License
215
+
216
+ Copyright (c) 2009 Alex Vagin, released under the MIT license.
217
+
218
+ mailto:alex@digns.com
4
219
 
data/Rakefile CHANGED
@@ -2,7 +2,7 @@ require 'rubygems'
2
2
  require 'rake'
3
3
  require 'echoe'
4
4
 
5
- Echoe.new('imexport', '0.1.0') do |p|
5
+ Echoe.new('imexport', '0.1.1') do |p|
6
6
  p.description = "Simple import from a text file generated by mysql -E ..."
7
7
  p.url = "http://github.com/crhym3/imexport"
8
8
  p.author = "alex"
data/imexport.gemspec CHANGED
@@ -2,11 +2,11 @@
2
2
 
3
3
  Gem::Specification.new do |s|
4
4
  s.name = %q{imexport}
5
- s.version = "0.1.0"
5
+ s.version = "0.1.1"
6
6
 
7
7
  s.required_rubygems_version = Gem::Requirement.new(">= 1.2") if s.respond_to? :required_rubygems_version=
8
8
  s.authors = ["alex"]
9
- s.date = %q{2009-04-21}
9
+ s.date = %q{2009-04-22}
10
10
  s.description = %q{Simple import from a text file generated by mysql -E ...}
11
11
  s.email = %q{alex@digns.com}
12
12
  s.extra_rdoc_files = ["CHANGELOG", "lib/imexport.rb", "README.rdoc"]
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: crhym3-imexport
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - alex
@@ -9,7 +9,7 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2009-04-21 00:00:00 -07:00
12
+ date: 2009-04-22 00:00:00 -07:00
13
13
  default_executable:
14
14
  dependencies: []
15
15