crhym3-imexport 0.1.0 → 0.1.1
- data/README.rdoc +216 -1
- data/Rakefile +1 -1
- data/imexport.gemspec +2 -2
- metadata +2 -2
data/README.rdoc
CHANGED
@@ -1,4 +1,219 @@
|
|
1
1
|
== Description
|
2
2
|
|
3
|
-
|
3
|
+
This library imports data from a text file (created by
|
4
|
+
mysql -E -e "SELECT something FROM somewhere, somewhere_else ..." > data.dump
|
5
|
+
only for the moment) into your rails app, meaning it only works with your
|
6
|
+
ActiveRecord models.
|
7
|
+
|
8
|
+
I had this little problem. There was an old web app written in java and a new
|
9
|
+
rails app. Although both were using mysql, "old" and "new" DBs were running on
|
10
|
+
separate servers and they had quite different data structures. Still,
|
11
|
+
I wanted to keep some pieces of data synchronized quite frequently, at least
|
12
|
+
for a not-so-short transition period.
|
13
|
+
|
14
|
+
I had a couple of options to consider:
|
15
|
+
|
16
|
+
* Simple "INSERT INTO new_DB (...) SELECT original_data FROM old_DB" or similar
|
17
|
+
to that. Cons: I couldn't do it in one sql expression as the data structures
|
18
|
+
were too different; I'd have to take care of attributes such as +updated_at+
|
19
|
+
and +created_at+ manually, let alone model validations.
|
20
|
+
|
21
|
+
* Create models for old data structures in the new rails app and do something
|
22
|
+
like this in a rake task:
|
23
|
+
|
24
|
+
OldModel.all.each do |old|
|
25
|
+
NewModel.create :attr1 => old.attr1, :attr2 => old...
|
26
|
+
end
|
27
|
+
|
28
|
+
Cons: I didn't want to mess up the new rails app with a bunch of models
|
29
|
+
I'd never use except for synchronization.
|
30
|
+
|
31
|
+
* Dump +original_data+ into a CSV format and then use +CSV+ or +FasterCSV+.
|
32
|
+
Actually this was the choice I opted for from the beginning. The problem
|
33
|
+
here was that I sometimes had really messed up data: lots of copy&paste
|
34
|
+
from software like MS Word, etc. +FasterCSV+ was throwing Malformed exceptions
|
35
|
+
too often and +CSV+ sometimes wasn't able to recognize end of row / beginning
|
36
|
+
of a new row. It wasn't their fault; it was the poor quality of my data. So, I decided
|
37
|
+
to write this little gem.
|
38
|
+
|
39
|
+
== How it's different from CSV and FasterCSV
|
40
|
+
|
41
|
+
First off, this library isn't meant to replace either of them. It works with
|
42
|
+
different text formats (not CSV) and doesn't do just file parsing.
|
43
|
+
|
44
|
+
Consider this snippet created by
|
45
|
+
mysql -E -e "SELECT title AS COLUMN_title, speaker AS COLUMN_speaker, abstract AS COLUMN_abstract FROM Seminars" > seminars.txt:
|
46
|
+
|
47
|
+
*************************** 7. row ***************************
|
48
|
+
COLUMN_title: Conditional XPath = Codd Complete XPath
|
49
|
+
COLUMN_speaker: John Smith
|
50
|
+
COLUMN_abstract: This paper positively solves the following problem: Is there a natural
|
51
|
+
expansion of XPath 1.0 in which every first order query over
|
52
|
+
XML document tree models is expressible?
|
53
|
+
We give two necessary and sufficient conditions on XPath like
|
54
|
+
|
55
|
+
This library creates a new model object, recognizes each +COLUMN_attr+ and
|
56
|
+
tries to set the matching attribute of that object, like model.title = COLUMN_title,
|
57
|
+
model.speaker = COLUMN_speaker and model.abstract = COLUMN_abstract.
|
58
|
+
|
59
|
+
It then runs model validations (model.valid?) and does either model.save or
|
60
|
+
model.update_attributes(attrs_hash).
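
To illustrate the parsing step, here is a minimal sketch of how a +mysql -E+ dump could be split into rows and prefixed columns. It is not ImExport's actual implementation: +parse_vertical_dump+ and its regular expressions are hypothetical, and the sketch assumes that any line which does not start with the prefix continues the previous column's value.

```ruby
# Hypothetical helper, not part of imexport: parse `mysql -E` output into
# an array of hashes (one per row) with the column prefix stripped.
def parse_vertical_dump(text, prefix = 'COLUMN_')
  row_separator = /\A\*+ \d+\. row \*+\s*\z/
  column_start  = /\A\s*#{Regexp.escape(prefix)}(\w+): ?(.*)\z/

  rows = []
  current_key = nil

  text.each_line do |line|
    line = line.chomp
    if line =~ row_separator
      # A "*** N. row ***" line starts a new record.
      rows << {}
      current_key = nil
    elsif rows.empty?
      # Ignore anything before the first row separator.
      next
    elsif (m = column_start.match(line))
      # "COLUMN_name: value" starts a new column.
      current_key = m[1]
      rows.last[current_key] = m[2]
    elsif current_key
      # Assumption: other lines continue the previous column's value.
      rows.last[current_key] << "\n" << line
    end
  end

  rows
end

sample = <<~DUMP
  *************************** 7. row ***************************
  COLUMN_title: Conditional XPath = Codd Complete XPath
  COLUMN_speaker: John Smith
  COLUMN_abstract: This paper positively solves the following problem:
  Is there a natural expansion of XPath 1.0?
DUMP

rows = parse_vertical_dump(sample)
puts rows.first['title']
```

Each resulting hash could then be used to set attributes on a model object, which is roughly what the description above outlines.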
|
61
|
+
|
62
|
+
== Usage
|
63
|
+
|
64
|
+
Say, you have a model called +Seminar+ with the following attributes:
|
65
|
+
|
66
|
+
create_table "seminars", :force => true do |t|
|
67
|
+
t.string "title"
|
68
|
+
t.text "abstract"
|
69
|
+
t.datetime "date"
|
70
|
+
t.text "notes"
|
71
|
+
t.datetime "created_at"
|
72
|
+
t.datetime "updated_at"
|
73
|
+
t.boolean "published"
|
74
|
+
end
|
75
|
+
|
76
|
+
Consider a snippet of a DB text dump similar to the previous example.
|
77
|
+
Let's just add a few more columns:
|
78
|
+
|
79
|
+
*************************** 7. row ***************************
|
80
|
+
COLUMN_title: Conditional XPath = Codd Complete XPath
|
81
|
+
COLUMN_date_time: 2004-11-30T15:30:00
|
82
|
+
COLUMN_publish: 1
|
83
|
+
COLUMN_abstract: This paper positively solves the following problem: Is there a natural
|
84
|
+
expansion of XPath 1.0 in which every first order query over
|
85
|
+
XML document tree models is expressible?
|
86
|
+
|
87
|
+
Define a rake task in your rails app and require +imexport+, e.g.
|
88
|
+
|
89
|
+
namespace :db do
|
90
|
+
namespace :import do
|
91
|
+
task :seminars => :environment do
|
92
|
+
require 'imexport'
|
93
|
+
end
|
94
|
+
end
|
95
|
+
end
|
96
|
+
|
97
|
+
Now, define the columns-to-model-attributes mapping:
|
98
|
+
|
99
|
+
COLUMNS_TO_MODEL_MAP = {
|
100
|
+
'date_time' => { :date => Proc.new do |datetime|
|
101
|
+
# YYYY-MM-DDTHH:MM:SS
|
102
|
+
DateTime.strptime(datetime, '%FT%T')
|
103
|
+
end },
|
104
|
+
'publish' => { :published => Proc.new do |val|
|
105
|
+
val.to_i == 1
|
106
|
+
end }
|
107
|
+
}
|
108
|
+
|
109
|
+
As you may have noticed, we didn't define a mapping for +title+ and +abstract+ as they
|
110
|
+
are simple strings and don't need any special conversion. Plus, column names
|
111
|
+
are the same as model attributes.
|
112
|
+
|
113
|
+
Lastly, let's do the sync:
|
114
|
+
|
115
|
+
ImExport::import(ENV['FROM_FILE'], {
|
116
|
+
:class_name => 'Seminar',
|
117
|
+
:find_by => 'title',
|
118
|
+
:db_columns_prefix => 'COLUMN_',
|
119
|
+
:map => COLUMNS_TO_MODEL_MAP})
|
120
|
+
|
121
|
+
You would run the task in this way:
|
122
|
+
|
123
|
+
rake db:import:seminars FROM_FILE=/path/to/seminars.txt
|
124
|
+
|
125
|
+
and your +seminars+ table is synchronized.
|
126
|
+
|
127
|
+
So, the complete rake task would look like this:
|
128
|
+
|
129
|
+
namespace :db do
|
130
|
+
namespace :import do
|
131
|
+
task :seminars => :environment do
|
132
|
+
require 'imexport'
|
133
|
+
|
134
|
+
COLUMNS_TO_MODEL_MAP = {
|
135
|
+
'date_time' => { :date => Proc.new do |datetime|
|
136
|
+
# YYYY-MM-DDTHH:MM:SS
|
137
|
+
DateTime.strptime(datetime, '%FT%T')
|
138
|
+
end },
|
139
|
+
'publish' => { :published => Proc.new do |val|
|
140
|
+
val.to_i == 1
|
141
|
+
end }
|
142
|
+
}
|
143
|
+
|
144
|
+
ImExport::import(ENV['FROM_FILE'], {
|
145
|
+
:class_name => 'Seminar',
|
146
|
+
:find_by => 'title',
|
147
|
+
:db_columns_prefix => 'COLUMN_',
|
148
|
+
:map => COLUMNS_TO_MODEL_MAP})
|
149
|
+
end
|
150
|
+
end
|
151
|
+
end
|
152
|
+
|
153
|
+
Also, you can pass a block to ImExport::import. In that case you'll have to
|
154
|
+
call model.save or model.update_attributes(...) yourself:
|
155
|
+
|
156
|
+
ImExport::import(ENV['FROM_FILE'], {
|
157
|
+
:class_name => 'Seminar',
|
158
|
+
:find_by => 'title',
|
159
|
+
:db_columns_prefix => 'COLUMN_',
|
160
|
+
:map => COLUMNS_TO_MODEL_MAP}) do |seminar|
|
161
|
+
|
162
|
+
# do something with seminar object here, e.g.
|
163
|
+
# seminar.save
|
164
|
+
puts "---> #{seminar.inspect}"
|
165
|
+
end
|
166
|
+
|
167
|
+
=== Options for ImExport::import
|
168
|
+
|
169
|
+
|
170
|
+
+class_name+::
|
171
|
+
"String" or :symbol. ActiveRecord model defined in your rails app.
|
172
|
+
|
173
|
+
+find_by+::
|
174
|
+
"String" or :symbol.
|
175
|
+
This is how ImExport will recognize whether it should do model.save or
|
176
|
+
model.update_attributes(...). Considering the previous example, it would do
|
177
|
+
seminar.save if Seminar.find_by_title(...) returns nil or
|
178
|
+
seminar.update_attributes(...) otherwise.
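
The create-or-update decision driven by +find_by+ can be sketched without Rails. This is only an illustration under stated assumptions: +InMemoryTable+ and its +upsert+ method are hypothetical stand-ins for an ActiveRecord model, not ImExport or ActiveRecord API.

```ruby
# Hypothetical in-memory stand-in for a database table, for illustration only.
class InMemoryTable
  attr_reader :records

  def initialize
    @records = []
  end

  # Insert a new record, or update the one whose +key+ attribute matches.
  def upsert(key, attrs)
    existing = @records.find { |record| record[key] == attrs[key] }
    if existing
      existing.merge!(attrs)  # plays the role of update_attributes
      :updated
    else
      @records << attrs.dup   # plays the role of save
      :created
    end
  end
end

table  = InMemoryTable.new
first  = table.upsert(:title, :title => 'Conditional XPath', :published => false)
second = table.upsert(:title, :title => 'Conditional XPath', :published => true)
puts "#{first}, then #{second}; #{table.records.length} record(s) stored"
```

Importing the same dump twice therefore updates existing rows instead of duplicating them, provided the +find_by+ attribute identifies a record uniquely.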
|
179
|
+
|
180
|
+
+db_columns_prefix+::
|
181
|
+
"String".
|
182
|
+
Column name prefix that should be skipped while looking for the corresponding
|
183
|
+
model attribute name. Again, considering the previous example, +COLUMN_title+
|
184
|
+
actually means the +title+ attribute of the +Seminar+ model.
|
185
|
+
|
186
|
+
+map+::
|
187
|
+
Hash.
|
188
|
+
Tells ImExport how to map columns to their corresponding model
|
189
|
+
attributes. Don't add +db_columns_prefix+ to the column names here,
|
190
|
+
it is already cleaned up.
|
191
|
+
|
192
|
+
Also, you don't really have to define mapping for attributes that have the
|
193
|
+
same names as columns in the text file to be parsed, they will be recognized
|
194
|
+
and set automatically.
|
195
|
+
|
196
|
+
Every item in this Hash can be defined in one of the following ways:
|
197
|
+
|
198
|
+
'column_name' => :symbol
|
199
|
+
_Behavior_: model.symbol = value_of_column_name
|
200
|
+
|
201
|
+
'column_name' => { :symbol => Proc.new { |column_value| ... } }
|
202
|
+
_Behavior_: model.symbol = result_of_Proc_call where Proc's only argument is
|
203
|
+
the column value.
|
204
|
+
|
205
|
+
'column_name' => Proc.new { |column_value, model_object| ... }
|
206
|
+
_Behavior_: Proc called with two arguments, column value and object-to-be-saved itself.
|
207
|
+
This is the only case where your code should take care of updating
|
208
|
+
model's attribute(s) since ImExport can't guess the attribute name.
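
The three forms above, plus the automatic same-name case, can be sketched as a small dispatcher. This is an illustration only, not ImExport's internals: +SeminarStub+ and +apply_mapping+ are hypothetical, and the +remarks+ and +speaker+ columns are invented for the example.

```ruby
require 'date'

# Hypothetical stand-in for an ActiveRecord model, for illustration only.
SeminarStub = Struct.new(:title, :date, :published, :notes, :speaker_name)

# Hypothetical helper: apply one column value to the model according to
# the mapping rule found for that column.
def apply_mapping(model, column, value, map)
  rule = map[column]
  case rule
  when nil    then model[column.to_sym] = value   # same name, set directly
  when Symbol then model[rule] = value            # simple rename
  when Hash                                       # rename plus conversion
    rule.each { |attr, converter| model[attr] = converter.call(value) }
  when Proc   then rule.call(value, model)        # the Proc updates the model itself
  end
  model
end

map = {
  'date_time' => { :date => Proc.new { |s| DateTime.strptime(s, '%FT%T') } },
  'publish'   => { :published => Proc.new { |v| v.to_i == 1 } },
  'remarks'   => :notes,
  'speaker'   => Proc.new { |v, model| model.speaker_name = v.strip }
}

row = {
  'title'     => 'Conditional XPath = Codd Complete XPath',
  'date_time' => '2004-11-30T15:30:00',
  'publish'   => '1',
  'remarks'   => 'Room 101',
  'speaker'   => ' John Smith '
}

seminar = SeminarStub.new
row.each { |column, value| apply_mapping(seminar, column, value, map) }
puts seminar.inspect
```

The keys of +map+ are column names with the prefix already removed, which matches the note above about not repeating +db_columns_prefix+.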
|
209
|
+
|
210
|
+
== How to install
|
211
|
+
|
212
|
+
sudo gem install crhym3-imexport
|
213
|
+
|
214
|
+
== License
|
215
|
+
|
216
|
+
Copyright (c) 2009 Alex Vagin, released under the MIT license.
|
217
|
+
|
218
|
+
mailto:alex@digns.com
|
4
219
|
|
data/Rakefile
CHANGED
@@ -2,7 +2,7 @@ require 'rubygems'
|
|
2
2
|
require 'rake'
|
3
3
|
require 'echoe'
|
4
4
|
|
5
|
-
Echoe.new('imexport', '0.1.
|
5
|
+
Echoe.new('imexport', '0.1.1') do |p|
|
6
6
|
p.description = "Simple import from a text file generated by mysql -E ..."
|
7
7
|
p.url = "http://github.com/crhym3/imexport"
|
8
8
|
p.author = "alex"
|
data/imexport.gemspec
CHANGED
@@ -2,11 +2,11 @@
|
|
2
2
|
|
3
3
|
Gem::Specification.new do |s|
|
4
4
|
s.name = %q{imexport}
|
5
|
-
s.version = "0.1.
|
5
|
+
s.version = "0.1.1"
|
6
6
|
|
7
7
|
s.required_rubygems_version = Gem::Requirement.new(">= 1.2") if s.respond_to? :required_rubygems_version=
|
8
8
|
s.authors = ["alex"]
|
9
|
-
s.date = %q{2009-04-
|
9
|
+
s.date = %q{2009-04-22}
|
10
10
|
s.description = %q{Simple import from a text file generated by mysql -E ...}
|
11
11
|
s.email = %q{alex@digns.com}
|
12
12
|
s.extra_rdoc_files = ["CHANGELOG", "lib/imexport.rb", "README.rdoc"]
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: crhym3-imexport
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.1
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- alex
|
@@ -9,7 +9,7 @@ autorequire:
|
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
11
|
|
12
|
-
date: 2009-04-
|
12
|
+
date: 2009-04-22 00:00:00 -07:00
|
13
13
|
default_executable:
|
14
14
|
dependencies: []
|
15
15
|
|