jsuchal-activerecord-fast-import 0.1.3 → 0.1.4
- data/README.markdown +66 -66
- data/Rakefile +1 -0
- data/VERSION +1 -1
- data/activerecord-fast-import.gemspec +5 -7
- data/lib/activerecord-fast-import.rb +10 -3
- metadata +4 -5
- data/.document +0 -5
data/README.markdown
CHANGED
@@ -1,66 +1,66 @@
|
|
1
|
-
# activerecord-fast-import
|
2
|
-
|
3
|
-
Loads data from text files into tables using fast native MySQL [LOAD DATA INFILE](http://dev.mysql.com/doc/refman/5.1/en/load-data.html) query.
|
4
|
-
|
5
|
-
## Examples
|
6
|
-
|
7
|
-
### Loading data from tab delimited log file
|
8
|
-
|
9
|
-
Suppose you have an ActiveRecord model LogEntry defined as `LogEntry happened_at:datetime url:string` and a log file with tab delimited columns like this:
|
10
|
-
|
11
|
-
2009-09-30 12:32:43<tab>http://github.com/
|
12
|
-
2009-09-30 13:36:13<tab>http://facebook.com/
|
13
|
-
|
14
|
-
To import data from this log file, you have to use
|
15
|
-
|
16
|
-
LogEntry.fast_import('huge.log')
|
17
|
-
|
18
|
-
That's it!
|
19
|
-
|
20
|
-
Of course in real world you will also need more advanced features. Read on...
|
21
|
-
|
22
|
-
|
23
|
-
### Changing delimiters and ignoring some rows
|
24
|
-
|
25
|
-
Of course not all log files are delimited by tabs and newlines. Just pass custom delimiters to options. If you want to ignore first 10 lines, just use `:ignore_lines`
|
26
|
-
|
27
|
-
import_options = {
|
28
|
-
:fields_terminated_by => ',',
|
29
|
-
:lines_terminated_by => ';',
|
30
|
-
:ignore_lines => 10
|
31
|
-
}
|
32
|
-
|
33
|
-
### Changing order of columns and ignoring columns
|
34
|
-
|
35
|
-
Now, imagine you want to import data from a huge log file with following format:
|
36
|
-
|
37
|
-
http://github.com/<tab>Mozilla<tab>2009-09-30 12:32:43
|
38
|
-
http://facebook.com/<tab>Opera<tab>2009-09-30 13:36:13
|
39
|
-
|
40
|
-
It is clear that columns are in different order and we even want to ignore the second column. Let's do it
|
41
|
-
|
42
|
-
import_options = {:columns => ["url", "@dummy", "happened_at"]}
|
43
|
-
LogEntry.fast_import('huge.log', import_options)
|
44
|
-
|
45
|
-
The special `@dummy` loads that column into a local variable and when unused (in a transformation) is just ignored.
|
46
|
-
|
47
|
-
### Transforming data
|
48
|
-
|
49
|
-
Now imagine we have a log file like this:
|
50
|
-
|
51
|
-
2009-09-30 12:32:43<tab>http://github.com/<tab>image.jpg
|
52
|
-
2009-09-30 13:36:13<tab>http://facebook.com/<tab>styles/default.css
|
53
|
-
|
54
|
-
We want to concatenate those two columns into one.
|
55
|
-
|
56
|
-
import_options = {
|
57
|
-
:columns => ["happened_at", "@domain", "@file"],
|
58
|
-
:mapping => { :url => "CONCAT(@domain, @file)" }
|
59
|
-
}
|
60
|
-
LogEntry.fast_import('huge.log', import_options)
|
61
|
-
|
62
|
-
Of course you can use any of those shiny [MySQL functions](http://dev.mysql.com/doc/refman/5.1/en/functions.html).
|
63
|
-
|
64
|
-
## Copyright
|
65
|
-
|
66
|
-
Copyright (c) 2009 Jan Suchal. See LICENSE for details.
|
1
|
+
# activerecord-fast-import
|
2
|
+
|
3
|
+
Loads data from text files into tables using fast native MySQL [LOAD DATA INFILE](http://dev.mysql.com/doc/refman/5.1/en/load-data.html) query.
|
4
|
+
|
5
|
+
## Examples
|
6
|
+
|
7
|
+
### Loading data from tab delimited log file
|
8
|
+
|
9
|
+
Suppose you have an ActiveRecord model LogEntry defined as `LogEntry happened_at:datetime url:string` and a log file with tab delimited columns like this:
|
10
|
+
|
11
|
+
2009-09-30 12:32:43<tab>http://github.com/
|
12
|
+
2009-09-30 13:36:13<tab>http://facebook.com/
|
13
|
+
|
14
|
+
To import data from this log file, you have to use
|
15
|
+
|
16
|
+
LogEntry.fast_import('huge.log')
|
17
|
+
|
18
|
+
That's it!
|
19
|
+
|
20
|
+
Of course in real world you will also need more advanced features. Read on...
|
21
|
+
|
22
|
+
|
23
|
+
### Changing delimiters and ignoring some rows
|
24
|
+
|
25
|
+
Of course not all log files are delimited by tabs and newlines. Just pass custom delimiters to options. If you want to ignore first 10 lines, just use `:ignore_lines`
|
26
|
+
|
27
|
+
import_options = {
|
28
|
+
:fields_terminated_by => ',',
|
29
|
+
:lines_terminated_by => ';',
|
30
|
+
:ignore_lines => 10
|
31
|
+
}
|
32
|
+
|
33
|
+
### Changing order of columns and ignoring columns
|
34
|
+
|
35
|
+
Now, imagine you want to import data from a huge log file with following format:
|
36
|
+
|
37
|
+
http://github.com/<tab>Mozilla<tab>2009-09-30 12:32:43
|
38
|
+
http://facebook.com/<tab>Opera<tab>2009-09-30 13:36:13
|
39
|
+
|
40
|
+
It is clear that columns are in different order and we even want to ignore the second column. Let's do it
|
41
|
+
|
42
|
+
import_options = {:columns => ["url", "@dummy", "happened_at"]}
|
43
|
+
LogEntry.fast_import('huge.log', import_options)
|
44
|
+
|
45
|
+
The special `@dummy` loads that column into a local variable and when unused (in a transformation) is just ignored.
|
46
|
+
|
47
|
+
### Transforming data
|
48
|
+
|
49
|
+
Now imagine we have a log file like this:
|
50
|
+
|
51
|
+
2009-09-30 12:32:43<tab>http://github.com/<tab>image.jpg
|
52
|
+
2009-09-30 13:36:13<tab>http://facebook.com/<tab>styles/default.css
|
53
|
+
|
54
|
+
We want to concatenate those two columns into one.
|
55
|
+
|
56
|
+
import_options = {
|
57
|
+
:columns => ["happened_at", "@domain", "@file"],
|
58
|
+
:mapping => { :url => "CONCAT(@domain, @file)" }
|
59
|
+
}
|
60
|
+
LogEntry.fast_import('huge.log', import_options)
|
61
|
+
|
62
|
+
Of course you can use any of those shiny [MySQL functions](http://dev.mysql.com/doc/refman/5.1/en/functions.html).
|
63
|
+
|
64
|
+
## Copyright
|
65
|
+
|
66
|
+
Copyright (c) 2009 Jan Suchal. See LICENSE for details.
|
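For orientation, the options shown in the README line up with clauses of MySQL's `LOAD DATA INFILE` statement. The sketch below is not the gem's implementation: `build_load_data_sql` is a hypothetical helper, written on the assumption that each option becomes the clause its name suggests, and it does no escaping of the file path or delimiters.

```ruby
# Hypothetical helper (not part of the gem): shows which LOAD DATA INFILE
# clause each README option would plausibly correspond to.
# Assumption: option names map one-to-one onto MySQL clauses.
def build_load_data_sql(table, file, options = {})
  parts = ["LOAD DATA INFILE '#{file}' INTO TABLE #{table}"]

  if options[:fields_terminated_by]
    parts << "FIELDS TERMINATED BY '#{options[:fields_terminated_by]}'"
  end
  if options[:lines_terminated_by]
    parts << "LINES TERMINATED BY '#{options[:lines_terminated_by]}'"
  end
  parts << "IGNORE #{Integer(options[:ignore_lines])} LINES" if options[:ignore_lines]

  # Column list: real column names and @user_variables, in file order.
  parts << "(#{options[:columns].join(', ')})" if options[:columns]

  # SET clause: computed columns built from the @user_variables above.
  if options[:mapping]
    assignments = options[:mapping].map { |column, expr| "#{column} = #{expr}" }
    parts << "SET #{assignments.join(', ')}"
  end

  parts.join(' ')
end

sql = build_load_data_sql('log_entries', 'huge.log',
  :columns => ['happened_at', '@domain', '@file'],
  :mapping => { :url => 'CONCAT(@domain, @file)' })
puts sql
```

The table name `log_entries` is assumed here from the usual ActiveRecord naming of a `LogEntry` model.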
data/Rakefile
CHANGED
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.1.3
|
1
|
+
0.1.4
|
data/activerecord-fast-import.gemspec
CHANGED
@@ -5,11 +5,11 @@
|
|
5
5
|
|
6
6
|
Gem::Specification.new do |s|
|
7
7
|
s.name = %q{activerecord-fast-import}
|
8
|
-
s.version = "0.1.3"
|
8
|
+
s.version = "0.1.4"
|
9
9
|
|
10
10
|
s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
|
11
11
|
s.authors = ["Jan Suchal"]
|
12
|
-
s.date = %q{2009-09-
|
12
|
+
s.date = %q{2009-09-10}
|
13
13
|
s.description = %q{Native MySQL additions to ActiveRecord, like LOAD DATA INFILE, ENABLE/DISABLE KEYS, TRUNCATE TABLE.}
|
14
14
|
s.email = %q{johno@jsmf.net}
|
15
15
|
s.extra_rdoc_files = [
|
@@ -17,8 +17,7 @@ Gem::Specification.new do |s|
|
|
17
17
|
"README.markdown"
|
18
18
|
]
|
19
19
|
s.files = [
|
20
|
-
".
|
21
|
-
".gitignore",
|
20
|
+
".gitignore",
|
22
21
|
"LICENSE",
|
23
22
|
"README.markdown",
|
24
23
|
"Rakefile",
|
@@ -30,11 +29,10 @@ Gem::Specification.new do |s|
|
|
30
29
|
"spec/activerecord-fast-import_spec.rb",
|
31
30
|
"spec/spec_helper.rb"
|
32
31
|
]
|
33
|
-
s.has_rdoc = true
|
34
32
|
s.homepage = %q{http://github.com/jsuchal/activerecord-fast-import}
|
35
33
|
s.rdoc_options = ["--charset=UTF-8"]
|
36
34
|
s.require_paths = ["lib"]
|
37
|
-
s.rubygems_version = %q{1.3.
|
35
|
+
s.rubygems_version = %q{1.3.5}
|
38
36
|
s.summary = %q{Fast MySQL import for ActiveRecord}
|
39
37
|
s.test_files = [
|
40
38
|
"spec/activerecord-fast-import_spec.rb",
|
@@ -43,7 +41,7 @@ Gem::Specification.new do |s|
|
|
43
41
|
|
44
42
|
if s.respond_to? :specification_version then
|
45
43
|
current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
|
46
|
-
s.specification_version =
|
44
|
+
s.specification_version = 3
|
47
45
|
|
48
46
|
if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
|
49
47
|
s.add_development_dependency(%q<rspec>, [">= 0"])
|
data/lib/activerecord-fast-import.rb
CHANGED
@@ -16,6 +16,13 @@ module ActiveRecord #:nodoc:
|
|
16
16
|
connection.execute("ALTER TABLE #{quoted_table_name} ENABLE KEYS")
|
17
17
|
end
|
18
18
|
|
19
|
+
# Disables keys, yields block, enables keys.
|
20
|
+
def self.with_keys_disabled
|
21
|
+
disable_keys
|
22
|
+
yield
|
23
|
+
enable_keys
|
24
|
+
end
|
25
|
+
|
19
26
|
# Loads data from file using MySQL native LOAD DATA INFILE query, disabling
|
20
27
|
# key updates for even faster import speed
|
21
28
|
#
|
@@ -24,9 +31,9 @@ module ActiveRecord #:nodoc:
|
|
24
31
|
# * +options+ (see <tt>load_data_infile</tt>)
|
25
32
|
def self.fast_import(files, options = {})
|
26
33
|
files = [files] unless files.is_a? Array
|
27
|
-
|
28
|
-
|
29
|
-
|
34
|
+
with_keys_disabled do
|
35
|
+
files.each {|file| load_data_infile(file, options)}
|
36
|
+
end
|
30
37
|
end
|
31
38
|
|
32
39
|
# Loads data from file using MySQL native LOAD DATA INFILE query
|
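A note on the `with_keys_disabled` hunk above: as written, `enable_keys` runs only if the block returns normally, so a failed import would leave the table's keys disabled. A minimal sketch of one alternative uses `ensure`. The `FakeTable` class is a hypothetical stand-in that records calls instead of issuing `ALTER TABLE`, and this is not the code shipped in 0.1.4.

```ruby
# Hypothetical stand-in for a model class: records calls instead of
# executing ALTER TABLE ... DISABLE/ENABLE KEYS against MySQL.
class FakeTable
  def self.calls
    @calls ||= []
  end

  def self.disable_keys
    calls << :disable_keys
  end

  def self.enable_keys
    calls << :enable_keys
  end

  # Variant of with_keys_disabled that re-enables keys even when the
  # block raises; the exception still propagates to the caller.
  def self.with_keys_disabled
    disable_keys
    yield
  ensure
    enable_keys
  end
end

begin
  FakeTable.with_keys_disabled { raise 'import failed' }
rescue RuntimeError
  # The error reaches the caller; keys were re-enabled on the way out.
end

p FakeTable.calls
```

The trade-off is that `enable_keys` would also run if `disable_keys` itself raised, which is harmless for MySQL's `ENABLE KEYS` but worth knowing.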
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: jsuchal-activerecord-fast-import
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.3
|
4
|
+
version: 0.1.4
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Jan Suchal
|
@@ -9,7 +9,7 @@ autorequire:
|
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
11
|
|
12
|
-
date: 2009-09-
|
12
|
+
date: 2009-09-10 00:00:00 -07:00
|
13
13
|
default_executable:
|
14
14
|
dependencies:
|
15
15
|
- !ruby/object:Gem::Dependency
|
@@ -42,7 +42,6 @@ extra_rdoc_files:
|
|
42
42
|
- LICENSE
|
43
43
|
- README.markdown
|
44
44
|
files:
|
45
|
-
- .document
|
46
45
|
- .gitignore
|
47
46
|
- LICENSE
|
48
47
|
- README.markdown
|
@@ -54,7 +53,7 @@ files:
|
|
54
53
|
- nbproject/project.xml
|
55
54
|
- spec/activerecord-fast-import_spec.rb
|
56
55
|
- spec/spec_helper.rb
|
57
|
-
has_rdoc:
|
56
|
+
has_rdoc: false
|
58
57
|
homepage: http://github.com/jsuchal/activerecord-fast-import
|
59
58
|
post_install_message:
|
60
59
|
rdoc_options:
|
@@ -78,7 +77,7 @@ requirements: []
|
|
78
77
|
rubyforge_project:
|
79
78
|
rubygems_version: 1.2.0
|
80
79
|
signing_key:
|
81
|
-
specification_version:
|
80
|
+
specification_version: 3
|
82
81
|
summary: Fast MySQL import for ActiveRecord
|
83
82
|
test_files:
|
84
83
|
- spec/activerecord-fast-import_spec.rb
|