ferry 0.0.1
- checksums.yaml +7 -0
- data/.gitignore +22 -0
- data/Gemfile +4 -0
- data/LICENSE.txt +22 -0
- data/README.md +155 -0
- data/Rakefile +2 -0
- data/ferry.gemspec +26 -0
- data/lib/ferry.rb +7 -0
- data/lib/ferry/engine.rb +16 -0
- data/lib/ferry/logger.rb +12 -0
- data/lib/ferry/version.rb +3 -0
- metadata +129 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 135d110be1a1feb5802cb1035af0fab9bf98bea1
+  data.tar.gz: b53e81709fc2a3bbef1314732f8feae82ffe2d41
+SHA512:
+  metadata.gz: 8274f3a35429634000afd5e6cd106ee24e92a6db2ff7a4e40db9ebd80768ca9f743e1c3534757b23cdd8a161ad1177a7564f118961a6eec083831e61b03629f3
+  data.tar.gz: d9f0d4414f852a925b20e0ee9697a5bec2b203b8974ac6018e440f1f195b28014d474e51fa6c2e11d449c40ff06622f5a9eb0b1d534595f5c74a8004b7c68d25
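The digests in checksums.yaml cover the two archives packed inside the .gem file. A minimal sketch of checking a file's bytes against that YAML layout follows; `checksums_match?` is a hypothetical helper, and the payload is stand-in data rather than the real `data.tar.gz`.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Hypothetical helper (not part of ferry or RubyGems): returns true when both
# the SHA1 and SHA512 entries for file_name match the digests of bytes.
def checksums_match?(checksums_yaml, file_name, bytes)
  sums = YAML.safe_load(checksums_yaml)
  sums["SHA1"][file_name] == Digest::SHA1.hexdigest(bytes) &&
    sums["SHA512"][file_name] == Digest::SHA512.hexdigest(bytes)
end

# Stand-in payload; a real check would read data.tar.gz from the unpacked gem.
bytes = "example payload"
checksums_yaml = <<~YAML
  ---
  SHA1:
    data.tar.gz: #{Digest::SHA1.hexdigest(bytes)}
  SHA512:
    data.tar.gz: #{Digest::SHA512.hexdigest(bytes)}
YAML

puts checksums_match?(checksums_yaml, "data.tar.gz", bytes)
puts checksums_match?(checksums_yaml, "data.tar.gz", bytes + "tampered")
```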
data/.gitignore
ADDED
@@ -0,0 +1,22 @@
+*.gem
+*.rbc
+.bundle
+.config
+.yardoc
+Gemfile.lock
+InstalledFiles
+_yardoc
+coverage
+doc/
+lib/bundler/man
+pkg
+rdoc
+spec/reports
+test/tmp
+test/version_tmp
+tmp
+*.bundle
+*.so
+*.o
+*.a
+mkmf.log
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,22 @@
+Copyright (c) 2014 Anthony Corletti
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,155 @@
+# Ferry
+
+Ferry is a data migration and data manipulation tool that seeks to quickly and easily reduce overhead when dealing with big data problems.
+
+## TO-DO
+
+- [ ] Refactoring before public release
+- [x] Define action-items for refactor
+- [x] Provide working example(s) of using ferry
+- [ ] Public release fine-tuning
+- [ ] Tests
+- [ ] Testing input for migrate method (max_workers, batch_size)
+- [ ] Testing that there is an ActiveRecord::Relation object being passed to find_in_batches
+- [ ] Migration Scenarios - dummy class migration
+- [ ] Refactor logging logic into Logger class
+- [x] Initial revision
+- [ ] Review
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+    gem 'ferry'
+
+And then execute:
+
+    $ bundle
+
+Or install it yourself as:
+
+    $ gem install ferry
+
+## Usage
+
+Usage pending. See examples / submit PR's for your ideas.
+
+## Example(s)
+
+###### 29 July 2014
+Version 0.0.1 is functional with the rake task defined here :: https://github.com/customink/design_content_migration/blob/master/lib/tasks/ferry_example.rake#L10
+
+Please manually install ferry from your locally cloned repo ...
+```
+git clone git@github.com:customink/ferry.git
+cd ferry
+gem build ferry.gemspec
+gem install ferry
+```
+add it to your app's Gemfile
+```
+gem 'ferry'
+```
+and then
+```
+bundle install
+```
+as it has not been pushed to rubygems.com yet.
+
+Tests - Coming soon to an editor near me!
+
+###### 28 July 2014
+Ferry should not support Oracle.
+
+###### 25 July 2014
+After a few more reviews with @metaskills, @gilr00y, @jdlehman, and @danielwheeler1987, Ferry will extend ActiveRecord with a "migrate" (more legit name search still in naming progress) method. From there we are going to pass the same relation to find in batches to a worker which will plow through the batch passed to it via a yield call from the task.
+
+Tests will include; validate the data passed into the worker (log) and testing that there is an ActiveRecord::Relation being passed to find_in_batches.
+
+###### 23 July 2014
+After a few chats with @gilr00y and @jdlehman Ferry may extend ActiveRecord with a "migrate" method we could call on an ActiveRecord object. From there that object would call an Engine instance with appropriate fields to kickoff the actual data migration.
+
+There is some logic duplication and layer duplication between the Engine class and the "migrate" method that extends ActiveRecord. Still working out how to concisely write logic that handles the management of forking connection and engine init calls.
+
+```
+require "ferry/version"
+require 'models/engine'
+require 'models/logger'
+
+module Ferry
+  class ActiveRecord
+    def self.migrate(&block)
+      yield
+    end
+  end
+end
+```
+
+This implementation should be able to run something like this ...
+
+```
+engine = Engine.new(
+  Design.where("savedate > ?", 6.months.ago.strftime("%d.%m.%Y %H").to_datetime).id,
+  Design.where("savedate > ?", 3.months.ago.strftime("%d.%m.%Y %H").to_datetime).id,
+  100_000,
+  1_000,
+  "log/ferry"
+)
+
+Design.where("savedate > ?", 130.hours.ago.strftime("%d.%m.%Y %H").to_datetime).migrate(
+  engine.run do | start_id, end_id, chunk_size, batch_size, log |
+    worker.run do | start_id, chunk_size, batch_size, log |
+      worker_end_id = start_id + chunk_size - 1
+      Design.where("id >= ? && id <= ?", start_id, worker_end_id).find_in_batches(batch_size: batch_size) do |batch|
+        # move and manipulate data as you please
+      end
+      start_id += batch_size
+    end
+  end
+)
+
+```
+
+###### 22 July 2014
+After installing ferry to your local machine or bundling from your gemfile - in your migration task make sure to define your chunker as such ...
+
+```
+require 'ferry'
+
+namespace :example do
+  task "my_migration_task" do
+
+    ferry = Engine.new(
+      :max_workers => number_of_workers ex:8,
+      :start_id => where_are_we_starting ex:2910, Model.first.id,
+      :end_id => where_are_we_ending ex:8190, Model.last.id,
+      :chunk_size => size_of_chunks_that_workers_will_process ex:42,
+      :working_dir => ex:"path/to/working_dir"
+    )
+
+    ferry.run do |start_id, chunk_size, log|
+      begin
+        work = Model.select(":id").where("? <= id and id < ?", start_id, start_id + chunk_size)
+        rows_to_process = rel.count
+        log.puts("rows_to_process: #{rows_to_process}")
+        work.find_in_batches(:batch_size => 1_000) do
+          # doing things and logging stuff as you please ...
+        end
+      rescue Exception => e
+        log.puts "Broken on id #{id}"
+        raise e
+      end
+    end
+
+  end
+end
+```
+
+
+## Contributing
+
+1. Fork it ( https://github.com/[my-github-username]/ferry/fork )
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Add some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create a new Pull Request
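The README examples in this file describe splitting an ID range into chunks that workers process in batches, but the snippets are pseudocode: the 22 July example assigns `work` and then calls `rel.count`, and the 23 July example uses `&&` inside a SQL string where standard SQL expects `AND`. The chunking step on its own can be sketched in plain Ruby as below. `each_chunk` is a hypothetical helper, not part of ferry 0.0.1, and the ActiveRecord call in the comment assumes a `Model` class that the gem does not provide.

```ruby
# Hypothetical helper: yields inclusive ID ranges of at most chunk_size ids
# covering start_id..end_id. Returns an Enumerator when no block is given.
def each_chunk(start_id, end_id, chunk_size)
  raise ArgumentError, "chunk_size must be positive" unless chunk_size.positive?
  return enum_for(:each_chunk, start_id, end_id, chunk_size) unless block_given?

  start_id.step(end_id, chunk_size) do |chunk_start|
    chunk_end = [chunk_start + chunk_size - 1, end_id].min
    yield chunk_start..chunk_end
  end
end

chunks = each_chunk(2910, 3000, 42).to_a
chunks.each { |range| puts "chunk #{range.first}..#{range.last}" }

# Inside a Rails app, each range could then drive a batched query, e.g.:
#   Model.where(id: range).find_in_batches(batch_size: 1_000) { |batch| ... }
```

Passing a Range to `where` lets ActiveRecord build the bounds itself, which avoids hand-written comparison operators in the SQL string.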
data/Rakefile
ADDED
data/ferry.gemspec
ADDED
@@ -0,0 +1,26 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'ferry/version'
+
+Gem::Specification.new do |spec|
+  spec.name = "ferry"
+  spec.version = Ferry::VERSION
+  spec.authors = ["Anthony Corletti", "Logan Watanabe", "Larry Heimann"]
+  spec.email = ["anthcor@gmail.com", "loganwatanabe@gmail.com", "profh@cmu.edu"]
+  spec.summary = "Ferry is a data migration and data manipulation tool"
+  spec.description = "Ferry is a data migration and data manipulation tool that seeks to simplify the increasingly prevalent big data problems that tech companies face"
+  spec.homepage = "https://github.com/cmu-is-projects/"
+  spec.license = "MIT"
+
+  spec.files = `git ls-files -z`.split("\x0")
+  spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
+  spec.require_paths = ["lib"]
+
+  spec.add_development_dependency "activerecord"
+  spec.add_development_dependency "bundler", "~> 1.6"
+  spec.add_development_dependency "progressbar"
+  spec.add_development_dependency "rake"
+  spec.add_development_dependency "minitest"
+end
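Every dependency in this gemspec is declared with `add_development_dependency`, so installing the gem pulls in none of them. If the library is meant to call ActiveRecord at runtime, as the README examples suggest, a runtime declaration would be the usual choice. The sketch below only illustrates the difference between the two dependency types; the gem name `ferry_sketch` and the chosen dependencies are assumptions for illustration, not the authors' gemspec.

```ruby
require 'rubygems'

# Illustrative specification only; name and dependency choices are assumptions.
spec = Gem::Specification.new do |s|
  s.name    = "ferry_sketch"
  s.version = "0.0.1"
  s.summary = "Illustrates runtime vs development dependencies"
  s.authors = ["Example Author"]

  # Installed alongside the gem for every consumer.
  s.add_runtime_dependency "activerecord"

  # Needed only when working on the gem itself.
  s.add_development_dependency "rake"
  s.add_development_dependency "minitest"
end

puts "runtime:     #{spec.runtime_dependencies.map(&:name).inspect}"
puts "development: #{spec.development_dependencies.map(&:name).inspect}"
```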
data/lib/ferry.rb
ADDED
data/lib/ferry/engine.rb
ADDED
@@ -0,0 +1,16 @@
+class Engine
+  def initialize(options={})
+  end
+
+  def run(options, &block)
+    log = options[:log]
+    collection = options[:batch]
+    log.write "collection length: #{collection.length}"
+    begin
+      instance_exec(collection, &block)
+    rescue Exception => e
+      log.write "Error: #{e}"
+    end
+    log.write "worker finished"
+  end
+end
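`Engine#run` as added here expects an options hash with a `:log` object that responds to `write` and a `:batch` collection that responds to `length`, and it evaluates the block with the engine as `self`. A minimal sketch of calling it follows. It uses a copy named `SketchEngine` so it does not depend on the gem, a `StringIO` stands in for the log, and, as an editorial change, it rescues `StandardError` rather than `Exception` and ends each log message with a newline.

```ruby
require 'stringio'

# Copy of the run logic from the diff with two labeled deviations:
# it rescues StandardError instead of Exception, and appends newlines.
class SketchEngine
  def initialize(options = {})
  end

  def run(options, &block)
    log = options[:log]
    collection = options[:batch]
    log.write "collection length: #{collection.length}\n"
    begin
      # The block runs with the engine instance as self.
      instance_exec(collection, &block)
    rescue StandardError => e
      log.write "Error: #{e}\n"
    end
    log.write "worker finished\n"
  end
end

log = StringIO.new
doubled = []
SketchEngine.new.run(log: log, batch: [1, 2, 3]) do |batch|
  batch.each { |n| doubled << n * 2 }
end

puts doubled.inspect
puts log.string
```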
data/lib/ferry/logger.rb
ADDED
@@ -0,0 +1,12 @@
+class Logger
+  def initialize(options={})
+    @homedir = options[:homedir] ||= "log"
+    FileUtils.mkdir @homedir unless Dir[@homedir].present?
+    FileUtils.touch "#{@homedir}/ferry.log"
+  end
+
+  def write(msg)
+    log = File.open("#{@homedir}/ferry.log", 'w')
+    log.puts msg
+  end
+end
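A few properties of this logger are worth noting. `File.open(..., 'w')` truncates the file on every call, so each message overwrites the previous one, and the handle is never closed. `present?` comes from ActiveSupport rather than core Ruby, `FileUtils` is used without a `require`, and the class name `Logger` is also the name of a class in Ruby's standard library. The sketch below shows one way to append instead, using only the standard library; `AppendLogger` is a hypothetical name, not part of ferry 0.0.1.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical alternative to the logger in the diff: appends each message
# and closes the file after every write.
class AppendLogger
  def initialize(options = {})
    @homedir = options[:homedir] || "log"
    FileUtils.mkdir_p(@homedir)            # no error if the directory exists
    @path = File.join(@homedir, "ferry.log")
    FileUtils.touch(@path)
  end

  def write(msg)
    # Mode 'a' appends; the block form closes the handle automatically.
    File.open(@path, 'a') { |log| log.puts msg }
  end
end

Dir.mktmpdir do |dir|
  logger = AppendLogger.new(homedir: File.join(dir, "log"))
  logger.write "first message"
  logger.write "second message"
  puts File.read(File.join(dir, "log", "ferry.log"))
end
```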
metadata
ADDED
@@ -0,0 +1,129 @@
+--- !ruby/object:Gem::Specification
+name: ferry
+version: !ruby/object:Gem::Version
+  version: 0.0.1
+platform: ruby
+authors:
+- Anthony Corletti
+- Logan Watanabe
+- Larry Heimann
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2014-08-30 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: activerecord
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.6'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.6'
+- !ruby/object:Gem::Dependency
+  name: progressbar
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: minitest
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: Ferry is a data migration and data manipulation tool that seeks to simplify
+  the increasingly prevalent big data problems that tech companies face
+email:
+- anthcor@gmail.com
+- loganwatanabe@gmail.com
+- profh@cmu.edu
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- ferry.gemspec
+- lib/ferry.rb
+- lib/ferry/engine.rb
+- lib/ferry/logger.rb
+- lib/ferry/version.rb
+homepage: https://github.com/cmu-is-projects/
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.2.2
+signing_key:
+specification_version: 4
+summary: Ferry is a data migration and data manipulation tool
+test_files: []