ferry 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 135d110be1a1feb5802cb1035af0fab9bf98bea1
4
+ data.tar.gz: b53e81709fc2a3bbef1314732f8feae82ffe2d41
5
+ SHA512:
6
+ metadata.gz: 8274f3a35429634000afd5e6cd106ee24e92a6db2ff7a4e40db9ebd80768ca9f743e1c3534757b23cdd8a161ad1177a7564f118961a6eec083831e61b03629f3
7
+ data.tar.gz: d9f0d4414f852a925b20e0ee9697a5bec2b203b8974ac6018e440f1f195b28014d474e51fa6c2e11d449c40ff06622f5a9eb0b1d534595f5c74a8004b7c68d25
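The digests listed above can be recomputed by hand. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` archive into the current directory. The `digests_for` helper name is hypothetical and not part of RubyGems.

```ruby
require "digest/sha1"
require "digest/sha2"

# Hypothetical helper: returns the SHA1 and SHA512 hex digests of a file,
# the two algorithms that appear in checksums.yaml.
def digests_for(path)
  {
    "SHA1"   => Digest::SHA1.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# Print digests for whichever of the two archive members are present.
%w[metadata.gz data.tar.gz].each do |name|
  next unless File.exist?(name)
  digests_for(name).each { |algo, hex| puts "#{algo} #{name}: #{hex}" }
end
```

Comparing the printed values with the entries in `checksums.yaml` shows whether the extracted files match what was published.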
data/.gitignore ADDED
@@ -0,0 +1,22 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
18
+ *.bundle
19
+ *.so
20
+ *.o
21
+ *.a
22
+ mkmf.log
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in ferry.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2014 Anthony Corletti
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,155 @@
1
+ # Ferry
2
+
3
+ Ferry is a data migration and data manipulation tool that seeks to quickly and easily reduce overhead when dealing with big data problems.
4
+
5
+ ## TO-DO
6
+
7
+ - [ ] Refactoring before public release
8
+ - [x] Define action-items for refactor
9
+ - [x] Provide working example(s) of using ferry
10
+ - [ ] Public release fine-tuning
11
+ - [ ] Tests
12
+ - [ ] Testing input for migrate method (max_workers, batch_size)
13
+ - [ ] Testing that there is an ActiveRecord::Relation object being passed to find_in_batches
14
+ - [ ] Migration Scenarios - dummy class migration
15
+ - [ ] Refactor logging logic into Logger class
16
+ - [x] Initial revision
17
+ - [ ] Review
18
+
19
+ ## Installation
20
+
21
+ Add this line to your application's Gemfile:
22
+
23
+ gem 'ferry'
24
+
25
+ And then execute:
26
+
27
+ $ bundle
28
+
29
+ Or install it yourself as:
30
+
31
+ $ gem install ferry
32
+
33
+ ## Usage
34
+
35
+ Usage documentation is pending. See the examples below, or submit a PR with your ideas.
36
+
37
+ ## Example(s)
38
+
39
+ ###### 29 July 2014
40
+ Version 0.0.1 works with the rake task defined here: https://github.com/customink/design_content_migration/blob/master/lib/tasks/ferry_example.rake#L10
41
+
42
+ Please install ferry manually from your locally cloned repo:
43
+ ```
44
+ git clone git@github.com:customink/ferry.git
45
+ cd ferry
46
+ gem build ferry.gemspec
47
+ gem install ferry
48
+ ```
49
+ add it to your app's Gemfile
50
+ ```
51
+ gem 'ferry'
52
+ ```
53
+ and then
54
+ ```
55
+ bundle install
56
+ ```
57
+ as it has not been pushed to rubygems.org yet.
58
+
59
+ Tests - Coming soon to an editor near me!
60
+
61
+ ###### 28 July 2014
62
+ Ferry should not support Oracle.
63
+
64
+ ###### 25 July 2014
65
+ After a few more reviews with @metaskills, @gilr00y, @jdlehman, and @danielwheeler1987, Ferry will extend ActiveRecord with a "migrate" method (the search for a better name is still in progress). From there we will call find_in_batches on that same relation and hand each batch to a worker, which will plow through the batch passed to it via a yield call from the task.
66
+
67
+ Tests will include validating the data passed into the worker (log) and checking that an ActiveRecord::Relation is being passed to find_in_batches.
68
+
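The plan in the 25 July note can be sketched in plain Ruby. This is only an illustration under stated assumptions: `FakeRelation` is a hypothetical stand-in for an `ActiveRecord::Relation`, and the `Migratable` module and its `migrate` signature are invented here, since the note says the final name is still undecided.

```ruby
# Hypothetical stand-in for an ActiveRecord::Relation (illustration only):
# it wraps an array and mimics find_in_batches with each_slice.
class FakeRelation
  def initialize(records)
    @records = records
  end

  def find_in_batches(batch_size: 1_000)
    @records.each_slice(batch_size) { |batch| yield batch }
  end
end

# Hypothetical "migrate" extension: walks the relation in batches and
# yields each batch to the caller's block, which plays the worker's role.
module Migratable
  def migrate(batch_size: 1_000, &worker)
    raise ArgumentError, "a block is required" unless worker
    find_in_batches(batch_size: batch_size) { |batch| worker.call(batch) }
  end
end

FakeRelation.include(Migratable)

relation = FakeRelation.new((1..10).to_a)
seen = []
relation.migrate(batch_size: 4) { |batch| seen << batch }
puts seen.inspect # prints [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

Keeping the worker as a plain block means the same `migrate` call could later hand batches to forked processes without changing the calling code, though that part is not shown here.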
69
+ ###### 23 July 2014
70
+ After a few chats with @gilr00y and @jdlehman, Ferry may extend ActiveRecord with a "migrate" method we could call on an ActiveRecord object. From there, that object would call an Engine instance with the appropriate fields to kick off the actual data migration.
71
+
72
+ There is some logic and layer duplication between the Engine class and the "migrate" method that extends ActiveRecord. We are still working out how to concisely write the logic that manages forking connections and Engine init calls.
73
+
74
+ ```
75
+ require "ferry/version"
76
+ require 'models/engine'
77
+ require 'models/logger'
78
+
79
+ module Ferry
80
+ class ActiveRecord
81
+ def self.migrate(&block)
82
+ yield
83
+ end
84
+ end
85
+ end
86
+ ```
87
+
88
+ This implementation should be able to run something like this ...
89
+
90
+ ```
91
+ engine = Engine.new(
92
+ Design.where("savedate > ?", 6.months.ago.strftime("%d.%m.%Y %H").to_datetime).id,
93
+ Design.where("savedate > ?", 3.months.ago.strftime("%d.%m.%Y %H").to_datetime).id,
94
+ 100_000,
95
+ 1_000,
96
+ "log/ferry"
97
+ )
98
+
99
+ Design.where("savedate > ?", 130.hours.ago.strftime("%d.%m.%Y %H").to_datetime).migrate(
100
+ engine.run do | start_id, end_id, chunk_size, batch_size, log |
101
+ worker.run do | start_id, chunk_size, batch_size, log |
102
+ worker_end_id = start_id + chunk_size - 1
103
+ Design.where("id >= ? AND id <= ?", start_id, worker_end_id).find_in_batches(batch_size: batch_size) do |batch|
104
+ # move and manipulate data as you please
105
+ end
106
+ start_id += batch_size
107
+ end
108
+ end
109
+ )
110
+
111
+ ```
112
+
113
+ ###### 22 July 2014
114
+ After installing ferry on your local machine or bundling it from your Gemfile, define your chunker in your migration task like this:
115
+
116
+ ```
117
+ require 'ferry'
118
+
119
+ namespace :example do
120
+ task "my_migration_task" do
121
+
122
+ ferry = Engine.new(
123
+   :max_workers => 8,     # number of workers
124
+   :start_id => 2910,     # where to start, e.g. Model.first.id
125
+   :end_id => 8190,       # where to end, e.g. Model.last.id
126
+   :chunk_size => 42,     # size of the chunks each worker will process
127
+   :working_dir => "path/to/working_dir"
128
+ )
129
+
130
+ ferry.run do |start_id, chunk_size, log|
131
+   begin
132
+     work = Model.select(:id).where("? <= id and id < ?", start_id, start_id + chunk_size)
133
+     rows_to_process = work.count
134
+     log.puts("rows_to_process: #{rows_to_process}")
135
+     work.find_in_batches(:batch_size => 1_000) do |batch|
136
+       # doing things and logging stuff as you please ...
137
+     end
138
+   rescue Exception => e
139
+     log.puts "Broken in chunk starting at id #{start_id}"
140
+     raise e
141
+ end
142
+ end
143
+
144
+ end
145
+ end
146
+ ```
147
+
148
+
149
+ ## Contributing
150
+
151
+ 1. Fork it ( https://github.com/[my-github-username]/ferry/fork )
152
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
153
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
154
+ 4. Push to the branch (`git push origin my-new-feature`)
155
+ 5. Create a new Pull Request
data/Rakefile ADDED
@@ -0,0 +1,2 @@
1
+ require "bundler/gem_tasks"
2
+ Bundler::GemHelper.install_tasks
data/ferry.gemspec ADDED
@@ -0,0 +1,26 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'ferry/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "ferry"
8
+ spec.version = Ferry::VERSION
9
+ spec.authors = ["Anthony Corletti", "Logan Watanabe", "Larry Heimann"]
10
+ spec.email = ["anthcor@gmail.com", "loganwatanabe@gmail.com", "profh@cmu.edu"]
11
+ spec.summary = "Ferry is a data migration and data manipulation tool"
12
+ spec.description = "Ferry is a data migration and data manipulation tool that seeks to simplify the increasingly prevalent big data problems that tech companies face"
13
+ spec.homepage = "https://github.com/cmu-is-projects/"
14
+ spec.license = "MIT"
15
+
16
+ spec.files = `git ls-files -z`.split("\x0")
17
+ spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
18
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
19
+ spec.require_paths = ["lib"]
20
+
21
+ spec.add_development_dependency "activerecord"
22
+ spec.add_development_dependency "bundler", "~> 1.6"
23
+ spec.add_development_dependency "progressbar"
24
+ spec.add_development_dependency "rake"
25
+ spec.add_development_dependency "minitest"
26
+ end
data/lib/ferry.rb ADDED
@@ -0,0 +1,7 @@
1
+ require "ferry/version"
2
+ require "ferry/engine"
3
+ require "ferry/logger"
4
+
5
+ module Ferry
6
+ # Your code goes here...
7
+ end
data/lib/ferry/engine.rb ADDED
@@ -0,0 +1,16 @@
1
+ class Engine
2
+ def initialize(options={})
3
+ end
4
+
5
+ def run(options, &block)
6
+ log = options[:log]
7
+ collection = options[:batch]
8
+ log.write "collection length: #{collection.length}"
9
+ begin
10
+ instance_exec(collection, &block)
11
+ rescue Exception => e
12
+ log.write "Error: #{e}"
13
+ end
14
+ log.write "worker finished"
15
+ end
16
+ end
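To make the behaviour of `Engine#run` concrete, here is a small sketch. It repeats the `Engine` class from the diff above so the snippet runs on its own, and it assumes a hypothetical `MemoryLog` class (not part of ferry) that collects messages in an array instead of writing to a file.

```ruby
# Copy of the Engine class from this release, repeated so the sketch is runnable.
class Engine
  def initialize(options={})
  end

  def run(options, &block)
    log = options[:log]
    collection = options[:batch]
    log.write "collection length: #{collection.length}"
    begin
      instance_exec(collection, &block)
    rescue Exception => e
      log.write "Error: #{e}"
    end
    log.write "worker finished"
  end
end

# Hypothetical in-memory log; anything that responds to #write would do.
class MemoryLog
  attr_reader :lines

  def initialize
    @lines = []
  end

  def write(msg)
    @lines << msg
  end
end

log = MemoryLog.new
doubled = []

# The block receives the batch; here the "work" is doubling each number.
Engine.new.run(log: log, batch: [1, 2, 3]) do |batch|
  batch.each { |n| doubled << n * 2 }
end

# A failing block is logged rather than re-raised.
Engine.new.run(log: log, batch: []) { |_batch| raise "boom" }

puts log.lines
```

Because `run` rescues the exception and only logs it, the caller never sees the failure; the log is the only place it surfaces.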
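data/lib/ferry/logger.rb ADDED
@@ -0,0 +1,15 @@
1
+ require "fileutils"
2
+
3
+ class Logger
4
+   def initialize(options={})
5
+     @homedir = options[:homedir] || "log"
6
+     # mkdir_p is a no-op when the directory already exists
7
+     FileUtils.mkdir_p @homedir
8
+     FileUtils.touch "#{@homedir}/ferry.log"
9
+   end
10
+
11
+   def write(msg)
12
+     # append instead of truncating, and close the file after each message
13
+     File.open("#{@homedir}/ferry.log", 'a') { |log| log.puts msg }
14
+   end
15
+ end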
data/lib/ferry/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module Ferry
2
+ VERSION = "0.0.1"
3
+ end
metadata ADDED
@@ -0,0 +1,129 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: ferry
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Anthony Corletti
8
+ - Logan Watanabe
9
+ - Larry Heimann
10
+ autorequire:
11
+ bindir: bin
12
+ cert_chain: []
13
+ date: 2014-08-30 00:00:00.000000000 Z
14
+ dependencies:
15
+ - !ruby/object:Gem::Dependency
16
+ name: activerecord
17
+ requirement: !ruby/object:Gem::Requirement
18
+ requirements:
19
+ - - ">="
20
+ - !ruby/object:Gem::Version
21
+ version: '0'
22
+ type: :development
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ requirements:
26
+ - - ">="
27
+ - !ruby/object:Gem::Version
28
+ version: '0'
29
+ - !ruby/object:Gem::Dependency
30
+ name: bundler
31
+ requirement: !ruby/object:Gem::Requirement
32
+ requirements:
33
+ - - "~>"
34
+ - !ruby/object:Gem::Version
35
+ version: '1.6'
36
+ type: :development
37
+ prerelease: false
38
+ version_requirements: !ruby/object:Gem::Requirement
39
+ requirements:
40
+ - - "~>"
41
+ - !ruby/object:Gem::Version
42
+ version: '1.6'
43
+ - !ruby/object:Gem::Dependency
44
+ name: progressbar
45
+ requirement: !ruby/object:Gem::Requirement
46
+ requirements:
47
+ - - ">="
48
+ - !ruby/object:Gem::Version
49
+ version: '0'
50
+ type: :development
51
+ prerelease: false
52
+ version_requirements: !ruby/object:Gem::Requirement
53
+ requirements:
54
+ - - ">="
55
+ - !ruby/object:Gem::Version
56
+ version: '0'
57
+ - !ruby/object:Gem::Dependency
58
+ name: rake
59
+ requirement: !ruby/object:Gem::Requirement
60
+ requirements:
61
+ - - ">="
62
+ - !ruby/object:Gem::Version
63
+ version: '0'
64
+ type: :development
65
+ prerelease: false
66
+ version_requirements: !ruby/object:Gem::Requirement
67
+ requirements:
68
+ - - ">="
69
+ - !ruby/object:Gem::Version
70
+ version: '0'
71
+ - !ruby/object:Gem::Dependency
72
+ name: minitest
73
+ requirement: !ruby/object:Gem::Requirement
74
+ requirements:
75
+ - - ">="
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ type: :development
79
+ prerelease: false
80
+ version_requirements: !ruby/object:Gem::Requirement
81
+ requirements:
82
+ - - ">="
83
+ - !ruby/object:Gem::Version
84
+ version: '0'
85
+ description: Ferry is a data migration and data manipulation tool that seeks to simplify
86
+ the increasingly prevalent big data problems that tech companies face
87
+ email:
88
+ - anthcor@gmail.com
89
+ - loganwatanabe@gmail.com
90
+ - profh@cmu.edu
91
+ executables: []
92
+ extensions: []
93
+ extra_rdoc_files: []
94
+ files:
95
+ - ".gitignore"
96
+ - Gemfile
97
+ - LICENSE.txt
98
+ - README.md
99
+ - Rakefile
100
+ - ferry.gemspec
101
+ - lib/ferry.rb
102
+ - lib/ferry/engine.rb
103
+ - lib/ferry/logger.rb
104
+ - lib/ferry/version.rb
105
+ homepage: https://github.com/cmu-is-projects/
106
+ licenses:
107
+ - MIT
108
+ metadata: {}
109
+ post_install_message:
110
+ rdoc_options: []
111
+ require_paths:
112
+ - lib
113
+ required_ruby_version: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '0'
118
+ required_rubygems_version: !ruby/object:Gem::Requirement
119
+ requirements:
120
+ - - ">="
121
+ - !ruby/object:Gem::Version
122
+ version: '0'
123
+ requirements: []
124
+ rubyforge_project:
125
+ rubygems_version: 2.2.2
126
+ signing_key:
127
+ specification_version: 4
128
+ summary: Ferry is a data migration and data manipulation tool
129
+ test_files: []