databender 0.0.1 → 0.0.2
- checksums.yaml +4 -4
- data/{LICENSE → LICENSE.md} +1 -1
- data/README.md +70 -17
- data/assets/demo.gif +0 -0
- data/databender.gemspec +23 -22
- data/lib/databender/cli/main.rb +2 -2
- data/lib/databender/connection.rb +1 -1
- data/lib/databender/runner.rb +21 -16
- data/lib/databender/templates/database.yml +2 -2
- data/lib/databender/version.rb +1 -1
- data/lib/subset.sh.mustache +1 -1
- metadata +4 -4
- data/LICENSE.txt +0 -22
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4ad9e917c73266362f09b6e5c415f328b13fc8b1
+  data.tar.gz: 189bb02bc2cc8cf69a178efb2ae5b6153149d9bd
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 38c12d2545e3df7308901e065ecf59cb9d05922a1285950bb60b186f0b9ddc337319c017085a5960c3530ef357310ae5356a653ad2890e4e511c285aaae99f45
+  data.tar.gz: c7974b08526e052e6426fba283d17ef9a7292993b1859099410024a0a909e88b5d2865b531dee7acbc2e4e6887cffeafcf7b5151bea6f1ef0d108e8f56799e8b
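The SHA1 and SHA512 entries above are hex digests of the two archives packed inside the `.gem` file. A minimal sketch of checking such a digest with Ruby's standard `digest` library follows. The helper name `digest_matches?` and the throwaway file contents are assumptions made for illustration; they are not part of databender or RubyGems.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true when the file's digest equals the expected hex string.
def digest_matches?(path, expected_hex, algorithm: Digest::SHA512)
  algorithm.file(path).hexdigest == expected_hex
end

# Demonstration on a temporary file rather than a real data.tar.gz.
Tempfile.create('archive') do |f|
  f.write('example bytes')
  f.flush
  expected = Digest::SHA512.hexdigest('example bytes')
  puts digest_matches?(f.path, expected)   # digest of the same bytes
  puts digest_matches?(f.path, '0' * 128)  # deliberately wrong digest
end
```

To check a real release, the expected value would be one of the hashes listed in `checksums.yaml` and the path would point at the extracted `data.tar.gz`.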
data/{LICENSE → LICENSE.md}
RENAMED
data/README.md
CHANGED
@@ -1,31 +1,84 @@
 # Databender
 
-
+Ruby script to generate a database subset driven by configuration based rule-engine
 
-
+#### Demo
 
-
+![alt tag](https://github.com/rcdexta/databender/raw/master/assets/demo.gif)
 
-
-
+#### Why
+
+If you have to quickly boot up a micro-service or any application in your local machine and you are stuck because the service has dependent seed data that needs to be present in the database before starting up, you have couple options:
+
+* automate data generation using tools like [bobcat](https://github.com/ThoughtWorksStudios/bobcat)
+* use the fixtures that power your testing suite to generate the seed data
+* generate a subset of the data from one of the working environments (staging, uat)
+
+Databender aims to offer an easy and seamless solution to solve the last option.
+
+#### Features
+
+* configuration driven rule engine
+* can add filters at table level or globally at column level
+* can resolve sequence of tables to import based on referential integrity (foreign key dependencies)
+
+#### Installation
+
+Install the gem to install the command-line cli
+
+```bash
+$ gem install databender
+```
+
+and the type
+
+```shell
+$ databender --help
 ```
 
-
+to know the list of available commands.
 
-
+#### Usage
 
-
+First initialise the configuration for the database you would like to take a subset of
 
-
+```powershell
+$ databender init --db-name=employees
+```
+
+> Note: I have taken the MySQL public dataset available here: https://github.com/datacharmer/test_db as the sample dataset to illustrate the gem
+
+This should create a `config` folder and a `database.yml` file. Specify the connection params to the source database in `database.yml` file. Inspect `filters/employees.yml` to specify the rules for generating the subset. The comments in the file should serve as good documentation to specify the table and column filters. Find a sample filter configuration below.
+
+```yaml
+tables:
+  # Tables with rows lesser than min_row_count will be fully imported with no filters applied
+  min_row_count: 20
+
+  # For tables with no filters, the maximum number of rows to import
+  max_row_count: 1000
+
+  # specify table specific filters here
+  filters:
+    employees: hire_date >= '1994-01-01'
+    departments: dept_name in ('d004', 'd005')
+
+columns:
+  # specify column filters applicable to all tables that contain that column
+  filters:
+    birth_date: birth_date >= '1950-01-01'
+
+```
+
+Now you can run the generator using
+
+```shell
+$ databender generate --db-name=employees
+```
 
-
+This should generate another database called `employees_subset` with the subset data and also create a dump of the file gzipped.
 
-TODO: Write usage instructions here
 
-
+#### License
 
-
-2. Create your feature branch (`git checkout -b my-new-feature`)
-3. Commit your changes (`git commit -am 'Add some feature'`)
-4. Push to the branch (`git push origin my-new-feature`)
-5. Create a new Pull Request
+MIT
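The new README describes two kinds of rules: per-table filters and global column filters that apply to every table containing that column. A rough sketch of how the two could combine into one SQL condition follows. The helper `where_clause_for` and the choice to join conditions with `AND` are assumptions for illustration; this is not databender's own `merge_filters`, whose implementation is not shown in this diff.

```ruby
require 'yaml'

# Sample configuration copied from the README hunk above.
CONFIG = YAML.safe_load(<<~CONFIG_YAML)
  tables:
    min_row_count: 20
    max_row_count: 1000
    filters:
      employees: hire_date >= '1994-01-01'
      departments: dept_name in ('d004', 'd005')
  columns:
    filters:
      birth_date: birth_date >= '1950-01-01'
CONFIG_YAML

# Hypothetical helper: ANDs the table's own filter with every global column
# filter whose column is present on the table. Returns nil when nothing applies.
def where_clause_for(table, columns, config)
  table_filter = config.dig('tables', 'filters', table)
  column_filters = config.dig('columns', 'filters') || {}
  applicable = column_filters.select { |col, _| columns.include?(col) }.values
  conditions = [table_filter, *applicable].compact
  conditions.empty? ? nil : conditions.map { |c| "(#{c})" }.join(' AND ')
end

puts where_clause_for('employees', %w[emp_no birth_date hire_date], CONFIG)
puts where_clause_for('departments', %w[dept_no dept_name], CONFIG)
```

The column lists passed in here are typed by hand; in the gem they appear to come from `source.columns_for(source_db, table.name)`.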
data/assets/demo.gif
ADDED
Binary file
data/databender.gemspec
CHANGED
@@ -3,30 +3,31 @@ lib = File.expand_path('../lib', __FILE__)
 $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
 require 'databender/version'
 
-Gem::Specification.new do |
-
-
-
-
-
-
-
-
+Gem::Specification.new do |s|
+  s.name = 'databender'
+  s.version = Databender::VERSION
+  s.authors = ['RC']
+  s.email = ['rc.chandru@gmail.com']
+  s.summary = %q{Database subset generator}
+  s.description = %q{Database subset generator}
+  s.homepage = ''
+  s.license = 'MIT'
+  s.homepage = 'https://github.com/rcdexta/databender'
 
-
-
-
-
+  s.files = `git ls-files -z`.split("\x0")
+  s.executables = ['databender']
+  s.test_files = s.files.grep(%r{^(test|s|features)/})
+  s.require_paths = ['lib']
 
-
-
-
-
-
-
+  s.add_dependency 'thor', '~> 0.20.0'
+  s.add_dependency 'activerecord', '=5.1.4'
+  s.add_dependency 'mysql2', '=0.4.9'
+  s.add_dependency 'mustache', '~> 1.0', '>= 1.0.5'
+  s.add_dependency 'configatron', '~> 4.5', '>= 4.5.1'
+  s.add_dependency 'terminal-table', '~> 1.8', '>= 1.8.0'
 
-
-
-
+  s.add_development_dependency 'bundler', '~> 1.7'
+  s.add_development_dependency 'rake', '~> 10.0'
+  s.add_development_dependency 'pry', '~> 0'
 
 end
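The dependency lines above mix pessimistic constraints (`~>`), lower bounds (`>=`) and exact pins (`=`). A small sketch of what each accepts, using RubyGems' own `Gem::Requirement` and `Gem::Version` classes, follows. The helper `allowed?` and the candidate version numbers are made up for illustration and do not refer to any particular published release.

```ruby
require 'rubygems'

# Constraints copied from the gemspec hunk above.
thor     = Gem::Requirement.new('~> 0.20.0')
mustache = Gem::Requirement.new('~> 1.0', '>= 1.0.5')
mysql2   = Gem::Requirement.new('=0.4.9')

# Hypothetical helper: does a version string satisfy the requirement?
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts allowed?(thor, '0.20.3')     # patch bump inside 0.20.x
puts allowed?(thor, '0.21.0')     # minor bump, outside '~> 0.20.0'
puts allowed?(mustache, '1.0.4')  # below the '>= 1.0.5' floor
puts allowed?(mustache, '1.1.1')  # inside 1.x and above the floor
puts allowed?(mysql2, '0.4.10')   # exact pin accepts only 0.4.9
```

Exact pins such as the ones on `activerecord` and `mysql2` make installs reproducible but leave no room for patch releases, while `~>` with a floor allows upgrades within the stated series.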
data/lib/databender/cli/main.rb
CHANGED
@@ -29,9 +29,9 @@ module Databender
       desc 'generate', 'Generate subset given a database'
       def generate
         say "Creating subset for #{options[:db_name]}", :green
-        Databender::Runner.process! options[:db_name]
-        puts ''
+        source, report_queries = Databender::Runner.process! options[:db_name]
         run 'sh subset.sh', verbose: false
+        Databender::Runner.print_report source, report_queries
       end
 
     end
data/lib/databender/runner.rb
CHANGED
@@ -14,13 +14,25 @@ module Databender
   class Runner
     extend Databender::SQLHelper
 
-
-
     def self.generate_script(params)
       template = File.read("#{GEM_ROOT}/subset.sh.mustache")
       File.write('subset.sh', Mustache.render(template, params))
     end
 
+    def self.print_report(source, report_queries)
+      report = []
+      report_queries.each do |rq|
+        all_count = source.execute_count(rq[:all_count_sql]).first.first
+        subset_count = source.execute_count(rq[:filter_count_sql]).count
+        report << [rq[:table], all_count, subset_count, rq[:filter]]
+      end
+      headings = ['Table Name', 'Total Rows', 'Fetched Rows', 'Filter(s)']
+      tty = Terminal::Table.new headings: headings, rows: report
+      puts ''
+      puts 'Generating report...'
+      puts tty
+    end
+
     def self.apply_column_filters(table, source, source_db, target_db)
       columns = source.columns_for(source_db, table.name)
       overlapping_filters = Databender::Config.column_filters.keys & columns
@@ -36,8 +48,6 @@ module Databender
         exit(1)
       end
 
-      report = []
-
       Databender::Config.load!(db_name)
       target_db = Databender::Config.target_db
       source_db = Databender::Config.source.database
@@ -53,11 +63,12 @@ module Databender
 
       ordered_tables = Databender::TableOrder.order_by_foreign_key_dependency(source, source_db, all_tables)
 
+      report_queries = []
+
       entries = ordered_tables.collect do |table|
         column_filters = apply_column_filters table, source, source_db, target_db
-
-
-        sql, count_query, filter = if Databender::Config.table_filters.keys.include?(table.name)
+        all_count_sql = count_all_query source_db, table.name
+        sql, filter_count_sql, filter = if Databender::Config.table_filters.keys.include?(table.name)
           conditions = merge_filters(Databender::Config.table_filters[table.name], column_filters)
           [insert_into_select(source_db, table.name, conditions),
            count_filtered_query(source_db, table.name, conditions), conditions]
@@ -67,21 +78,15 @@ module Databender
           else
             parents = Databender::TableOrder.parent_tables_for(table.name)
             condition = parents.present? ? where_clause_by_reference(target_db, parents) : nil
-            [insert_into_select(source_db, table.name, condition), count_filtered_query(source_db, table.name, condition),
+            [insert_into_select(source_db, table.name, condition), count_filtered_query(source_db, table.name, condition), condition]
           end
         end
-
-        report << [table.name, all_count, subset_count, filter && filter]
+        report_queries << {table: table.name, all_count_sql: all_count_sql, filter_count_sql: filter_count_sql, filter: filter}
         {sql: sql, table: table.name}
       end
 
-      headings = ['Table Name', 'Total Rows', 'Fetched Rows', 'Filter(s)']
-      tty = Terminal::Table.new headings: headings, rows: report
-      puts tty
-
-
       self.generate_script source: Databender::Config.source, target_db: target_db, entries: entries, database: db_name
-
+      [source, report_queries]
     end
   end
 end
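The runner change above splits reporting into two phases: `process!` now only records the count queries per table, and `print_report` runs them after `subset.sh` has finished. A stripped-down sketch of that collect-now, execute-later pattern follows. `FakeSource`, `plan`, `report_rows`, the SQL strings and the row counts are all stand-ins invented for illustration; the gem's real helpers (`count_all_query`, `count_filtered_query`, `execute_count`) are not reproduced here.

```ruby
# Stand-in for the database connection; a real run would execute the SQL.
class FakeSource
  def initialize(counts)
    @counts = counts
  end

  # Hypothetical method: looks up a canned count for the given query string.
  def count(sql)
    @counts.fetch(sql)
  end
end

# Phase 1: describe the work and remember which counts to run later.
def plan(tables)
  tables.map do |name, filter|
    all_sql = "SELECT COUNT(*) FROM #{name}"
    filtered_sql = filter ? "#{all_sql} WHERE #{filter}" : all_sql
    { table: name, all_count_sql: all_sql, filter_count_sql: filtered_sql, filter: filter }
  end
end

# Phase 2: run the remembered queries once the subset has been generated.
def report_rows(source, report_queries)
  report_queries.map do |rq|
    [rq[:table], source.count(rq[:all_count_sql]), source.count(rq[:filter_count_sql]), rq[:filter]]
  end
end

queries = plan({ 'employees' => "hire_date >= '1994-01-01'" })
# Made-up counts, keyed by the exact query strings planned above.
source = FakeSource.new({
  queries[0][:all_count_sql] => 100,
  queries[0][:filter_count_sql] => 40
})
report_rows(source, queries).each { |row| puts row.inspect }
```

Deferring the counts keeps the report at the end of the CLI output, after the shell script's own messages, at the cost of passing the connection and the query list back to the caller.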
data/lib/databender/version.rb
CHANGED
data/lib/subset.sh.mustache
CHANGED
@@ -36,5 +36,5 @@ mkdir -p dumps
 
 MYSQL_PWD='{{source.password}}' mysqldump -u{{source.username}} -h{{source.host}} -P{{source.port}} {{target_db}} | gzip > dumps/{{database}}.sql.gz
 
-echo "\033[92mThe dump of db_subset is available at dumps/{{database}}.sql.gz. Use gunzip to extract followed by mysql command to load. The subset database is intact in the db server too
+echo "\033[92mThe dump of db_subset is available at dumps/{{database}}.sql.gz. Use gunzip to extract followed by mysql command to load. The subset database is intact in the db server too!\033[0m"
 
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: databender
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.2
 platform: ruby
 authors:
 - RC
@@ -166,10 +166,10 @@ files:
 - ".ruby-gemset"
 - ".ruby-version"
 - Gemfile
-- LICENSE
-- LICENSE.txt
+- LICENSE.md
 - README.md
 - Rakefile
+- assets/demo.gif
 - bin/databender
 - databender.gemspec
 - dumps/magic_list_service_test.sql.gz
@@ -186,7 +186,7 @@ files:
 - lib/databender/templates/filter.yml
 - lib/databender/version.rb
 - lib/subset.sh.mustache
-homepage:
+homepage: https://github.com/rcdexta/databender
 licenses:
 - MIT
 metadata: {}
data/LICENSE.txt
DELETED
@@ -1,22 +0,0 @@
-Copyright (c) 2015 RC
-
-MIT License
-
-Permission is hereby granted, free of charge, to any person obtaining
-a copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
-
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
-LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
-OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.