databender 0.0.1 → 0.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 4f0224e71e9970b7d51d20dc020ceb5ddb449f37
4
- data.tar.gz: 5fe6a3f2af057f4d523b8bbc3bf9c0672a943833
3
+ metadata.gz: 4ad9e917c73266362f09b6e5c415f328b13fc8b1
4
+ data.tar.gz: 189bb02bc2cc8cf69a178efb2ae5b6153149d9bd
5
5
  SHA512:
6
- metadata.gz: 5a88848c0ee536aacbc32914c7fa84a95c70475ac10c3d0c679fdf47b3a7be4f66279bca268397377fe4e31a54a91702ab7ad4597d317cc2a3649b15f81b7937
7
- data.tar.gz: 90f1af10b4950336312cb406e3b7c5e1f60de5fd2906bcb6c71d76d8c34b063c7aad6016a464a96b114896aa9fc2abe54fbb82c65d0cd1b82826ed2707ab67ca
6
+ metadata.gz: 38c12d2545e3df7308901e065ecf59cb9d05922a1285950bb60b186f0b9ddc337319c017085a5960c3530ef357310ae5356a653ad2890e4e511c285aaae99f45
7
+ data.tar.gz: c7974b08526e052e6426fba283d17ef9a7292993b1859099410024a0a909e88b5d2865b531dee7acbc2e4e6887cffeafcf7b5151bea6f1ef0d108e8f56799e8b
@@ -1,6 +1,6 @@
1
1
  The MIT License (MIT)
2
2
 
3
- Copyright (c) 2015 RC
3
+ Copyright (c) 2017 RC
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
6
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,31 +1,84 @@
1
1
  # Databender
2
2
 
3
- Database Subset Generator
3
+ Ruby script to generate a database subset, driven by a configuration-based rule engine
4
4
 
5
- ## Installation
5
+ #### Demo
6
6
 
7
- Add this line to your application's Gemfile:
7
+ ![alt tag](https://github.com/rcdexta/databender/raw/master/assets/demo.gif)
8
8
 
9
- ```ruby
10
- gem 'databender'
9
+ #### Why
10
+
11
+ If you need to quickly boot up a micro-service or any other application on your local machine, and you are stuck because the service depends on seed data that must be present in the database before it starts, you have a couple of options:
12
+
13
+ * automate data generation using tools like [bobcat](https://github.com/ThoughtWorksStudios/bobcat)
14
+ * use the fixtures that power your testing suite to generate the seed data
15
+ * generate a subset of the data from one of the working environments (staging, uat)
16
+
17
+ Databender aims to make the last option easy and seamless.
18
+
19
+ #### Features
20
+
21
+ * configuration driven rule engine
22
+ * can add filters at table level or globally at column level
23
+ * can resolve sequence of tables to import based on referential integrity (foreign key dependencies)
24
+
25
+ #### Installation
26
+
27
+ Install the gem to get the command-line tool:
28
+
29
+ ```bash
30
+ $ gem install databender
31
+ ```
32
+
33
+ and then type
34
+
35
+ ```shell
36
+ $ databender --help
11
37
  ```
12
38
 
13
- And then execute:
39
+ to know the list of available commands.
14
40
 
15
- $ bundle
41
+ #### Usage
16
42
 
17
- Or install it yourself as:
43
+ First, initialise the configuration for the database you want to take a subset of:
18
44
 
19
- $ gem install databender
45
+ ```shell
46
+ $ databender init --db-name=employees
47
+ ```
48
+
49
+ > Note: I have taken the MySQL public dataset available here: https://github.com/datacharmer/test_db as the sample dataset to illustrate the gem
50
+
51
+ This should create a `config` folder and a `database.yml` file. Specify the connection parameters for the source database in `database.yml`. Then edit `filters/employees.yml` to specify the rules for generating the subset. The comments in that file document how to write table and column filters. A sample filter configuration follows.
52
+
53
+ ```yaml
54
+ tables:
55
+ # Tables with fewer rows than min_row_count are imported in full, with no filters applied
56
+ min_row_count: 20
57
+
58
+ # For tables with no filters, the maximum number of rows to import
59
+ max_row_count: 1000
60
+
61
+ # specify table specific filters here
62
+ filters:
63
+ employees: hire_date >= '1994-01-01'
64
+ departments: dept_no in ('d004', 'd005')
65
+
66
+ columns:
67
+ # specify column filters applicable to all tables that contain that column
68
+ filters:
69
+ birth_date: birth_date >= '1950-01-01'
70
+
71
+ ```
72
+
73
+ Now you can run the generator using
74
+
75
+ ```shell
76
+ $ databender generate --db-name=employees
77
+ ```
20
78
 
21
- ## Usage
79
+ This should generate another database called `employees_subset` containing the subset data, and also create a gzipped dump of it.
22
80
 
23
- TODO: Write usage instructions here
24
81
 
25
- ## Contributing
82
+ #### License
26
83
 
27
- 1. Fork it ( https://github.com/[my-github-username]/databender/fork )
28
- 2. Create your feature branch (`git checkout -b my-new-feature`)
29
- 3. Commit your changes (`git commit -am 'Add some feature'`)
30
- 4. Push to the branch (`git push origin my-new-feature`)
31
- 5. Create a new Pull Request
84
+ MIT
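
The sample filter configuration in the README combines per-table filters with column filters that apply to every table containing that column. The diff does not show how the gem merges them, so the following is only a minimal sketch under stated assumptions: `where_clause_for` is a hypothetical helper (not part of databender), and joining the conditions with `AND` is an assumption about the intended semantics.

```ruby
require 'yaml'

# Sample filter configuration, mirroring part of the README example.
CONFIG = YAML.safe_load(<<~YML)
  tables:
    min_row_count: 20
    max_row_count: 1000
    filters:
      employees: hire_date >= '1994-01-01'
  columns:
    filters:
      birth_date: birth_date >= '1950-01-01'
YML

# Hypothetical helper: combine a table's own filter with every column
# filter whose column exists on that table. Returns nil when no filter applies.
def where_clause_for(table, columns, config)
  table_filter = config.dig('tables', 'filters', table)
  column_filters = config.dig('columns', 'filters').to_h
                         .select { |column, _| columns.include?(column) }
                         .values
  conditions = [table_filter, *column_filters].compact
  conditions.empty? ? nil : conditions.map { |c| "(#{c})" }.join(' AND ')
end

puts where_clause_for('employees', %w[emp_no birth_date hire_date], CONFIG)
```

With this sketch, a table that has neither a table filter nor a matching column yields `nil`, which a caller could treat as "fall back to `max_row_count`".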
Binary file
@@ -3,30 +3,31 @@ lib = File.expand_path('../lib', __FILE__)
3
3
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
4
  require 'databender/version'
5
5
 
6
- Gem::Specification.new do |spec|
7
- spec.name = 'databender'
8
- spec.version = Databender::VERSION
9
- spec.authors = ['RC']
10
- spec.email = ['rc.chandru@gmail.com']
11
- spec.summary = %q{Database subset generator}
12
- spec.description = %q{Database subset generator}
13
- spec.homepage = ''
14
- spec.license = 'MIT'
6
+ Gem::Specification.new do |s|
7
+ s.name = 'databender'
8
+ s.version = Databender::VERSION
9
+ s.authors = ['RC']
10
+ s.email = ['rc.chandru@gmail.com']
11
+ s.summary = %q{Database subset generator}
12
+ s.description = %q{Database subset generator}
13
+ s.homepage = ''
14
+ s.license = 'MIT'
15
+ s.homepage = 'https://github.com/rcdexta/databender'
15
16
 
16
- spec.files = `git ls-files -z`.split("\x0")
17
- spec.executables = ['databender']
18
- spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
19
- spec.require_paths = ['lib']
17
+ s.files = `git ls-files -z`.split("\x0")
18
+ s.executables = ['databender']
19
+ s.test_files = s.files.grep(%r{^(test|spec|features)/})
20
+ s.require_paths = ['lib']
20
21
 
21
- spec.add_dependency 'thor', '~> 0.20.0'
22
- spec.add_dependency 'activerecord', '=5.1.4'
23
- spec.add_dependency 'mysql2', '=0.4.9'
24
- spec.add_dependency 'mustache', '~> 1.0', '>= 1.0.5'
25
- spec.add_dependency 'configatron', '~> 4.5', '>= 4.5.1'
26
- spec.add_dependency 'terminal-table', '~> 1.8', '>= 1.8.0'
22
+ s.add_dependency 'thor', '~> 0.20.0'
23
+ s.add_dependency 'activerecord', '=5.1.4'
24
+ s.add_dependency 'mysql2', '=0.4.9'
25
+ s.add_dependency 'mustache', '~> 1.0', '>= 1.0.5'
26
+ s.add_dependency 'configatron', '~> 4.5', '>= 4.5.1'
27
+ s.add_dependency 'terminal-table', '~> 1.8', '>= 1.8.0'
27
28
 
28
- spec.add_development_dependency 'bundler', '~> 1.7'
29
- spec.add_development_dependency 'rake', '~> 10.0'
30
- spec.add_development_dependency 'pry', '~> 0'
29
+ s.add_development_dependency 'bundler', '~> 1.7'
30
+ s.add_development_dependency 'rake', '~> 10.0'
31
+ s.add_development_dependency 'pry', '~> 0'
31
32
 
32
33
  end
@@ -29,9 +29,9 @@ module Databender
29
29
  desc 'generate', 'Generate subset given a database'
30
30
  def generate
31
31
  say "Creating subset for #{options[:db_name]}", :green
32
- Databender::Runner.process! options[:db_name]
33
- puts ''
32
+ source, report_queries = Databender::Runner.process! options[:db_name]
34
33
  run 'sh subset.sh', verbose: false
34
+ Databender::Runner.print_report source, report_queries
35
35
  end
36
36
 
37
37
  end
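
The hunk above changes `generate` so that the report is printed only after `subset.sh` has run: `process!` now returns the count queries, and `print_report` executes them later. This is presumably because some filtered counts reference the target database, which exists only once the script has finished. A minimal sketch of that two-phase pattern follows; `FakeSource`, `plan_report`, `build_report` and all row counts are made up for illustration and are not the gem's API.

```ruby
# Stand-in for the database connection; the real gem queries MySQL.
class FakeSource
  def initialize(counts)
    @counts = counts
  end

  # Returns the stored count for a "query" (here just a lookup key).
  def count(query)
    @counts.fetch(query)
  end
end

# Phase 1: record what to count, without touching the database.
def plan_report(tables)
  tables.map do |name, filter|
    { table: name, all_count_sql: "all:#{name}",
      filter_count_sql: "subset:#{name}", filter: filter }
  end
end

# Phase 2: run the recorded queries once the subset exists.
def build_report(source, report_queries)
  report_queries.map do |rq|
    [rq[:table], source.count(rq[:all_count_sql]),
     source.count(rq[:filter_count_sql]), rq[:filter]]
  end
end

queries = plan_report({ 'employees' => "hire_date >= '1994-01-01'", 'titles' => nil })
# ... the subset script would run here ...
source = FakeSource.new({ 'all:employees' => 100, 'subset:employees' => 10,
                          'all:titles' => 50, 'subset:titles' => 5 })
build_report(source, queries).each { |row| puts row.inspect }
```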
@@ -20,7 +20,7 @@ module Databender
20
20
  execute(%[
21
21
  SELECT table_name
22
22
  FROM information_schema.tables
23
- WHERE table_schema = '#{db_name}';
23
+ WHERE table_schema = '#{db_name}' and table_type = 'BASE TABLE';
24
24
  ])
25
25
  end
26
26
 
@@ -14,13 +14,25 @@ module Databender
14
14
  class Runner
15
15
  extend Databender::SQLHelper
16
16
 
17
-
18
-
19
17
  def self.generate_script(params)
20
18
  template = File.read("#{GEM_ROOT}/subset.sh.mustache")
21
19
  File.write('subset.sh', Mustache.render(template, params))
22
20
  end
23
21
 
22
+ def self.print_report(source, report_queries)
23
+ report = []
24
+ report_queries.each do |rq|
25
+ all_count = source.execute_count(rq[:all_count_sql]).first.first
26
+ subset_count = source.execute_count(rq[:filter_count_sql]).count
27
+ report << [rq[:table], all_count, subset_count, rq[:filter]]
28
+ end
29
+ headings = ['Table Name', 'Total Rows', 'Fetched Rows', 'Filter(s)']
30
+ tty = Terminal::Table.new headings: headings, rows: report
31
+ puts ''
32
+ puts 'Generating report...'
33
+ puts tty
34
+ end
35
+
24
36
  def self.apply_column_filters(table, source, source_db, target_db)
25
37
  columns = source.columns_for(source_db, table.name)
26
38
  overlapping_filters = Databender::Config.column_filters.keys & columns
@@ -36,8 +48,6 @@ module Databender
36
48
  exit(1)
37
49
  end
38
50
 
39
- report = []
40
-
41
51
  Databender::Config.load!(db_name)
42
52
  target_db = Databender::Config.target_db
43
53
  source_db = Databender::Config.source.database
@@ -53,11 +63,12 @@ module Databender
53
63
 
54
64
  ordered_tables = Databender::TableOrder.order_by_foreign_key_dependency(source, source_db, all_tables)
55
65
 
66
+ report_queries = []
67
+
56
68
  entries = ordered_tables.collect do |table|
57
69
  column_filters = apply_column_filters table, source, source_db, target_db
58
- sql = count_all_query source_db, table.name
59
- all_count = source.execute_count(sql).first.first
60
- sql, count_query, filter = if Databender::Config.table_filters.keys.include?(table.name)
70
+ all_count_sql = count_all_query source_db, table.name
71
+ sql, filter_count_sql, filter = if Databender::Config.table_filters.keys.include?(table.name)
61
72
  conditions = merge_filters(Databender::Config.table_filters[table.name], column_filters)
62
73
  [insert_into_select(source_db, table.name, conditions),
63
74
  count_filtered_query(source_db, table.name, conditions), conditions]
@@ -67,21 +78,15 @@ module Databender
67
78
  else
68
79
  parents = Databender::TableOrder.parent_tables_for(table.name)
69
80
  condition = parents.present? ? where_clause_by_reference(target_db, parents) : nil
70
- [insert_into_select(source_db, table.name, condition), count_filtered_query(source_db, table.name, condition), nil]
81
+ [insert_into_select(source_db, table.name, condition), count_filtered_query(source_db, table.name, condition), condition]
71
82
  end
72
83
  end
73
- subset_count = source.execute_count(count_query).count
74
- report << [table.name, all_count, subset_count, filter && filter]
84
+ report_queries << {table: table.name, all_count_sql: all_count_sql, filter_count_sql: filter_count_sql, filter: filter}
75
85
  {sql: sql, table: table.name}
76
86
  end
77
87
 
78
- headings = ['Table Name', 'Total Rows', 'Fetched Rows', 'Filter(s)']
79
- tty = Terminal::Table.new headings: headings, rows: report
80
- puts tty
81
-
82
-
83
88
  self.generate_script source: Databender::Config.source, target_db: target_db, entries: entries, database: db_name
84
-
89
+ [source, report_queries]
85
90
  end
86
91
  end
87
92
  end
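
`process!` relies on `Databender::TableOrder.order_by_foreign_key_dependency` so that parent tables are imported before the tables that reference them. That implementation is not part of this diff. As a sketch of the general technique only, Ruby's standard `TSort` module can produce such an ordering; the `ForeignKeyGraph` class and the hard-coded foreign-key map below are assumptions for illustration.

```ruby
require 'tsort'

# Hypothetical foreign-key map: table => tables it references (its parents).
class ForeignKeyGraph
  include TSort

  def initialize(parents_by_table)
    @parents_by_table = parents_by_table
  end

  # TSort hook: enumerate every table.
  def tsort_each_node(&block)
    @parents_by_table.each_key(&block)
  end

  # TSort hook: enumerate the tables a given table depends on.
  def tsort_each_child(table, &block)
    @parents_by_table.fetch(table, []).each(&block)
  end
end

graph = ForeignKeyGraph.new({
  'dept_emp'    => %w[employees departments],
  'salaries'    => %w[employees],
  'employees'   => [],
  'departments' => []
})

# Parents always appear before the tables that reference them.
puts graph.tsort.inspect
```

`tsort` raises `TSort::Cyclic` when the foreign keys form a cycle, so a real implementation would need a strategy for circular references.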
@@ -1,8 +1,8 @@
1
- <%= options[:database] %>:
1
+ <%= options[:db_name] %>:
2
2
  adapter: mysql2
3
3
  host: 127.0.0.1
4
4
  encoding: utf8
5
- database: <%= options[:database] %>
5
+ database: <%= options[:db_name] %>
6
6
  username: root
7
7
  password:
8
8
  port: 3306
@@ -1,3 +1,3 @@
1
1
  module Databender
2
- VERSION = '0.0.1'
2
+ VERSION = '0.0.2'
3
3
  end
@@ -36,5 +36,5 @@ mkdir -p dumps
36
36
 
37
37
  MYSQL_PWD='{{source.password}}' mysqldump -u{{source.username}} -h{{source.host}} -P{{source.port}} {{target_db}} | gzip > dumps/{{database}}.sql.gz
38
38
 
39
- echo "\033[92mThe dump of db_subset is available at dumps/{{database}}.sql.gz. Use gunzip to extract followed by mysql command to load. The subset database is intact in the db server too!"
39
+ echo "\033[92mThe dump of db_subset is available at dumps/{{database}}.sql.gz. Use gunzip to extract followed by mysql command to load. The subset database is intact in the db server too!\033[0m"
40
40
 
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: databender
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.1
4
+ version: 0.0.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - RC
@@ -166,10 +166,10 @@ files:
166
166
  - ".ruby-gemset"
167
167
  - ".ruby-version"
168
168
  - Gemfile
169
- - LICENSE
170
- - LICENSE.txt
169
+ - LICENSE.md
171
170
  - README.md
172
171
  - Rakefile
172
+ - assets/demo.gif
173
173
  - bin/databender
174
174
  - databender.gemspec
175
175
  - dumps/magic_list_service_test.sql.gz
@@ -186,7 +186,7 @@ files:
186
186
  - lib/databender/templates/filter.yml
187
187
  - lib/databender/version.rb
188
188
  - lib/subset.sh.mustache
189
- homepage: ''
189
+ homepage: https://github.com/rcdexta/databender
190
190
  licenses:
191
191
  - MIT
192
192
  metadata: {}
@@ -1,22 +0,0 @@
1
- Copyright (c) 2015 RC
2
-
3
- MIT License
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining
6
- a copy of this software and associated documentation files (the
7
- "Software"), to deal in the Software without restriction, including
8
- without limitation the rights to use, copy, modify, merge, publish,
9
- distribute, sublicense, and/or sell copies of the Software, and to
10
- permit persons to whom the Software is furnished to do so, subject to
11
- the following conditions:
12
-
13
- The above copyright notice and this permission notice shall be
14
- included in all copies or substantial portions of the Software.
15
-
16
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
- NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.