db_subsetter 0.4.1 → 0.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: abf5990b7e3fd33ac74ba829e56919a5d6cc84c1
4
- data.tar.gz: 3c83e352af29441d90675d8f1c924bdc80dfad14
2
+ SHA256:
3
+ metadata.gz: 45469d6d8b6183ae244553ee87165494cd0a5ecef1b843cd73f2be192a554c2c
4
+ data.tar.gz: 87f7ab413790695a3c5808817f8c502b793c028a7bfdbe6637ecf6c62831fb15
5
5
  SHA512:
6
- metadata.gz: 0fb729b7e0db5e1de5133eaf082fb3e6067f4aaaa370b303a07c49d5a18bd3355f97a3a1ab43c3c41b508d470c9a126abac969d311c649fac9b0aa6bac23278e
7
- data.tar.gz: f0637b18e81c8fb5414cac673b2f94d954bc8c10e1d7398ca8589c976069107ea22f5aff37d6aa7db7778f7173b76fc3056e6d3e0015931584c4f3d5d104ca0f
6
+ metadata.gz: a9c78f51a33be41e7c6f84a0e9bb35e48cb21041846c57eb3d01467f0a85664aa124f260661f73a0d95a73ea2a6ab729db1604e5371936a26f20410ebb2b3c63
7
+ data.tar.gz: b706e528ffb67f5e228440fbfb5eb774c73aa642787a21cf61105c382dfc2c82e38cd3431a55f325bf17602acdbfab0b3feefc97b98dad5d62d9d29db3da8dda
data/README.md CHANGED
@@ -1,5 +1,12 @@
1
+ <!-- vim: set nofoldenable: -->
1
2
  # db_subsetter
2
3
 
4
+ [![Build Status](https://travis-ci.org/lostapathy/db_subsetter.svg?branch=master)](https://travis-ci.org/lostapathy/db_subsetter)
5
+ [![Maintainability](https://api.codeclimate.com/v1/badges/26b61bf940b79bbfa529/maintainability)](https://codeclimate.com/github/lostapathy/db_subsetter/maintainability)
6
+ [![Test Coverage](https://codeclimate.com/github/lostapathy/db_subsetter/badges/coverage.svg)](https://codeclimate.com/github/lostapathy/db_subsetter/coverage)
7
+
8
+ ![db_subsetter logo](/logo/db_subsetter_logo.png?raw=true "db_subsetter logo")
9
+
3
10
  Extract a subset of a relational database for use in development or testing. Provides a simple API to filter rows and preserve referential integrity. The extracted data is packed into a [SQLite](https://www.sqlite.org/) database to allow easy copying.
4
11
 
5
12
  Developing against a realistic dataset extracted from production provides a lot of advantages over starting with an empty database. This tool was inspired by [rdbms-subsetter](https://github.com/18F/rdbms-subsetter) and [yaml_db](https://github.com/yamldb/yaml_db/) and combines some of the best attributes of both.
@@ -8,6 +15,21 @@ When working against a legacy database, automatic relationship management does n
8
15
 
9
16
  ActiveRecord is used for database access; however, you *do not* need to have ActiveRecord models for all tables you wish to subset. Any database supported by ActiveRecord should work. In theory, you should be able to subset from one database and import into another (e.g., MySQL -> Postgres); in practice, this may or may not work well depending on exactly which data types are used.
10
17
 
18
+ ## RDBMS Support
19
+
20
+ db_subsetter requires a small RDBMS-specific adapter in order to deal with a few things during the export/import process, mainly related to foreign keys. At present, the following dialects are supported. Writing others is pretty straightforward, PRs welcome.
21
+
22
+ * MySQL
23
+ * MS SQL
24
+ * Postgres
25
+ * Sqlite
26
+
27
+ ## Limitations
28
+
29
+ Over time we hope to remove some of these limitations. Until then, tables affected by these limitations can either be skipped or processed manually.
30
+
31
+ * Tables to be exported must have a single-column primary key unless they have fewer than SELECT_BATCH_SIZE (5000) rows
32
+ * Foreign keys that do not point back to a primary key are not automatically filtered on
11
33
 
12
34
  ## Installation
13
35
 
@@ -27,7 +49,53 @@ Or install it yourself as:
27
49
 
28
50
  ## Usage
29
51
 
30
- TODO: Write usage instructions here
52
+ db_subsetter is a toolkit for writing your own export and import scripts. There is no command to run; instead, you build scripts against the API. These instructions outline how to build up a typical configuration that exports a subset of data for a development workflow, and should be treated only as a starting point.
53
+
54
+ ### Prerequisites
55
+
56
+ The examples provided here assume you are using db_subsetter in the context of a Rails app and that ActiveRecord is already configured and "just works." This is just done for brevity in the example scripts, as db_subsetter absolutely does not require you to use Rails. Using Rails just makes some operations a little more convenient. If you aren't a Rails user, you'll need to add code (after the require statements) to connect ActiveRecord, such as:
57
+
58
+ ```ruby
59
+ ActiveRecord::Base.establish_connection(
60
+ adapter: "mysql2",
61
+ host: "127.0.0.1",
62
+ username: "dbuser",
63
+ database: "huge_db"
64
+ )
65
+ ```
66
+ ### A Minimal Start
67
+
68
+ We'll start our example with a minimal export.rb and build up from there. This
69
+
70
+ ```ruby
71
+ #!/usr/bin/env ruby
72
+ require 'db_subsetter'
73
+
74
+ exporter = DbSubsetter::Exporter.new
75
+ filename = "project-#{Rails.env}.sqlite3"
76
+ FileUtils.rm(filename) if File.exist?(filename)
77
+
78
+ exporter.export(filename)
79
+ ```
80
+ Time to run it against our db and see what happens!
81
+
82
+
83
+
84
+
85
+ TODO: These instructions are a work in progress. More to come!
86
+
87
+ ## Applications
88
+
89
+ The obvious application of db_subsetter is to provide a subset for development. There are many other non-obvious uses.
90
+
91
+ * Capturing state when an exception occurs, to make the problem easier to reproduce
92
+ * Creating reproducible scenarios for complex integration tests
93
+ * Exporting the underlying data used to generate a report, for compliance and audit purposes
94
+ * Archival before deletion of data
95
+ * Providing customers with their own data
96
+ * Migration between RDBMS systems
97
+
98
+ Come up with something else? Please file an issue or submit a PR, we'd love to hear about it!
31
99
 
32
100
  ## Development
33
101
 
@@ -35,25 +103,29 @@ After checking out the repo, run `bin/setup` to install dependencies. Then, run
35
103
 
36
104
  To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
37
105
 
106
+ ## Roadmap
107
+
108
+ * 0.4.x (released) - fully functional, requires manual filtering of all tables
109
+ * 0.5.x (December 2017) - automating filtering of tables by foreign keys, requires much less configuration but will have small breaking API changes
110
+ * 0.6.x (TBA) - improve/expand the scrambler API to allow much simpler filtering of tables, breaking changes to scrambler API likely
111
+
38
112
  ## TODO
39
113
 
40
- * Improve the dialect handling
41
114
  * Better example docs on usage and filtering examples
42
- * Implement a scrubber API to allow sanitizing or correcting data at export time. This allows us to keep sensitive/personal data out of the export and also allows correction of broken data that won't re-insert.
115
+ * (0.6.0) Implement a scrubber API to allow sanitizing or correcting data at export time. This allows us to keep sensitive/personal data out of the export and also allows correction of broken data that won't re-insert.
43
116
  * Add an executable and/or rake task to perform export and import rather than requiring the API to be used directly. Will need a config file to specify custom plugins
44
117
  * Add pre-flight check on import to make sure all tables smell like they will load the data (right columns, at minimum)
45
- * Finish building and test checks to make sure foreign keys are valid after import
46
- * Have verify_exportability return all failures together rather than one at a time
47
- * Add a verbose mode to display more detailed stats while running an export or import (what table we're on, records exported, time taken
118
+ * Examples of validating referential integrity after import
119
+ * Add a verbose mode to display more detailed stats while running an export or import (what table we're on, records exported, time taken)
120
+ * Decouple generating the subset from outputting it, so we could have alternate outputs - like sending direct to another db
121
+ * Provide an alternate API to allow filtering without dealing directly with Arel. Perhaps a method to pass in an array of IDs to filter from?
122
+ * (0.6.0) Add API calls to allow columns to be skipped completely when subsetting
48
123
 
49
124
  ## Contributing
50
125
 
51
126
  Bug reports and pull requests are welcome on GitHub at https://github.com/lostapathy/db_subsetter.
52
127
 
53
-
54
-
55
128
  ## License
56
129
 
57
-
58
130
  The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
59
131
 
data/lib/db_subsetter.rb CHANGED
@@ -1,13 +1,15 @@
1
- require "db_subsetter/version"
2
- require "db_subsetter/filter"
3
- require "db_subsetter/exporter"
4
- require "db_subsetter/importer"
5
- require "db_subsetter/scrambler"
6
- require "db_subsetter/dialect/generic"
7
- require "db_subsetter/dialect/my_sql"
8
- require "db_subsetter/dialect/ms_sql"
9
-
10
-
11
- module DbSubsetter
12
- # Your code goes here...
13
- end
1
+ require 'db_subsetter/version'
2
+ require 'db_subsetter/circular_relation_error'
3
+ require 'db_subsetter/filter'
4
+ require 'db_subsetter/database'
5
+ require 'db_subsetter/table'
6
+ require 'db_subsetter/exporter'
7
+ require 'db_subsetter/importer'
8
+ require 'db_subsetter/relation'
9
+ require 'db_subsetter/scrambler'
10
+ require 'db_subsetter/type_helper'
11
+ require 'db_subsetter/dialect/generic'
12
+ require 'db_subsetter/dialect/my_sql'
13
+ require 'db_subsetter/dialect/ms_sql'
14
+ require 'db_subsetter/dialect/postgres'
15
+ require 'db_subsetter/dialect/sqlite'
@@ -0,0 +1,4 @@
1
+ module DbSubsetter
2
+ class CircularRelationError < StandardError
3
+ end
4
+ end
@@ -0,0 +1,46 @@
1
+ module DbSubsetter
2
+ # A database to be exported from/to
3
+ class Database
4
+ def initialize(exporter)
5
+ @exporter = exporter
6
+ @tables = {}
7
+ all_table_names.each { |table_name| @tables[table_name] = Table.new(table_name, self, @exporter) }
8
+ end
9
+
10
+ def find_table(name)
11
+ @tables[name.to_s]
12
+ end
13
+
14
+ def tables
15
+ @tables.values
16
+ end
17
+
18
+ def exported_tables
19
+ tables.reject(&:ignored?)
20
+ end
21
+
22
+ # Raw list of names of all tables in the database.
23
+ def all_table_names
24
+ @all_table_names ||= ActiveRecord::Base.connection.tables - ['ar_internal_metadata']
25
+ end
26
+
27
+ # Used in debugging/reporting
28
+ def total_row_counts
29
+ tables.map { |table| [table.name, table.total_row_count] }.to_h
30
+ end
31
+
32
+ # Used in debugging/reporting
33
+ def filtered_row_counts
34
+ tables.map { |table| [table.name, table.filtered_row_count] }.to_h
35
+ end
36
+
37
+ def exportable?
38
+ puts "Verifying table exportability ...\n\n" if @exporter.verbose?
39
+ exported_tables.reject(&:exportable?).count.zero?
40
+ end
41
+
42
+ def exportability_issues
43
+ exported_tables.reject(&:exportable?).map { |table| [table.name, table.exportability_issues] }.to_h
44
+ end
45
+ end
46
+ end
@@ -1,14 +1,25 @@
1
1
  module DbSubsetter
2
2
  module Dialect
3
+ # Dialect to subset to/from database without explicit support
3
4
  class Generic
5
+ INSERT_BATCH_SIZE = 500
6
+
4
7
  def self.import
5
- yield
8
+ ActiveRecord::Base.connection.disable_referential_integrity do
9
+ yield
10
+ end
6
11
  end
7
12
 
8
13
  def self.integrity_problems
9
- []
14
+ raise NotImplementedError, 'integrity_problems not implemented for this dialect'
15
+ end
16
+
17
+ def self.truncate_table(table)
18
+ ActiveRecord::Base.connection.truncate(table)
19
+ rescue NotImplementedError
20
+ table = ActiveRecord::Base.connection.quote_table_name(table)
21
+ ActiveRecord::Base.connection.execute("DELETE FROM #{table}")
10
22
  end
11
23
  end
12
24
  end
13
25
  end
14
-
@@ -1,19 +1,8 @@
1
1
  module DbSubsetter
2
2
  module Dialect
3
+ # Dialect to subset to/from Microsoft SQL Server
3
4
  class MSSQL < Generic
4
- def self.import
5
- ActiveRecord::Base.connection.execute('EXEC sp_msforeachtable "ALTER TABLE ? NOCHECK CONSTRAINT all"')
6
- ActiveRecord::Base.connection.execute('EXEC sp_msforeachtable "ALTER TABLE ? DISABLE TRIGGER all"')
7
- ActiveRecord::Base.connection.execute("select 'ALTER INDEX ' + I.name + ' ON ' + T.name + ' DISABLE'
8
- from sys.indexes I
9
- inner join sys.tables T on I.object_id = T.object_id
10
- where I.type_desc = 'NONCLUSTERED'
11
- and I.name is not null")
12
-
13
- yield
14
- ActiveRecord::Base.connection.execute('EXEC sp_msforeachtable "ALTER TABLE ? ENABLE TRIGGER all"')
15
- ActiveRecord::Base.connection.execute('EXEC sp_msforeachtable "ALTER TABLE ? WITH CHECK CHECK CONSTRAINT all"')
16
- end
5
+ INSERT_BATCH_SIZE = 100
17
6
 
18
7
  def self.integrity_problems
19
8
  ActiveRecord::Base.connection.execute('EXEC sp_msforeachtable "DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS"')
@@ -21,4 +10,3 @@ module DbSubsetter
21
10
  end
22
11
  end
23
12
  end
24
-
@@ -1,16 +1,7 @@
1
1
  module DbSubsetter
2
2
  module Dialect
3
+ # Dialect to subset to/from MySQL
3
4
  class MySQL < Generic
4
- def self.import
5
- ActiveRecord::Base.connection.execute("SET FOREIGN_KEY_CHECKS=0;")
6
- yield
7
- ActiveRecord::Base.connection.execute("SET FOREIGN_KEY_CHECKS=1;")
8
- end
9
-
10
- def self.integrity_problems
11
- raise NotImplementedError.new("integrity_problems not implemented for MySQL")
12
- end
13
5
  end
14
6
  end
15
7
  end
16
-
@@ -0,0 +1,7 @@
1
+ module DbSubsetter
2
+ module Dialect
3
+ # Dialect to subset to/from postgres
4
+ class Postgres < Generic
5
+ end
6
+ end
7
+ end
@@ -0,0 +1,10 @@
1
+ module DbSubsetter
2
+ module Dialect
3
+ # Dialect to subset to/from sqlite
4
+ class Sqlite < Generic
5
+ def self.integrity_problems
6
+ ActiveRecord::Base.connection.execute('PRAGMA foreign_key_check')
7
+ end
8
+ end
9
+ end
10
+ end
@@ -2,154 +2,89 @@ require 'sqlite3'
2
2
  require 'active_record'
3
3
 
4
4
  module DbSubsetter
5
+ # Manages exporting a subset of data
5
6
  class Exporter
6
- attr_writer :filter, :max_unfiltered_rows, :max_filtered_rows
7
+ attr_writer :max_filtered_rows
8
+ attr_reader :scramblers, :output, :database
9
+ attr_accessor :filter, :verbose
10
+ alias verbose? verbose
7
11
 
8
- def all_tables
9
- ActiveRecord::Base.connection.tables
10
- end
11
-
12
- def tables
13
- filter.tables
14
- end
15
-
16
- def total_row_counts
17
- tables.each.map do |table|
18
- query = Arel::Table.new(table, ActiveRecord::Base).project("count(1) AS num_rows")
19
- rows = ActiveRecord::Base.connection.select_one(query.to_sql)["num_rows"]
20
- {table => rows}
21
- end
22
- end
23
-
24
- def filtered_row_counts
25
- tables.each.map do |table|
26
- {table => filtered_row_count(table)}
27
- end
28
- end
29
-
30
- def verify_exportability(verbose = true)
31
- puts "Verifying table exportability ...\n\n" if verbose
32
- errors = tables.map{|x| verify_table_exportability(x) }.flatten.compact
33
- if errors.count > 0
34
- puts errors.join("\n")
35
- raise ArgumentError.new "Some tables are not exportable"
12
+ # this is the batch size we insert into sqlite, which seems to be a reasonable balance of speed and memory usage
13
+ INSERT_BATCH_SIZE = 250
14
+ SELECT_BATCH_SIZE = 5000
15
+
16
+ def export(filename)
17
+ unless @database.exportable?
18
+ if verbose?
19
+ STDERR.puts "\nExportability issues:\n"
20
+ @database.exportability_issues.each do |table, issues|
21
+ STDERR.puts table
22
+ issues.each { |issue| STDERR.puts "\t#{issue}" }
23
+ end
24
+ end
25
+ raise ArgumentError, 'Database is not exportable as filtered!'
36
26
  end
37
- puts "\n\n" if verbose
38
- end
39
-
40
- def export(filename, verbose = true)
41
- @verbose = verbose
42
- verify_exportability(verbose)
43
27
 
44
28
  puts "Exporting data...\n\n" if @verbose
45
29
  @output = SQLite3::Database.new(filename)
46
- @output.execute("CREATE TABLE tables (name TEXT, records_exported INTEGER, columns TEXT)")
47
- tables.each do |table|
48
- export_table(table)
49
- end
30
+ @output.execute 'CREATE TABLE tables (name TEXT, records_exported INTEGER, columns TEXT)'
31
+ @database.exported_tables.each(&:export)
50
32
  end
51
33
 
52
34
  def add_scrambler(scrambler)
53
35
  @scramblers << scrambler
54
36
  end
55
37
 
38
+ def ignore_tables(ignored)
39
+ limit_tables('ignore!', ignored)
40
+ end
41
+
42
+ def subset_full_tables(full_tables)
43
+ limit_tables('subset_in_full!', full_tables)
44
+ end
45
+
56
46
  def initialize
57
47
  @scramblers = []
58
48
  @page_counts = {}
59
- end
60
-
61
- private
62
- def max_unfiltered_rows
63
- @max_unfiltered_rows || 1000
49
+ @database = Database.new(self)
50
+ @filter = Filter.new
51
+ @verbose = true
52
+ $stdout.sync
64
53
  end
65
54
 
66
55
  def max_filtered_rows
67
56
  @max_filtered_rows || 2000
68
57
  end
69
58
 
70
- # this is the batch size we insert into sqlite, which seems to be a reasonable balance of speed and memory usage
71
- def insert_batch_size
72
- 250
73
- end
74
-
75
- def select_batch_size
76
- insert_batch_size * 20
77
- end
78
-
79
- def filter
80
- @filter ||= Filter.new
81
- @filter.exporter = self
82
- @filter
83
- end
84
-
85
- def filtered_row_count(table)
86
- query = Arel::Table.new(table, ActiveRecord::Base)
87
- query = filter.filter(table, query).project( Arel.sql("count(1)") )
88
- ActiveRecord::Base.connection.select_one(query.to_sql).values.first
89
- end
90
-
91
- def pages(table)
92
- @page_counts[table] ||= ( filtered_row_count(table) / select_batch_size.to_f ).ceil
93
- end
94
-
95
- def order_by(table)
96
- #TODO should probably allow the user to override this and manually set a sort order?
97
- key = ActiveRecord::Base.connection.primary_key(table)
98
- key || false
59
+ # FIXME: look at this API, passing a table name back seems wrong
60
+ def sanitize_row(table_name, row)
61
+ row = TypeHelper.cleanup_types(row)
62
+ scramble_row(table_name, row)
99
63
  end
100
64
 
101
- def verify_table_exportability(table)
102
- puts "Verifying: #{table}" if @verbose
103
- errors = []
104
- errors << "ERROR: Multiple pages but no primary key on: #{table}" if pages(table) > 1 && order_by(table).blank?
105
- errors << "ERROR: Too many rows in: #{table} (#{filtered_row_count(table)})" if( filtered_row_count(table) > max_filtered_rows )
106
- errors
107
- end
108
-
109
- def cleanup_types(row)
110
- row.map do |field|
111
- case field
112
- when Date, Time then field.to_s(:db)
113
- else
114
- field
115
- end
116
- end
117
- end
65
+ private
118
66
 
119
- def scramble_data(table, data)
120
- @scramblers.each do |scrambler|
121
- data = scrambler.scramble(table, data)
67
+ def scramble_row(table_name, row)
68
+ scramblers.each do |scrambler|
69
+ row = scrambler.scramble(table_name, row)
122
70
  end
123
- data
71
+ row
124
72
  end
125
73
 
126
- def export_table(table)
127
- print "Exporting: #{table} (#{pages(table)} pages)" if @verbose
128
- $stdout.flush if @verbose
129
- columns = ActiveRecord::Base.connection.columns(table).map{ |table| table.name }
130
- rows_exported = 0
131
- @output.execute("CREATE TABLE #{table.underscore} ( data TEXT )")
132
- for i in 0..(pages(table) - 1)
133
- arel_table = query = Arel::Table.new(table, ActiveRecord::Base)
134
- query = filter.filter(table, query)
135
- # Need to extend this to take more than the first batch_size records
136
- query = query.order(arel_table[order_by(table)]) if order_by(table)
137
-
138
-
139
- query = query.skip(i * select_batch_size).take(select_batch_size) if pages(table) > 1
140
- sql = query.project( Arel.sql('*') ).to_sql
141
-
142
- records = ActiveRecord::Base.connection.select_rows( sql )
143
- records.each_slice(insert_batch_size) do |rows|
144
- @output.execute("INSERT INTO #{table.underscore} (data) VALUES #{ Array.new(rows.size){"(?)"}.join(",")}", rows.map{|x| scramble_data(table, cleanup_types(x))}.map(&:to_json) )
145
- rows_exported += rows.size
74
+ def limit_tables(operation, apply_to)
75
+ if apply_to.is_a?(Array)
76
+ apply_to.each do |t|
77
+ @database.find_table(t).send(operation)
146
78
  end
147
- print "." if @verbose
148
- $stdout.flush if @verbose
79
+ elsif apply_to.is_a?(Symbol) || apply_to.is_a?(String)
80
+ @database.find_table(apply_to).send(operation)
81
+ elsif apply_to.is_a?(Regexp)
82
+ @database.tables.each do |table|
83
+ table.send(operation) if table.name =~ apply_to
84
+ end
85
+ else
86
+ raise ArgumentError, "Don't know how to #{operation} a #{apply_to.class}"
149
87
  end
150
- puts "" if @verbose
151
- @output.execute("INSERT INTO tables VALUES (?, ?, ?)", [table, rows_exported, columns.to_json])
152
88
  end
153
89
  end
154
90
  end
155
-
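The new `ignore_tables` and `subset_full_tables` methods both funnel into `limit_tables`, which accepts an array of names, a single symbol or string, or a regular expression. The following is a minimal sketch of that dispatch, assuming nothing beyond plain Ruby. `SketchTable` and `SketchRegistry` are hypothetical stand-ins invented for illustration; they are not classes from the gem and they never touch a database.

```ruby
# Hypothetical stand-in for a table object; it only tracks two flags.
class SketchTable
  attr_reader :name

  def initialize(name)
    @name = name
    @ignored = false
    @subset_in_full = false
  end

  def ignore!
    @ignored = true
  end

  def subset_in_full!
    @subset_in_full = true
  end

  def ignored?
    @ignored
  end

  def subset_in_full?
    @subset_in_full
  end
end

# Hypothetical registry that mirrors the selector handling shown in the diff.
class SketchRegistry
  def initialize(names)
    @tables = names.map { |name| [name, SketchTable.new(name)] }.to_h
  end

  def find_table(name)
    @tables[name.to_s]
  end

  # Apply +operation+ to every table matched by +selector+.
  def apply(operation, selector)
    case selector
    when Array
      selector.each { |name| find_table(name).public_send(operation) }
    when Symbol, String
      find_table(selector).public_send(operation)
    when Regexp
      @tables.each_value { |table| table.public_send(operation) if table.name =~ selector }
    else
      raise ArgumentError, "Don't know how to #{operation} a #{selector.class}"
    end
  end
end

registry = SketchRegistry.new(%w[users audit_logs audit_events countries])
registry.apply(:ignore!, /\Aaudit_/)          # regexp: every matching table
registry.apply(:subset_in_full!, :countries)  # symbol: a single table
```

In this sketch, a name that matches no table makes `find_table` return `nil`, so the following call raises `NoMethodError`; a caller may want to validate names first.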
@@ -1,26 +1,16 @@
1
1
  require 'active_record'
2
2
 
3
3
  module DbSubsetter
4
+ # Base class for defining a custom filter for defining how to create a subset
5
+ # of your database
4
6
  class Filter
5
- attr_writer :exporter
6
-
7
- def ignore_tables
8
- []
9
- end
10
-
11
- def tables
12
- @exporter.all_tables - ActiveRecord::SchemaDumper.ignore_tables - ignore_tables
13
- end
14
-
15
- def filter(table, query)
16
- filter_method = "filter_#{table.downcase}"
17
- if self.respond_to? filter_method
18
- self.send(filter_method, query)
7
+ def apply(table, query)
8
+ filter_method = "filter_#{table.name.downcase}"
9
+ if respond_to? filter_method
10
+ send(filter_method, query)
19
11
  else
20
12
  query
21
13
  end
22
14
  end
23
-
24
15
  end
25
16
  end
26
-
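`Filter#apply` now receives a table object rather than a table name and looks for a method called `filter_<table name>` on the filter. A rough sketch of that convention follows. `FakeTable`, `SketchFilter` and `RecentOrdersFilter` are hypothetical names, and the "query" here is a plain array of hashes rather than the Arel query the gem passes in.

```ruby
# Hypothetical stand-in: only the #name of the table matters for dispatch.
FakeTable = Struct.new(:name)

# Mirrors the lookup in the diff: call filter_<downcased name> if it exists,
# otherwise hand the query back untouched.
class SketchFilter
  def apply(table, query)
    filter_method = "filter_#{table.name.downcase}"
    if respond_to?(filter_method)
      send(filter_method, query)
    else
      query
    end
  end
end

# A project-specific filter only defines methods for tables it cares about.
class RecentOrdersFilter < SketchFilter
  def filter_orders(rows)
    rows.select { |row| row[:year] >= 2017 }
  end
end

filter = RecentOrdersFilter.new
orders = [{ id: 1, year: 2015 }, { id: 2, year: 2017 }]
users  = [{ id: 7 }]

kept_orders = filter.apply(FakeTable.new('Orders'), orders)
kept_users  = filter.apply(FakeTable.new('users'), users)
```

Because the table name is downcased before the lookup, a table named `Orders` still reaches `filter_orders`; tables without a matching method pass through unfiltered.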
@@ -1,24 +1,36 @@
1
1
  require 'sqlite3'
2
2
 
3
3
  module DbSubsetter
4
+ # Manages importing a subset of data
4
5
  class Importer
5
-
6
- def initialize(filename, dialect = DbSubsetter::Dialect::Generic)
7
- raise ArgumentError.new("invalid input file") unless File.exists?(filename)
6
+ def initialize(filename)
7
+ raise ArgumentError, 'invalid input file' unless File.exist?(filename)
8
8
 
9
9
  @data = SQLite3::Database.new(filename)
10
- @dialect = dialect
10
+ @dialect = case ActiveRecord::Base.connection_config[:adapter]
11
+ when 'mysql2'
12
+ DbSubsetter::Dialect::MySQL
13
+ when 'postgresql'
14
+ DbSubsetter::Dialect::Postgres
15
+ when 'sqlite3'
16
+ DbSubsetter::Dialect::Sqlite
17
+ when 'sqlserver'
18
+ DbSubsetter::Dialect::MSSQL
19
+ else
20
+ DbSubsetter::Dialect::Generic
21
+ end
11
22
  end
12
23
 
13
24
  def tables
14
25
  all_tables = []
15
- @data.execute("SELECT name FROM tables") do |row|
26
+ @data.execute('SELECT name FROM tables') do |row|
16
27
  all_tables << row[0]
17
28
  end
18
29
  all_tables
19
30
  end
20
31
 
21
- def import
32
+ def import(verbose = true)
33
+ @verbose = verbose
22
34
  @dialect.import do
23
35
  tables.each do |table|
24
36
  import_table(table)
@@ -26,39 +38,37 @@ module DbSubsetter
26
38
  end
27
39
  end
28
40
 
29
- def insert_batch_size
30
- 100 # more like 500 for mysql
31
- end
32
-
33
41
  private
42
+
34
43
  def import_table(table)
35
- begin
36
- ActiveRecord::Base.connection.truncate(table)
37
- rescue NotImplementedError
38
- ActiveRecord::Base.connection.execute("DELETE FROM #{quoted_table_name(table)}")
39
- end
44
+ $stdout.sync = true
45
+ started_at = Time.now
46
+ print "Importing #{table}" if @verbose
47
+ @dialect.truncate_table(table)
40
48
 
41
49
  ActiveRecord::Base.connection.begin_db_transaction
42
50
 
43
51
  all_rows = @data.execute("SELECT data FROM #{table.underscore}")
44
- all_rows.each_slice(insert_batch_size) do |rows|
45
- quoted_rows = rows.map{ |row| "(" + quoted_values(row).join(",") + ")" }.join(",")
46
- insert_sql = "INSERT INTO #{quoted_table_name(table)} (#{quoted_column_names(table).join(",")}) VALUES #{quoted_rows}"
52
+ all_rows.each_slice(@dialect::INSERT_BATCH_SIZE) do |rows|
53
+ quoted_rows = rows.map { |row| '(' + quoted_values(row).join(',') + ')' }.join(',')
54
+ insert_sql = "INSERT INTO #{quoted_table_name(table)} (#{quoted_column_names(table).join(',')}) VALUES #{quoted_rows}"
47
55
  ActiveRecord::Base.connection.execute(insert_sql)
56
+ print '.' if @verbose
48
57
  end
49
58
 
50
- ActiveRecord::Base.connection.commit_db_transaction
59
+ ActiveRecord::Base.connection.commit_db_transaction
60
+ puts " (#{(Time.now - started_at).round(3)}s)" if @verbose
51
61
  end
52
62
 
53
63
  def quoted_values(row)
54
64
  out = JSON.parse(row[0])
55
- out = out.map{|x| ActiveRecord::Base.connection.type_cast(x, nil) }
56
- out = out.map{|x| ActiveRecord::Base.connection.quote(x) }
65
+ out = out.map { |x| ActiveRecord::Base.connection.type_cast(x, nil) }
66
+ out = out.map { |x| ActiveRecord::Base.connection.quote(x) }
57
67
  out
58
68
  end
59
69
 
60
70
  def columns(table)
61
- raw = @data.execute("SELECT columns FROM tables WHERE name = ?", [table]).first[0]
71
+ raw = @data.execute('SELECT columns FROM tables WHERE name = ?', [table]).first[0]
62
72
  JSON.parse(raw)
63
73
  end
64
74
 
@@ -67,9 +77,7 @@ module DbSubsetter
67
77
  end
68
78
 
69
79
  def quoted_column_names(table)
70
- columns(table).map{ |column| ActiveRecord::Base.connection.quote_column_name(column) }
80
+ columns(table).map { |column| ActiveRecord::Base.connection.quote_column_name(column) }
71
81
  end
72
-
73
82
  end
74
83
  end
75
-
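The importer now picks its dialect from the configured adapter name and reads the batch size from the dialect's `INSERT_BATCH_SIZE` constant. The sketch below illustrates how that constant lookup behaves with inheritance. The `SketchDialect` module and `dialect_for` helper are hypothetical and only echo the values visible in the diff (500 for the generic dialect, 100 for SQL Server).

```ruby
module SketchDialect
  class Generic
    INSERT_BATCH_SIZE = 500
  end

  # Overrides the batch size, as the SQL Server dialect does in the diff.
  class MSSQL < Generic
    INSERT_BATCH_SIZE = 100
  end

  # Defines nothing, so constant lookup falls back to Generic.
  class MySQL < Generic
  end
end

# Hypothetical helper: map an adapter name to a dialect class.
def dialect_for(adapter)
  case adapter
  when 'mysql2' then SketchDialect::MySQL
  when 'sqlserver' then SketchDialect::MSSQL
  else SketchDialect::Generic
  end
end

rows = (1..250).to_a

mssql = dialect_for('sqlserver')
mysql = dialect_for('mysql2')

mssql_batches = rows.each_slice(mssql::INSERT_BATCH_SIZE).map(&:size)
mysql_batches = rows.each_slice(mysql::INSERT_BATCH_SIZE).map(&:size)
```

With 250 rows, the SQL Server dialect would issue three inserts and the MySQL dialect a single one, since `MySQL::INSERT_BATCH_SIZE` resolves through the superclass.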
@@ -0,0 +1,34 @@
1
+ module DbSubsetter
2
+ # Wraps a foreign key relationship between two tables
3
+ class Relation
4
+ attr_reader :to_table, :column
5
+
6
+ def initialize(ar_association, database)
7
+ @column = ar_association.column
8
+ @other_column = ar_association.primary_key
9
+ @to_table = database.find_table ar_association.to_table
10
+ @from_table = database.find_table ar_association.from_table
11
+ end
12
+
13
+ # We cannot subset automatically if the relation points to a non-primary key
14
+ def can_subset_from?
15
+ @to_table.primary_key == @other_column
16
+ end
17
+
18
+ def apply_subset(query)
19
+ return query if !can_subset_from? || @to_table.subset_in_full?
20
+
21
+ # If the other table is ignored, we must not include any records that reference it
22
+ query = query.where(@from_table.arel_table[@column].neq(nil)) if @to_table.ignored?
23
+
24
+ # If a related table will be exported in full, don't bother subsetting on that key
25
+ unless @to_table.subset_in_full?
26
+ other_ids = @to_table.filtered_ids
27
+ arel_table = @from_table.arel_table
28
+ conditions = arel_table[@column].in(other_ids).or(arel_table[@column].eq(nil))
29
+ query = query.where(conditions)
30
+ end
31
+ query
32
+ end
33
+ end
34
+ end
@@ -1,11 +1,12 @@
1
1
  require 'random-word'
2
2
 
3
3
  module DbSubsetter
4
+ # Clean or redact data to be exported
4
5
  class Scrambler
5
6
  def scramble(table, row)
6
7
  scramble_method = "scramble_#{table.downcase}"
7
- if self.respond_to? scramble_method
8
- self.send(scramble_method, row)
8
+ if respond_to? scramble_method
9
+ send(scramble_method, row)
9
10
  else
10
11
  row
11
12
  end
@@ -16,13 +17,16 @@ module DbSubsetter
16
17
  end
17
18
 
18
19
  protected
20
+
19
21
  def scramble_column(table, column, row_data, value)
20
22
  row_data[column_index(table, column)] = value
21
23
  end
22
24
 
23
25
  private
26
+
24
27
  def column_index(table, column)
25
- @column_index_cache["#{table}##{column}"] ||= ActiveRecord::Base.connection.columns(table).map{|table| table.name}.index(column.to_s)
28
+ @column_index_cache["#{table}##{column}"] ||=
29
+ ActiveRecord::Base.connection.columns(table).map(&:name).index(column.to_s)
26
30
  end
27
31
  end
28
32
  end
@@ -0,0 +1,138 @@
1
+ module DbSubsetter
2
+ # A table in the database to be subset or imported
3
+ class Table
4
+ attr_accessor :name
5
+
6
+ def initialize(name, database, exporter)
7
+ @name = name
8
+ @exporter = exporter
9
+ @database = database
10
+ @subset_in_full = @ignored = false
11
+ end
12
+
13
+ # FIXME: these 4 methods don't feel quite like the correct API yet
14
+ def ignore!
15
+ @ignored = true
16
+ end
17
+
18
+ def subset_in_full!
19
+ @subset_in_full = true
20
+ end
21
+
22
+ def subset_in_full?
23
+ @subset_in_full
24
+ end
25
+
26
+ def ignored?
27
+ @ignored
28
+ end
29
+
30
+ def total_row_count
31
+ query = arel_table.project('count(1) AS num_rows')
32
+ ActiveRecord::Base.connection.select_one(query.to_sql)['num_rows'].to_i # rails-4.2+pg needs to_i
33
+ end
34
+
35
+ def filtered_row_count
36
+ query = filtered_records.project(Arel.sql('count(1) AS num_rows'))
37
+ ActiveRecord::Base.connection.select_one(query.to_sql)['num_rows'].to_i # rails-4.2+pg needs to_i
38
+ end
39
+
40
+ # FIXME: move the raw SQL into another class
41
+ def export
42
+ print "Exporting: #{@name} (#{pages} pages)" if verbose
43
+
44
+ rows_exported = 0
45
+ @exporter.output.execute("CREATE TABLE #{@name.underscore} ( data TEXT )")
46
+ 0.upto(pages - 1).each do |page|
47
+ records_for_page(page).each_slice(Exporter::INSERT_BATCH_SIZE) do |rows|
48
+ data = rows.map { |x| @exporter.sanitize_row(@name, x) }.map(&:to_json)
49
+
50
+ @exporter.output.execute("INSERT INTO #{@name.underscore} (data) VALUES #{Array.new(rows.size) { '(?)' }.join(',')}", data)
51
+ rows_exported += rows.size
52
+ end
53
+
54
+ print '.' if verbose
55
+ end
56
+ puts '' if verbose
57
+ columns = ActiveRecord::Base.connection.columns(@name).map(&:name)
58
+ @exporter.output.execute('INSERT INTO tables VALUES (?, ?, ?)', [@name, rows_exported, columns.to_json])
59
+ end
60
+
61
+ def exportable?
62
+ exportability_issues.empty?
63
+ end
64
+
65
+ def exportability_issues
66
+ return @exportability_issues if @exportability_issues
67
+
68
+ @exportability_issues = []
69
+ begin
70
+ puts "Verifying: #{@name} (#{filtered_row_count}/#{total_row_count})" if verbose
71
+ @exportability_issues << 'Multiple pages but no primary key' if pages > 1 && primary_key.blank?
72
+ @exportability_issues << "Too many rows (#{filtered_row_count})" if filtered_row_count > @exporter.max_filtered_rows
73
+ rescue CircularRelationError
74
+ @exportability_issues << 'Circular relations through this table'
75
+ end
76
+ @exportability_issues
77
+ end
78
+
79
+ def filtered_ids
80
+ return @id_cache if @id_cache
81
+
82
+ raise CircularRelationError if @loaded_ids
83
+ @loaded_ids = true
84
+
85
+ sql = filtered_records.project(:id).to_sql
86
+
87
+ @id_cache = ActiveRecord::Base.connection.select_rows(sql).flatten
88
+ end
89
+
90
+ def arel_table
91
+ @arel_table ||= Arel::Table.new(@name)
92
+ end
93
+
94
+ def primary_key
95
+ ActiveRecord::Base.connection.primary_key(@name)
96
+ end
97
+
98
+ def relations
99
+ ActiveRecord::Base.connection.foreign_keys(@name).map { |x| Relation.new(x, @database) }
100
+ end
101
+
102
+ private
103
+
104
+ def verbose
105
+ @exporter.verbose?
106
+ end
107
+
108
+ def filtered_records
109
+ return arel_table if @exporter.nil? || @exporter.filter.nil?
110
+ query = @exporter.filter.apply(self, arel_table)
111
+
112
+ if total_row_count > @exporter.max_filtered_rows
113
+ query = filter_foreign_keys(query)
114
+ end
115
+ query
116
+ end
117
+
118
+ def filter_foreign_keys(query)
119
+ relations.each do |relation|
120
+ query = relation.apply_subset(query)
121
+ end
122
+ query
123
+ end
124
+
125
+ def records_for_page(page)
126
+ query = filtered_records
127
+ query = query.order(arel_table[primary_key]) if primary_key
128
+
129
+ query = query.skip(page * Exporter::SELECT_BATCH_SIZE).take(Exporter::SELECT_BATCH_SIZE) if pages > 1
130
+ sql = query.project(Arel.sql('*')).to_sql
131
+ ActiveRecord::Base.connection.select_rows(sql)
132
+ end
133
+
134
+ def pages
135
+ @page_count ||= (filtered_row_count / Exporter::SELECT_BATCH_SIZE.to_f).ceil
136
+ end
137
+ end
138
+ end
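`Table#filtered_ids` guards against foreign keys that loop back on themselves: it sets a flag before resolving its ids, and raises `CircularRelationError` if it is re-entered before the cache is filled. The same pattern can be sketched without a database. `SketchNode` and `SketchCircularError` are hypothetical names, and `depends_on` stands in for the tables a foreign key points to.

```ruby
class SketchCircularError < StandardError; end

# Hypothetical node; depends_on plays the role of foreign-key targets.
class SketchNode
  attr_reader :name
  attr_accessor :depends_on

  def initialize(name)
    @name = name
    @depends_on = []
    @id_cache = nil
    @loading = false
  end

  def resolved_ids
    return @id_cache if @id_cache

    # Re-entry before the cache is filled means the relations loop back here.
    raise SketchCircularError, "cycle through #{@name}" if @loading
    @loading = true

    # Our own ids depend on the ids of every table we reference.
    @depends_on.each(&:resolved_ids)
    @id_cache = [1, 2, 3] # placeholder for the ids a real query would return
  end
end

a = SketchNode.new('a')
b = SketchNode.new('b')
c = SketchNode.new('c')
a.depends_on = [b]
b.depends_on = [a] # a -> b -> a is circular

cycle_detected =
  begin
    a.resolved_ids
    false
  rescue SketchCircularError
    true
  end

standalone_ids = c.resolved_ids
```

In the diff, `exportability_issues` rescues the error and reports "Circular relations through this table" instead of letting the export recurse indefinitely.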
data/lib/db_subsetter/type_helper.rb ADDED
@@ -0,0 +1,15 @@
+ module DbSubsetter
+   # Utility module to help safely serialize types
+   # FIXME: nothing about this seems named correctly
+   module TypeHelper
+     def self.cleanup_types(row)
+       row.map do |field|
+         case field
+         when Date, Time then field.to_s(:db)
+         else
+           field
+         end
+       end
+     end
+   end
+ end
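`TypeHelper.cleanup_types` relies on ActiveSupport's `to_s(:db)`, which newer ActiveSupport releases spell `to_fs(:db)`. The sketch below is a plain-Ruby approximation that uses only the standard library. The `strftime` patterns are an assumption about what the `:db` format produces, and the sample row is invented for illustration.

```ruby
require 'date'

# Plain-Ruby stand-in for TypeHelper.cleanup_types.
# Assumption: :db renders dates as YYYY-MM-DD and times as YYYY-MM-DD HH:MM:SS.
def cleanup_types(row)
  row.map do |field|
    case field
    # DateTime is a Date subclass, so it has to be matched before Date.
    when Time, DateTime then field.strftime('%Y-%m-%d %H:%M:%S')
    when Date then field.strftime('%Y-%m-%d')
    else field
    end
  end
end

# Hypothetical row mixing plain values with date and time columns.
row = [42, 'alice', Date.new(2017, 12, 10), Time.utc(2017, 12, 10, 8, 30, 0), nil]
cleaned = cleanup_types(row)
# cleaned == [42, "alice", "2017-12-10", "2017-12-10 08:30:00", nil]
```

Every other value, including `nil`, passes through unchanged, which matches the `else field` branch in the diff.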
data/lib/db_subsetter/version.rb CHANGED
@@ -1,3 +1,3 @@
  module DbSubsetter
-   VERSION = "0.4.1"
+   VERSION = '0.5.0'.freeze
  end
metadata CHANGED
@@ -1,15 +1,29 @@
  --- !ruby/object:Gem::Specification
  name: db_subsetter
  version: !ruby/object:Gem::Version
-   version: 0.4.1
+   version: 0.5.0
  platform: ruby
  authors:
  - Joe Francis
  autorequire:
- bindir: exe
+ bindir: bin
  cert_chain: []
- date: 2016-10-21 00:00:00.000000000 Z
+ date: 2017-12-10 00:00:00.000000000 Z
  dependencies:
+ - !ruby/object:Gem::Dependency
+   name: appraisal
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
  - !ruby/object:Gem::Dependency
    name: bundler
    requirement: !ruby/object:Gem::Requirement
@@ -25,49 +39,77 @@ dependencies:
        - !ruby/object:Gem::Version
          version: '1.12'
  - !ruby/object:Gem::Dependency
-   name: rake
+   name: minitest
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '10.0'
+         version: '5.0'
    type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '10.0'
+         version: '5.0'
  - !ruby/object:Gem::Dependency
-   name: minitest
+   name: mysql2
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '5.0'
+         version: 0.4.10
    type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '5.0'
+         version: 0.4.10
  - !ruby/object:Gem::Dependency
-   name: activerecord
+   name: pg
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '4.2'
-   type: :runtime
+         version: 0.21.0
+   type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '4.2'
+         version: 0.21.0
  - !ruby/object:Gem::Dependency
-   name: sqlite3
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+ - !ruby/object:Gem::Dependency
+   name: activerecord
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 4.2.6
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 4.2.6
+ - !ruby/object:Gem::Dependency
+   name: random-word
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
@@ -81,7 +123,7 @@ dependencies:
        - !ruby/object:Gem::Version
          version: '1.3'
  - !ruby/object:Gem::Dependency
-   name: random-word
+   name: sqlite3
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
@@ -101,23 +143,23 @@ executables: []
  extensions: []
  extra_rdoc_files: []
  files:
- - ".gitignore"
- - ".travis.yml"
- - Gemfile
  - LICENSE
  - README.md
- - Rakefile
- - bin/console
- - bin/setup
- - db_subsetter.gemspec
  - lib/db_subsetter.rb
+ - lib/db_subsetter/circular_relation_error.rb
+ - lib/db_subsetter/database.rb
  - lib/db_subsetter/dialect/generic.rb
  - lib/db_subsetter/dialect/ms_sql.rb
  - lib/db_subsetter/dialect/my_sql.rb
+ - lib/db_subsetter/dialect/postgres.rb
+ - lib/db_subsetter/dialect/sqlite.rb
  - lib/db_subsetter/exporter.rb
  - lib/db_subsetter/filter.rb
  - lib/db_subsetter/importer.rb
+ - lib/db_subsetter/relation.rb
  - lib/db_subsetter/scrambler.rb
+ - lib/db_subsetter/table.rb
+ - lib/db_subsetter/type_helper.rb
  - lib/db_subsetter/version.rb
  homepage: https://github.com/lostapathy/db_subsetter
  licenses:
@@ -139,10 +181,9 @@ required_rubygems_version: !ruby/object:Gem::Requirement
        version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.5.1
+ rubygems_version: 2.7.3
  signing_key:
  specification_version: 4
  summary: Extract a subset of a relational database for use in development or testing. Provides
    a simple API to filter rows and preserve referential integrity.
  test_files: []
- has_rdoc:
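The metadata diff above changes the runtime `activerecord` constraint from `~> 4.2` to `>= 4.2.6`. The sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show how the two constraints differ. The version numbers tested are arbitrary samples chosen for illustration, not versions the gem is known to have been tested against.

```ruby
require 'rubygems'

# The two runtime constraints on activerecord, before and after this release.
old_constraint = Gem::Requirement.new('~> 4.2')   # means >= 4.2 and < 5.0
new_constraint = Gem::Requirement.new('>= 4.2.6') # no upper bound

# Sample versions (illustrative only).
samples = %w[4.2.5 4.2.6 4.2.10 5.1.4]

results = samples.map do |v|
  version = Gem::Version.new(v)
  [v, old_constraint.satisfied_by?(version), new_constraint.satisfied_by?(version)]
end

results.each do |v, old_ok, new_ok|
  puts format('%-7s old=%-5s new=%s', v, old_ok, new_ok)
end
```

The new constraint drops the upper bound, so 5.x releases become installable, and it raises the floor, so 4.2 patch releases below 4.2.6 are no longer accepted.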
data/.gitignore DELETED
@@ -1,9 +0,0 @@
- /.bundle/
- /.yardoc
- /Gemfile.lock
- /_yardoc/
- /coverage/
- /doc/
- /pkg/
- /spec/reports/
- /tmp/
data/.travis.yml DELETED
@@ -1,5 +0,0 @@
- sudo: false
- language: ruby
- rvm:
-   - 2.3.1
- before_install: gem install bundler -v 1.12.3
data/Gemfile DELETED
@@ -1,4 +0,0 @@
- source 'https://rubygems.org'
-
- # Specify your gem's dependencies in db_subsetter.gemspec
- gemspec
data/Rakefile DELETED
@@ -1,10 +0,0 @@
- require "bundler/gem_tasks"
- require "rake/testtask"
-
- Rake::TestTask.new(:test) do |t|
-   t.libs << "test"
-   t.libs << "lib"
-   t.test_files = FileList['test/**/*_test.rb']
- end
-
- task :default => :test
data/bin/console DELETED
@@ -1,14 +0,0 @@
- #!/usr/bin/env ruby
-
- require "bundler/setup"
- require "db_subsetter"
-
- # You can add fixtures and/or initialization code here to make experimenting
- # with your gem easier. You can also use a different console, if you like.
-
- # (If you use this, don't forget to add pry to your Gemfile!)
- # require "pry"
- # Pry.start
-
- require "irb"
- IRB.start
data/bin/setup DELETED
@@ -1,8 +0,0 @@
- #!/usr/bin/env bash
- set -euo pipefail
- IFS=$'\n\t'
- set -vx
-
- bundle install
-
- # Do any other automated setup that you need to do here
data/db_subsetter.gemspec DELETED
@@ -1,32 +0,0 @@
- # coding: utf-8
- lib = File.expand_path('../lib', __FILE__)
- $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
- require 'db_subsetter/version'
-
- Gem::Specification.new do |spec|
-   spec.name = "db_subsetter"
-   spec.version = DbSubsetter::VERSION
-   spec.authors = ["Joe Francis"]
-   spec.email = ["joe@lostapathy.com"]
-
-   spec.summary = %q{Extract a subset of a relational database for use in development or
-   testing. Provides a simple API to filter rows and preserve referential
-   integrity.}
-   #spec.description = %q{TODO: Write a longer description or delete this line.}
-   spec.homepage = "https://github.com/lostapathy/db_subsetter"
-   spec.license = "MIT"
-
-   spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
-   spec.bindir = "exe"
-   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
-   spec.require_paths = ["lib"]
-
-   spec.add_development_dependency "bundler", "~> 1.12"
-   spec.add_development_dependency "rake", "~> 10.0"
-   spec.add_development_dependency "minitest", "~> 5.0"
-
-   spec.add_dependency "activerecord", "~> 4.2"
-   spec.add_dependency "sqlite3", "~> 1.3"
-   spec.add_dependency "random-word", "~> 1.3"
- end
-