incsv 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 349fb4611684a002bf078566b5d115d826f4e8a9
4
- data.tar.gz: 2f2551c05e9a5adf601d1fb5bbd272b956e37140
3
+ metadata.gz: 2a64c9c204b84b53e240994132a99746b64a3d8a
4
+ data.tar.gz: e302031882b7e9b9aac37108e79025c593062c31
5
5
  SHA512:
6
- metadata.gz: 73ad7b8f885e8068898fa8ef8ff80947e2221a5f367c2dad9a948ffb7b7a26a35ab5c5cd47ff670ffedefaa16c47b16b364cf8f8b63ab127c387d01484fb10c3
7
- data.tar.gz: 689a7012aa844f8f3b8d04a0385b1554ac226b5a92a9de4ba8d28d4398e74546ff1b68a64faaef7ef0ed34a08d55bc7ee3c00f5c449a9de71408fc7d2e2f45b9
6
+ metadata.gz: 22d2a9fb3bcfc0206b96af378d5b7dffdc0b79501c1eac19b9482451c691ca1e9bbdee925b22971e68bedf4cc33c64b4d2f26c46e13d089fe48f566100accb50
7
+ data.tar.gz: 270ca3f95c76700bc24f410ec90b39141d40fb683d32557b1fc0b72e5a99853fb22dc7fe5a81db335087e966d3c85dac8870a72325f34cbf01ad966d68f9dfcf
data/.gitignore CHANGED
@@ -7,3 +7,4 @@
7
7
  /pkg/
8
8
  /spec/reports/
9
9
  /tmp/
10
+ *.db
data/README.md CHANGED
@@ -18,7 +18,128 @@ incsv can be installed via RubyGems:
18
18
 
19
19
  ## Usage
20
20
 
21
- TBC.
21
+ ### The quick version
22
+
23
+ The following command will drop you into a [REPL][] prompt:
24
+
25
+ $ incsv console path/to/file.csv
26
+
27
+ A Sequel connection to the database is stored in a variable called
28
+ `@db`. The name of the table is based on the filename of the CSV; so, if
29
+ your CSV file is called `products.csv`, then data will be imported into
30
+ a database table called `products`.
31
+
32
+ A quick example:
33
+
34
+ > @db[:products].select(:name).reverse_order(:price).take(5)
35
+ => [{:name=>"Makeshift battery"},
36
+ {:name=>"clothing iron"},
37
+ {:name=>"toy alien"},
38
+ {:name=>"enhanced targeting card"},
39
+ {:name=>"Giddyup Buttercup"}]
40
+
41
+ [repl]: https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop
42
+
43
+ ### The less-quick version
44
+
45
+ To use incsv, you essentially just need to point it at a CSV file. It’ll
46
+ then take care of parsing the CSV, figuring out the nature of the data
47
+ within it, creating a database and a table, and importing the data.
48
+
49
+ To perform all of these steps and be given an interactive console once
50
+ they’re done, you can use the `console` command.
51
+
52
+ Let’s imagine we have a CSV file that contains some product information:
53
+
54
+ $ head -3 products.csv
55
+ name,date_added,price
56
+ "Acid",2013-03-24,£38
57
+ "Abraxo cleaner",2016-09-25,£21
58
+
59
+ Here we can see that we have three columns: the product name, which is
60
+ just a string; the date the product was added, which is an
61
+ ISO-8601–formatted date; and the price, which is a currency value in
62
+ pounds sterling.
63
+
64
+ In my sample data there are 515 products (plus a header row):
65
+
66
+ $ wc -l products.csv
67
+ 516
68
+
69
+ In order to query this data, we can pass the CSV file to incsv:
70
+
71
+ $ incsv console products.csv
72
+ Found database at products.db
73
+ Connection is in @db
74
+
75
+ Primary table name is products
76
+ Columns: _incsv_id, name, date_added, price
77
+
78
+ First row:
79
+ _incsv_id, name, date_added, price
80
+ 1, Acid, 2013-03-24, 0.38E2
81
+
82
+ Not sure what to do next? Try this:
83
+ @db[:products].count
84
+ >
85
+
86
+ It tells us some information about the file, and about the assumptions
87
+ it has made about the file. We can see that it’s imported the contents
88
+ of the file into a table called `products`, and that it’s used the
89
+ column names from the CSV to name the columns in the database table.
90
+
91
+ It also shows us the first row, where you might have noticed that the
92
+ price is in a slightly odd representation (`0.38E2` is scientific notation for 38). That’s because incsv will
93
+ look at what type of data seems to be stored in your CSV before
94
+ importing it. In this case, it has inferred that the `date_added` column
95
+ contains a date, and that the `price` column contains a currency value.
96
+ In the former case, that means converting it into an actual SQL date. In
97
+ the latter case, this means converting it to `BigDecimal` format (and
98
+ storing it in the database as `DECIMAL(10,2)`), so that we neither
99
+ lose precision by storing the value as a float nor lose the ability
100
+ to do numerical calculations by storing it as a string.
101
+
102
+ It then suggests a query for us to run, which is probably the
103
+ first thing that you’d want to know about the dataset: how many rows
104
+ are there? We can run it and see:
105
+
106
+ > @db[:products].count
107
+ => 515
108
+
109
+ Excellent! It’s imported every one of the products that were in the CSV.
110
+
111
+ From this point on we can do any kind of analysis of the data that we
112
+ like; we have all the power of SQLite and Sequel at our fingertips. For
113
+ example, to get the number of products added each year:
114
+
115
+ > @db[:products].group_and_count{strftime("%Y", date_added).as(year)}.all
116
+ => [{:year=>"2013", :count=>132}, {:year=>"2014", :count=>123}, {:year=>"2015", :count=>131}, {:year=>"2016", :count=>129}]
117
+
118
+ Or to get the total value of products added today:
119
+
120
+ > @db[:products].select{sum(price).as(total_cost)}.where(date_added: Date.today).first
121
+ => {:total_cost=>40}
122
+
123
+ We can also do processing in Ruby, if there’s anything that’s difficult
124
+ in pure SQL. Imagine wanting to convert the product names to
125
+ URL-friendly “slugs”. This is pretty easy in Ruby. Let’s try it out on
126
+ the top 10 most expensive products:
127
+
128
+ > @db[:products].select(:name).reverse_order(:price).limit(10).each do |product|
129
+ * puts product[:name].gsub(/\s/, "-").squeeze("-").downcase.gsub(/[^a-z0-9\-]/, "")
130
+ * end
131
+ makeshift-battery
132
+ clothing-iron
133
+ toy-alien
134
+ enhanced-targeting-card
135
+ giddyup-buttercup
136
+ mole-rat-teeth
137
+ empty-teal-rounded-vase
138
+ pre-war-money
139
+ bowling-ball
140
+ toothbrush
141
+
142
+ Hopefully this illustrates what you can do with incsv!
22
143
 
23
144
  ## Development
24
145
 
data/exe/incsv ADDED
@@ -0,0 +1,94 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ $LOAD_PATH.unshift File.expand_path('../../lib', __FILE__)
4
+
5
+ require "thor"
6
+ require "pry"
7
+
8
+ require "incsv"
9
+
10
+ module InCSV
11
+ class Console
12
+ def initialize(db)
13
+ @db = db
14
+ end
15
+
16
+ def get_binding
17
+ binding
18
+ end
19
+ end
20
+
21
+ class CLI < Thor
22
+ desc "create CSV_FILE", "Creates a database file with the appropriate schema for the given CSV file, but doesn't import any data."
23
+ method_option :force, type: :boolean, default: false
24
+ def create(csv_file)
25
+ database = Database.new(csv_file)
26
+
27
+ if database.exists? && database.table_created? && !options.force?
28
+ $stderr.puts "Database already exists."
29
+ exit 41
30
+ end
31
+
32
+ database.create_table
33
+ puts "Database created successfully in #{database.db_path}"
34
+ rescue StandardError => e
35
+ $stderr.puts "Database failed to create."
36
+ $stderr.puts "#{e.message}"
37
+ exit 40
38
+ end
39
+
40
+ desc "import CSV_FILE", "Creates a database file with the appropriate schema for the given CSV file, and then imports the data within the file."
41
+
42
+ method_option :force, type: :boolean, default: false
43
+ def import(csv_file)
44
+ database = Database.new(csv_file)
45
+ create(csv_file)
46
+ database.import
47
+
48
+ puts "Data imported."
49
+ puts
50
+ puts "Command to query:"
51
+ puts "$ sqlite3 #{database.db_path}"
52
+ rescue StandardError => e
53
+ $stderr.puts "Import failed."
54
+ $stderr.puts "#{e.message}"
55
+ exit 50
56
+ end
57
+
58
+ desc "console CSV_FILE", "Opens a query console for the given CSV file, creating a database file and importing the data if necessary."
59
+ def console(csv_file)
60
+ database = Database.new(csv_file)
61
+
62
+ unless database.table_created? && database.imported?
63
+ database.create_table
64
+ database.import
65
+ end
66
+
67
+ console = Console.new(database.db)
68
+
69
+ puts "Found database at #{database.db_path}"
70
+ puts "Connection is in @db"
71
+ puts
72
+ puts "Primary table name is #{database.table_name}"
73
+ puts "Columns: #{database.db[database.table_name].columns.join(", ")}"
74
+
75
+ first_row = database.db[database.table_name].first
76
+ puts
77
+ puts "First row:"
78
+ puts first_row.keys.join(", ")
79
+ puts first_row.values.join(", ")
80
+
81
+ puts
82
+ puts "Not sure what to do next? Try this:"
83
+ puts "@db[:#{database.table_name}].count"
84
+
85
+ console.get_binding.pry(quiet: true, prompt: [proc { "> " }, proc { "* " }])
86
+ rescue StandardError => e
87
+ $stderr.puts "Failed to start console."
88
+ $stderr.puts "#{e.message}"
89
+ exit 60
90
+ end
91
+ end
92
+ end
93
+
94
+ InCSV::CLI.start
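The `Console` class exists so that the Pry session has `@db` in scope: `get_binding` captures a `Binding` inside the instance, and calling `pry` on that binding evaluates everything typed at the prompt against it. The same mechanism can be shown without Pry using `Binding#eval`. This is a minimal sketch; `DemoConsole` and the string standing in for a Sequel connection are hypothetical.

```ruby
# Hypothetical stand-in for InCSV::Console: it keeps a "connection" in @db
# and exposes a Binding captured inside the instance.
class DemoConsole
  def initialize(db)
    @db = db
  end

  def get_binding
    binding
  end
end

session = DemoConsole.new("fake connection").get_binding

# Code evaluated against the binding runs with self set to the console
# object, so the instance variable resolves just as it does at the Pry prompt.
puts session.eval("@db")    # prints fake connection
puts session.receiver.class # prints DemoConsole
```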
data/incsv.gemspec CHANGED
@@ -24,6 +24,7 @@ Gem::Specification.new do |spec|
24
24
  spec.add_development_dependency "rspec", "~> 3.0"
25
25
 
26
26
  spec.add_runtime_dependency "thor", "~> 0.19.1"
27
+ spec.add_runtime_dependency "pry", "~> 0.10"
27
28
  spec.add_runtime_dependency "sqlite3", "~> 1.3"
28
29
  spec.add_runtime_dependency "sequel", "~> 4.31"
29
30
  end
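The new runtime dependency uses RubyGems' pessimistic operator: `"~> 0.10"` allows any release from 0.10 up to (but not including) 1.0, while the existing `"~> 0.19.1"` constraint on Thor only allows 0.19.x releases from 0.19.1 onwards. `Gem::Requirement` can confirm this directly; `allows?` is a hypothetical helper used only for illustration.

```ruby
require "rubygems" # already loaded by default in modern Ruby

# Hypothetical helper: does a gemspec-style constraint accept a version?
def allows?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

p allows?("~> 0.10", "0.12.2")   # prints true
p allows?("~> 0.10", "1.0.0")    # prints false
p allows?("~> 0.19.1", "0.19.4") # prints true
p allows?("~> 0.19.1", "0.20.0") # prints false
```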
@@ -0,0 +1,25 @@
1
+ require "bigdecimal"
2
+
3
+ module InCSV
4
+ class Column
5
+ def initialize(name, values)
6
+ @name = name
7
+ @values = values
8
+ end
9
+
10
+ attr_reader :name
11
+
12
+ def type
13
+ Types.constants.select do |column_type|
14
+ column_type = Types.const_get(column_type)
15
+ if values.all? { |value| value.nil? || column_type.new(value).match? }
16
+ return column_type
17
+ end
18
+ end
19
+ end
20
+
21
+ private
22
+
23
+ attr_accessor :values
24
+ end
25
+ end
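`Column#type` picks the first column type whose `match?` accepts every non-nil sampled value, with `String` as the catch-all. The priority therefore comes from the order of `Types.constants`, which, as far as I know, is not documented to follow definition order; an explicit ordered list is one way to make the precedence obvious. Below is a self-contained sketch of the same rule using `find`. `TYPE_PRIORITY` and `detect_type` are hypothetical names, and the matchers copy the patterns from the `Date` and `Currency` types in this diff.

```ruby
# Hypothetical ordered list of [type name, matcher] pairs; earlier entries win.
TYPE_PRIORITY = [
  [:date,     ->(v) { v.strip.match?(/\A[0-9]{4}-[0-9]{2}-[0-9]{2}\z/) }],
  [:currency, ->(v) { v.strip.match?(/\A(\$|£)([0-9,\.]+)\z/) }],
  [:string,   ->(_v) { true }] # catch-all, like Types::String
].freeze

# Returns the first type whose matcher accepts every non-nil value.
def detect_type(values)
  name, _matcher = TYPE_PRIORITY.find do |_name, matcher|
    values.all? { |value| value.nil? || matcher.call(value) }
  end
  name
end

p detect_type(["2013-03-24", nil, "2016-09-25"]) # prints :date
p detect_type(["£38", "£21"])                    # prints :currency
p detect_type(["Acid", "£21"])                   # prints :string
```

As in the original, a column whose sampled values are all `nil` matches the first type in the list (here `:date`), because `all?` over only-nil values is vacuously true.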
@@ -0,0 +1,30 @@
1
+ module InCSV
2
+ class ColumnType
3
+ def self.name
4
+ self.to_s.sub(/.*::/, "").downcase.to_sym
5
+ end
6
+
7
+ def self.for_database
8
+ self.to_s.sub(/.*::/, "").downcase.to_sym
9
+ end
10
+
11
+ def initialize(value)
12
+ @value = value
13
+ end
14
+
15
+ def match?
16
+ false
17
+ end
18
+
19
+ def clean_value
20
+ self.class.clean_value(@value)
21
+ end
22
+
23
+ def self.clean_value(value)
24
+ value
25
+ end
26
+
27
+ private
28
+ attr_reader :value
29
+ end
30
+ end
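Both `ColumnType.name` and `ColumnType.for_database` turn the subclass's constant name into a symbol. As a result, `Types::Date` yields `:date` and `Types::String` yields `:string` unless a subclass overrides the method, as `Currency` does for `for_database`. Below is a standalone sketch of that expression; the `Demo` module and `short_type_name` are hypothetical. One caution worth weighing: overriding `self.name` replaces `Module#name`, which normally returns a String (or `nil`), not a Symbol.

```ruby
module Demo
  module Types
    class Date; end
    class Currency; end
  end
end

# Same expression as ColumnType: drop everything up to the last "::",
# downcase what is left, and convert it to a Symbol.
def short_type_name(klass)
  klass.to_s.sub(/.*::/, "").downcase.to_sym
end

p short_type_name(Demo::Types::Date)     # prints :date
p short_type_name(Demo::Types::Currency) # prints :currency
```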
@@ -0,0 +1,89 @@
1
+ require "sequel"
2
+
3
+ require "pathname"
4
+
5
+ module InCSV
6
+ class Database
7
+ def initialize(csv)
8
+ @csv = csv
9
+
10
+ @db = Sequel.sqlite(db_path)
11
+ # require "logger"
12
+ # @db.loggers << Logger.new($stdout)
13
+ end
14
+
15
+ attr_reader :db
16
+
17
+ def table_created?
18
+ @db.table_exists?(table_name)
19
+ end
20
+
21
+ def imported?
22
+ table_created? && @db[table_name].count > 0
23
+ end
24
+
25
+ def exists?
26
+ File.exist?(db_path)
27
+ end
28
+
29
+ def db_path
30
+ path = Pathname(csv)
31
+ (path.dirname + (path.basename(".csv").to_s + ".db")).to_s
32
+ end
33
+
34
+ def table_name
35
+ @table_name ||= begin
36
+ File.basename(csv, ".csv").downcase.gsub(/[^a-z_]/, "").to_sym
37
+ end
38
+ end
39
+
40
+ def create_table
41
+ @db.create_table!(table_name) do
42
+ primary_key :_incsv_id
43
+ end
44
+
45
+ schema.columns.each do |c|
46
+ @db.alter_table(table_name) do
47
+ add_column c.name, c.type.for_database
48
+ end
49
+ end
50
+ end
51
+
52
+ def import
53
+ return if imported?
54
+
55
+ create_table unless table_created?
56
+
57
+ columns = schema.columns
58
+ column_names = columns.map(&:name)
59
+
60
+ chunks(200) do |chunk|
61
+ rows = chunk.map do |row|
62
+ row.to_hash.values.each_with_index.map do |column, n|
63
+ columns[n].type.clean_value(column)
64
+ end
65
+ end
66
+
67
+ @db[table_name].import(column_names, rows)
68
+ end
69
+ end
70
+
71
+ private
72
+
73
+ attr_reader :csv
74
+
75
+ def schema
76
+ @schema ||= Schema.new(csv)
77
+ end
78
+
79
+ def chunks(size = 200, &block)
80
+ data =
81
+ File.read(csv)
82
+ .encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
83
+
84
+ csv = CSV.new(data, headers: true)
85
+ csv.each_slice(size, &block)
86
+ csv.close
87
+ end
88
+ end
89
+ end
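`Database#import` reads the CSV in slices of 200 rows via `each_slice`, cleans each value, and hands each batch to Sequel's multi-row `Dataset#import`. One detail in `chunks` (and the matching code in `Schema#samples`) may be worth checking. Ruby's `String#encode` documentation describes conversion to the encoding a string already has as a no-op. So `encode("UTF-8", invalid: :replace, ...)` on a string that `File.read` has (usually) tagged as UTF-8 may leave invalid bytes in place. The sketch below uses `String#scrub` instead and batches rows without Sequel. The sample data and the slice size of 2 are hypothetical.

```ruby
require "csv"

# Hypothetical file contents containing one invalid UTF-8 byte ("\xFF").
raw = "name,price\nAcid,38\nAbra\xFFxo cleaner,21\nToothbrush,5\n"

# String#scrub removes invalid byte sequences (replacing them with "").
clean = raw.scrub("")

batches = []
CSV.new(clean, headers: true).each_slice(2) do |rows|
  # Each slice is an Array of up to two CSV::Row objects; keep just the values,
  # the shape Sequel's Dataset#import takes alongside a list of column names.
  batches << rows.map(&:fields)
end

p batches.map(&:size) # prints [2, 1]
p batches[0][1]       # prints ["Abraxo cleaner", "21"]
```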
@@ -0,0 +1,55 @@
1
+ require "csv"
2
+
3
+ module InCSV
4
+ class Schema
5
+ def initialize(csv)
6
+ @csv = csv
7
+ end
8
+
9
+ def columns
10
+ @columns ||= parsed_columns
11
+ end
12
+
13
+ private
14
+
15
+ attr_reader :csv
16
+
17
+ def parsed_columns
18
+ samples(50).map do |name, values|
19
+ Column.new(name, values)
20
+ end
21
+ end
22
+
23
+ # Returns the first `num_rows` rows of data, transposed into a hash.
24
+ #
25
+ # For example, the following CSV data:
26
+ #
27
+ # foo,bar
28
+ # 1,2
29
+ # 3,4
30
+ #
31
+ # Would become:
32
+ #
33
+ # { "foo" => ["1", "3"], "bar" => ["2", "4"] }
34
+ #
35
+ # This gives us enough data to be able to guess the type of
36
+ # a column.
37
+ def samples(num_rows)
38
+ data =
39
+ File.read(csv)
40
+ .encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
41
+
42
+ csv = CSV.new(data, headers: true)
43
+ sample_data = csv.each.take(num_rows)
44
+ csv.close
45
+
46
+ sample_data.map(&:to_a).flatten(1).each_with_object({}) do |row, data|
47
+ column = row[0]
48
+ value = row[1]
49
+
50
+ data[column] ||= []
51
+ data[column] << value
52
+ end
53
+ end
54
+ end
55
+ end
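`Schema#samples` turns the first rows of the file into a hash mapping each header to its sampled values, which is what `Column` inspects when guessing a type. Here is a compact sketch of the same transposition on the data from its doc comment. Note that `CSV` yields every field as a String, so the values are `"1"` and `"3"` rather than integers. This sketch uses `Enumerable#first(50)` in place of `each.take(50)`.

```ruby
require "csv"

# Hypothetical CSV text: the example from the Schema#samples comment.
data = "foo,bar\n1,2\n3,4\n"

# Read at most 50 rows; Enumerable#first stops reading once it has enough.
rows = CSV.new(data, headers: true).first(50)

# CSV::Row#to_a yields [header, value] pairs; group the values by header.
samples = rows.flat_map(&:to_a).each_with_object({}) do |(column, value), acc|
  (acc[column] ||= []) << value
end

p samples["foo"] # prints ["1", "3"]
p samples["bar"] # prints ["2", "4"]
```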
@@ -0,0 +1,23 @@
1
+ module InCSV
2
+ module Types
3
+ class Currency < ColumnType
4
+ MATCH_EXPRESSION = /\A(\$|£)([0-9,\.]+)\z/
5
+
6
+ def self.for_database
7
+ "DECIMAL(10,2)"
8
+ end
9
+
10
+ def match?
11
+ value.strip.match(MATCH_EXPRESSION)
12
+ end
13
+
14
+ def self.clean_value(value)
15
+ return unless value
16
+
17
+ value.strip.match(MATCH_EXPRESSION) do |match|
18
+ BigDecimal(match[2].delete(","))
19
+ end
20
+ end
21
+ end
22
+ end
23
+ end
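`Currency.clean_value` relies on `String#match` with a block. That call returns the block's result when the pattern matches and `nil` otherwise, so values that don't parse end up as `NULL`. Below is a more explicit sketch of the same logic. `parse_currency` is a hypothetical name, and the pattern is copied from `MATCH_EXPRESSION`.

```ruby
require "bigdecimal"

# Copied from Types::Currency::MATCH_EXPRESSION: a $ or £ sign, then digits,
# commas and dots.
CURRENCY_PATTERN = /\A(\$|£)([0-9,\.]+)\z/

# Hypothetical helper mirroring Currency.clean_value.
def parse_currency(value)
  return nil if value.nil?

  match = value.strip.match(CURRENCY_PATTERN)
  return nil unless match

  # Thousands separators are dropped before parsing; the sign is discarded.
  BigDecimal(match[2].delete(","))
end

p parse_currency("£1,234.50") == BigDecimal("1234.5") # prints true
p parse_currency(" $38 ") == 38                       # prints true
p parse_currency("38 EUR")                            # prints nil
```

Because the sign is discarded, a column mixing `$` and `£` values would be stored and summed as if it held a single currency.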
@@ -0,0 +1,9 @@
1
+ module InCSV
2
+ module Types
3
+ class Date < ColumnType
4
+ def match?
5
+ value.strip.match(/\A[0-9]{4}-[0-9]{2}-[0-9]{2}\z/)
6
+ end
7
+ end
8
+ end
9
+ end
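The `Date` type checks only the shape of the value, so a string such as `2016-13-45` would also be classified as a date. If that matters, one possible tightening (not what this release does) is to confirm that the date actually exists using `Date.valid_date?`. `iso_date?` is a hypothetical helper.

```ruby
require "date"

# Same shape check as Types::Date#match?.
ISO_DATE_SHAPE = /\A[0-9]{4}-[0-9]{2}-[0-9]{2}\z/

# Hypothetical stricter check: right shape AND a real calendar date.
def iso_date?(value)
  text = value.strip
  return false unless text.match?(ISO_DATE_SHAPE)

  year, month, day = text.split("-").map(&:to_i)
  Date.valid_date?(year, month, day)
end

p iso_date?("2013-03-24") # prints true
p iso_date?("2016-13-45") # prints false, although the regex alone accepts it
p iso_date?("2016-02-29") # prints true (leap year)
p iso_date?("2015-02-29") # prints false
```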
@@ -0,0 +1,9 @@
1
+ module InCSV
2
+ module Types
3
+ class String < ColumnType
4
+ def match?
5
+ true
6
+ end
7
+ end
8
+ end
9
+ end
@@ -0,0 +1,5 @@
1
+ require "incsv/column_type"
2
+
3
+ require "incsv/types/date"
4
+ require "incsv/types/currency"
5
+ require "incsv/types/string"
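The README's reason for converting `Currency` values to `BigDecimal` is that binary floats cannot represent most decimal fractions exactly. The minimal Ruby sketch below shows the difference, independent of incsv and Sequel, using ten made-up prices of 0.10.

```ruby
require "bigdecimal"

# Ten hypothetical prices of 0.10, summed one by one with + as binary floats.
# (inject(:+) is used deliberately: Array#sum applies compensated summation
# to floats and would hide the drift.)
float_total = Array.new(10, 0.1).inject(:+)

# The same prices as BigDecimal, parsed from a string so the value is exact.
decimal_total = Array.new(10, BigDecimal("0.1")).inject(:+)

puts float_total        # prints 0.9999999999999999
puts float_total == 1   # prints false
puts decimal_total == 1 # prints true
```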
data/lib/incsv/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module InCSV
2
- VERSION = "0.1.0"
2
+ VERSION = "0.2.0"
3
3
  end
data/lib/incsv.rb CHANGED
@@ -1,5 +1,6 @@
1
1
  require "incsv/version"
2
2
 
3
- module InCSV
4
- # Your code goes here...
5
- end
3
+ require "incsv/schema"
4
+ require "incsv/types"
5
+ require "incsv/column"
6
+ require "incsv/database"
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: incsv
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Rob Miller
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2016-02-17 00:00:00.000000000 Z
11
+ date: 2016-02-22 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -66,6 +66,20 @@ dependencies:
66
66
  - - "~>"
67
67
  - !ruby/object:Gem::Version
68
68
  version: 0.19.1
69
+ - !ruby/object:Gem::Dependency
70
+ name: pry
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '0.10'
76
+ type: :runtime
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '0.10'
69
83
  - !ruby/object:Gem::Dependency
70
84
  name: sqlite3
71
85
  requirement: !ruby/object:Gem::Requirement
@@ -98,7 +112,8 @@ description: Loads a CSV file into an SQLite database automatically, dropping yo
98
112
  into a Ruby shell that allows you to explore the data within.
99
113
  email:
100
114
  - rob@bigfish.co.uk
101
- executables: []
115
+ executables:
116
+ - incsv
102
117
  extensions: []
103
118
  extra_rdoc_files: []
104
119
  files:
@@ -112,8 +127,17 @@ files:
112
127
  - Rakefile
113
128
  - bin/console
114
129
  - bin/setup
130
+ - exe/incsv
115
131
  - incsv.gemspec
116
132
  - lib/incsv.rb
133
+ - lib/incsv/column.rb
134
+ - lib/incsv/column_type.rb
135
+ - lib/incsv/database.rb
136
+ - lib/incsv/schema.rb
137
+ - lib/incsv/types.rb
138
+ - lib/incsv/types/currency.rb
139
+ - lib/incsv/types/date.rb
140
+ - lib/incsv/types/string.rb
117
141
  - lib/incsv/version.rb
118
142
  homepage: https://github.com/robmiller/incsv
119
143
  licenses: