incsv 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 349fb4611684a002bf078566b5d115d826f4e8a9
4
- data.tar.gz: 2f2551c05e9a5adf601d1fb5bbd272b956e37140
3
+ metadata.gz: 2a64c9c204b84b53e240994132a99746b64a3d8a
4
+ data.tar.gz: e302031882b7e9b9aac37108e79025c593062c31
5
5
  SHA512:
6
- metadata.gz: 73ad7b8f885e8068898fa8ef8ff80947e2221a5f367c2dad9a948ffb7b7a26a35ab5c5cd47ff670ffedefaa16c47b16b364cf8f8b63ab127c387d01484fb10c3
7
- data.tar.gz: 689a7012aa844f8f3b8d04a0385b1554ac226b5a92a9de4ba8d28d4398e74546ff1b68a64faaef7ef0ed34a08d55bc7ee3c00f5c449a9de71408fc7d2e2f45b9
6
+ metadata.gz: 22d2a9fb3bcfc0206b96af378d5b7dffdc0b79501c1eac19b9482451c691ca1e9bbdee925b22971e68bedf4cc33c64b4d2f26c46e13d089fe48f566100accb50
7
+ data.tar.gz: 270ca3f95c76700bc24f410ec90b39141d40fb683d32557b1fc0b72e5a99853fb22dc7fe5a81db335087e966d3c85dac8870a72325f34cbf01ad966d68f9dfcf
data/.gitignore CHANGED
@@ -7,3 +7,4 @@
7
7
  /pkg/
8
8
  /spec/reports/
9
9
  /tmp/
10
+ *.db
data/README.md CHANGED
@@ -18,7 +18,128 @@ incsv can be installed via RubyGems:
18
18
 
19
19
  ## Usage
20
20
 
21
- TBC.
21
+ ### The quick version
22
+
23
+ The following command will drop you into a [REPL][] prompt:
24
+
25
+ $ incsv console path/to/file.csv
26
+
27
+ A Sequel connection to the database is stored in a variable called
28
+ `@db`. The name of the table is based on the filename of the CSV; so, if
29
+ your CSV file is called `products.csv`, then data will be imported into
30
+ a database table called `products`.
31
+
32
+ A quick example:
33
+
34
+ > @db[:products].select(:name).reverse_order(:price).take(5)
35
+ => [{:name=>"Makeshift battery"},
36
+ {:name=>"clothing iron"},
37
+ {:name=>"toy alien"},
38
+ {:name=>"enhanced targeting card"},
39
+ {:name=>"Giddyup Buttercup"}]
40
+
41
+ [repl]: https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop
42
+
43
+ ### The less-quick version
44
+
45
+ To use incsv, you essentially just need to point it at a CSV file. It’ll
46
+ then take care of parsing the CSV, figuring out the nature of the data
47
+ within it, creating a database and a table, and importing the data.
48
+
49
+ To perform all of these steps and be given an interactive console once
50
+ they’re done, you can use the `console` command.
51
+
52
+ Let’s imagine we have a CSV file that contains some product information:
53
+
54
+ $ head -3 products.csv
55
+ name,date_added,price
56
+ "Acid",2013-03-24,£38
57
+ "Abraxo cleaner",2016-09-25,£21
58
+
59
+ Here we can see that we have three columns: the product name, which is
60
+ just a string; the date the product was added, which is an
61
+ ISO-8601–formatted date; and the price, which is a currency value in
62
+ pounds sterling.
63
+
64
+ In my sample data there are 515 products (plus a header row):
65
+
66
+ $ wc -l products.csv
67
+ 516
68
+
69
+ In order to query this data, we can pass the CSV file to incsv:
70
+
71
+ $ incsv console products.csv
72
+ Found database at products.db
73
+ Connection is in @db
74
+
75
+ Primary table name is products
76
+ Columns: _incsv_id, name, date_added, price
77
+
78
+ First row:
79
+ _incsv_id, name, date_added, price
80
+ 1, Acid, 2013-03-24, 0.38E2
81
+
82
+ Not sure what to do next? Try this:
83
+ @db[:products].count
84
+ >
85
+
86
+ It tells us some information about the file, and about the assumptions
87
+ it has made about the file. We can see that it’s imported the contents
88
+ of the file into a table called `products`, and that it’s used the
89
+ column names from the CSV to name the columns in the database table.
90
+
91
+ It also shows us the first row, where you might have noticed that the
92
+ price is in a slightly odd representation. That’s because incsv will
93
+ look at what type of data seems to be stored in your CSV before
94
+ importing it. In this case, it knows that the `date_added` column
95
+ contains a date, and that the `price` column contains a currency value.
96
+ In the former case, that means converting it into an actual SQL date. In
97
+ the latter case, this means converting it to `BigDecimal` format (and
98
+ storing it in the database as `DECIMAL(10, 2)`), so that we neither
99
+ lose precision by storing the value as a float nor lose the ability
100
+ to do numerical calculations by storing it as a string.
101
+
102
+ It then suggests a query for us to run, which might generally be the
103
+ first thing that you’d want to know about the dataset: how many rows
104
+ are there? We can run it and see:
105
+
106
+ > @db[:products].count
107
+ => 515
108
+
109
+ Excellent! It’s imported every one of the products that were in the CSV.
110
+
111
+ From this point on we can do any kind of analysis of the data that we
112
+ like; we have all the power of SQLite and Sequel at our fingertips. For
113
+ example, to get the number of products added each year:
114
+
115
+ > @db[:products].group_and_count{strftime("%Y", date_added).as(year)}.all
116
+ => [{:year=>"2013", :count=>132}, {:year=>"2014", :count=>123}, {:year=>"2015", :count=>131}, {:year=>"2016", :count=>129}]
117
+
118
+ Or to get the total value of products added today:
119
+
120
+ > @db[:products].select{sum(price).as(total_cost)}.where(date_added: Date.today).first
121
+ => {:total_cost=>40}
122
+
123
+ We can also do processing in Ruby, if there’s anything that’s difficult
124
+ in pure SQL. Imagine wanting to convert the product names to
125
+ URL-friendly “slugs”. This is pretty easy in Ruby. Let’s try it out on
126
+ the top 10 most expensive products:
127
+
128
+ > @db[:products].select(:name).reverse_order(:price).limit(10).each do |product|
129
+ * puts product[:name].gsub(/\s/, "-").squeeze("-").downcase.gsub(/[^a-z0-9\-]/, "")
130
+ * end
131
+ makeshift-battery
132
+ clothing-iron
133
+ toy-alien
134
+ enhanced-targeting-card
135
+ giddyup-buttercup
136
+ mole-rat-teeth
137
+ empty-teal-rounded-vase
138
+ pre-war-money
139
+ bowling-ball
140
+ toothbrush
141
+
142
+ Hopefully this illustrates what you can do with incsv!
22
143
 
23
144
  ## Development
24
145
 
data/exe/incsv ADDED
@@ -0,0 +1,94 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ $LOAD_PATH.unshift File.expand_path('../../lib', __FILE__)
4
+
5
+ require "thor"
6
+ require "pry"
7
+
8
+ require "incsv"
9
+
10
+ module InCSV
11
+ class Console
12
+ def initialize(db)
13
+ @db = db
14
+ end
15
+
16
+ def get_binding
17
+ binding
18
+ end
19
+ end
20
+
21
+ class CLI < Thor
22
+ desc "create CSV_FILE", "Creates a database file with the appropriate schema for the given CSV file, but doesn't import any data."
23
+ method_option :force, type: :boolean, default: false
24
+ def create(csv_file)
25
+ database = Database.new(csv_file)
26
+
27
+ if database.exists? && database.table_created? && !options.force?
28
+ $stderr.puts "Database already exists."
29
+ exit 41
30
+ end
31
+
32
+ database.create_table
33
+ puts "Database created successfully in #{database.db_path}"
34
+ rescue StandardError => e
35
+ $stderr.puts "Database failed to create."
36
+ $stderr.puts e.message
37
+ exit 40
38
+ end
39
+
40
+ desc "import CSV_FILE", "Creates a database file with the appropriate schema for the given CSV file, and then imports the data within the file."
41
+
42
+ method_option :force, type: :boolean, default: false
43
+ def import(csv_file)
44
+ database = Database.new(csv_file)
45
+ create(csv_file)
46
+ database.import
47
+
48
+ puts "Data imported."
49
+ puts
50
+ puts "Command to query:"
51
+ puts "$ sqlite3 #{database.db_path}"
52
+ rescue StandardError => e
53
+ $stderr.puts "Import failed."
54
+ $stderr.puts e.message
55
+ exit 50
56
+ end
57
+
58
+ desc "console CSV_FILE", "Opens a query console for the given CSV file, creating a database file and importing the data if necessary."
59
+ def console(csv_file)
60
+ database = Database.new(csv_file)
61
+
62
+ unless database.table_created? && database.imported?
63
+ database.create_table
64
+ database.import
65
+ end
66
+
67
+ console = Console.new(database.db)
68
+
69
+ puts "Found database at #{database.db_path}"
70
+ puts "Connection is in @db"
71
+ puts
72
+ puts "Primary table name is #{database.table_name}"
73
+ puts "Columns: #{database.db[database.table_name].columns.join(", ")}"
74
+
75
+ first_row = database.db[database.table_name].first
76
+ puts
77
+ puts "First row:"
78
+ puts first_row.keys.join(", ")
79
+ puts first_row.values.join(", ")
80
+
81
+ puts
82
+ puts "Not sure what to do next? Try this:"
83
+ puts "@db[:#{database.table_name}].count"
84
+
85
+ console.get_binding.pry(quiet: true, prompt: [proc { "> " }, proc { "* " }])
86
+ rescue StandardError => e
87
+ $stderr.puts "Failed to start console."
88
+ $stderr.puts e.message
89
+ exit 60
90
+ end
91
+ end
92
+ end
93
+
94
+ InCSV::CLI.start
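The `Console` class in this file exists only to hand Pry a binding in which `@db` is visible. The sketch below shows the same idea with plain `Binding#eval` instead of Pry. `ConsoleContext` and the hash standing in for the Sequel connection are hypothetical and not part of the gem.

```ruby
# Hypothetical stand-in for InCSV::Console: it stores a value in @db and
# exposes a binding so that evaluated code can see that instance variable.
class ConsoleContext
  def initialize(db)
    @db = db
  end

  # `binding` captures self, so @db resolves inside the returned Binding.
  def get_binding
    binding
  end
end

# A plain hash plays the role of the database connection here (assumption).
context = ConsoleContext.new({ products: 515 })

# Code evaluated against the binding can read @db, much as a REPL prompt would.
result = context.get_binding.eval("@db[:products]")
puts result
```

A REPL started from that binding evaluates each line the user types in the same scope, which is why `@db` is available at the prompt.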
data/incsv.gemspec CHANGED
@@ -24,6 +24,7 @@ Gem::Specification.new do |spec|
24
24
  spec.add_development_dependency "rspec", "~> 3.0"
25
25
 
26
26
  spec.add_runtime_dependency "thor", "~> 0.19.1"
27
+ spec.add_runtime_dependency "pry", "~> 0.10"
27
28
  spec.add_runtime_dependency "sqlite3", "~> 1.3"
28
29
  spec.add_runtime_dependency "sequel", "~> 4.31"
29
30
  end
data/lib/incsv/column.rb ADDED
@@ -0,0 +1,25 @@
1
+ require "bigdecimal"
2
+
3
+ module InCSV
4
+ class Column
5
+ def initialize(name, values)
6
+ @name = name
7
+ @values = values
8
+ end
9
+
10
+ attr_reader :name
11
+
12
+ def type
13
+ Types.constants.each do |column_type|
14
+ column_type = Types.const_get(column_type)
15
+ if values.all? { |value| value.nil? || column_type.new(value).match? }
16
+ return column_type
17
+ end
18
+ end
19
+ end
20
+
21
+ private
22
+
23
+ attr_accessor :values
24
+ end
25
+ end
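`Column#type` returns the first type whose matcher accepts every non-nil sample value. The following is a minimal standalone sketch of that rule. The `MATCHERS` table and `guess_type` are hypothetical stand-ins for the gem's type classes. The two patterns are copied from the `Date` and `Currency` types in this release, and the order mirrors the `require` order in `lib/incsv/types.rb`.

```ruby
# Hypothetical matcher table, ordered from most to least specific.
# The final entry accepts anything, so it acts as the fallback.
MATCHERS = {
  date:     ->(v) { v.strip.match?(/\A[0-9]{4}-[0-9]{2}-[0-9]{2}\z/) },
  currency: ->(v) { v.strip.match?(/\A(\$|£)([0-9,\.]+)\z/) },
  string:   ->(_) { true }
}.freeze

# Returns the name of the first type that matches every non-nil value.
def guess_type(values)
  name, _matcher = MATCHERS.find do |_name, matcher|
    values.all? { |v| v.nil? || matcher.call(v) }
  end
  name
end

puts guess_type(["2013-03-24", nil, "2016-09-25"])
puts guess_type(["£38", "$1,200.50"])
puts guess_type(["Acid", "2013-03-24"])
```

Because a single non-matching value disqualifies a type, a mixed column falls through to the string fallback.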
data/lib/incsv/column_type.rb ADDED
@@ -0,0 +1,30 @@
1
+ module InCSV
2
+ class ColumnType
3
+ def self.name
4
+ self.to_s.sub(/.*::/, "").downcase.to_sym
5
+ end
6
+
7
+ def self.for_database
8
+ self.to_s.sub(/.*::/, "").downcase.to_sym
9
+ end
10
+
11
+ def initialize(value)
12
+ @value = value
13
+ end
14
+
15
+ def match?
16
+ false
17
+ end
18
+
19
+ def clean_value
20
+ self.class.clean_value(@value)
21
+ end
22
+
23
+ def self.clean_value(value)
24
+ value
25
+ end
26
+
27
+ private
28
+ attr_reader :value
29
+ end
30
+ end
data/lib/incsv/database.rb ADDED
@@ -0,0 +1,89 @@
1
+ require "sequel"
2
+
3
+ require "pathname"
4
+
5
+ module InCSV
6
+ class Database
7
+ def initialize(csv)
8
+ @csv = csv
9
+
10
+ @db = Sequel.sqlite(db_path)
11
+ # require "logger"
12
+ # @db.loggers << Logger.new($stdout)
13
+ end
14
+
15
+ attr_reader :db
16
+
17
+ def table_created?
18
+ @db.table_exists?(table_name)
19
+ end
20
+
21
+ def imported?
22
+ table_created? && @db[table_name].count > 0
23
+ end
24
+
25
+ def exists?
26
+ File.exist?(db_path)
27
+ end
28
+
29
+ def db_path
30
+ path = Pathname(csv)
31
+ (path.dirname + (path.basename(".csv").to_s + ".db")).to_s
32
+ end
33
+
34
+ def table_name
35
+ @table_name ||= begin
36
+ File.basename(csv, ".csv").downcase.gsub(/[^a-z_]/, "").to_sym
37
+ end
38
+ end
39
+
40
+ def create_table
41
+ @db.create_table!(table_name) do
42
+ primary_key :_incsv_id
43
+ end
44
+
45
+ schema.columns.each do |c|
46
+ @db.alter_table(table_name) do
47
+ add_column c.name, c.type.for_database
48
+ end
49
+ end
50
+ end
51
+
52
+ def import
53
+ return if imported?
54
+
55
+ create_table unless table_created?
56
+
57
+ columns = schema.columns
58
+ column_names = columns.map(&:name)
59
+
60
+ chunks(200) do |chunk|
61
+ rows = chunk.map do |row|
62
+ row.to_hash.values.each_with_index.map do |column, n|
63
+ columns[n].type.clean_value(column)
64
+ end
65
+ end
66
+
67
+ @db[table_name].import(column_names, rows)
68
+ end
69
+ end
70
+
71
+ private
72
+
73
+ attr_reader :csv
74
+
75
+ def schema
76
+ @schema ||= Schema.new(csv)
77
+ end
78
+
79
+ def chunks(size = 200, &block)
80
+ data =
81
+ File.read(csv)
82
+ .encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
83
+
84
+ csv = CSV.new(data, headers: true)
85
+ csv.each_slice(size, &block)
86
+ csv.close
87
+ end
88
+ end
89
+ end
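The naming rules in `db_path` and `table_name` can be tried in isolation. The sketch below copies those two expressions into standalone helpers. `db_path_for` and `table_name_for` are hypothetical names, not part of the gem.

```ruby
require "pathname"

# Same expression as Database#db_path: swap the .csv extension for .db
# and keep the file in the CSV's directory.
def db_path_for(csv)
  path = Pathname(csv)
  (path.dirname + "#{path.basename(".csv")}.db").to_s
end

# Same expression as Database#table_name: lowercase the base name and
# drop every character that is not a-z or an underscore.
def table_name_for(csv)
  File.basename(csv, ".csv").downcase.gsub(/[^a-z_]/, "").to_sym
end

puts db_path_for("data/products.csv")
puts table_name_for("data/Sales-2016.csv").inspect
```

Note that digits and hyphens are stripped from the table name, so two files that differ only in those characters would map to the same table name (though not the same database file).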
data/lib/incsv/schema.rb ADDED
@@ -0,0 +1,55 @@
1
+ require "csv"
2
+
3
+ module InCSV
4
+ class Schema
5
+ def initialize(csv)
6
+ @csv = csv
7
+ end
8
+
9
+ def columns
10
+ @columns ||= parsed_columns
11
+ end
12
+
13
+ private
14
+
15
+ attr_reader :csv
16
+
17
+ def parsed_columns
18
+ samples(50).map do |name, values|
19
+ Column.new(name, values)
20
+ end
21
+ end
22
+
23
+ # Returns the first `num_rows` rows of data, transposed into a hash.
24
+ #
25
+ # For example, the following CSV data:
26
+ #
27
+ # foo,bar
28
+ # 1,2
29
+ # 3,4
30
+ #
31
+ # Would become:
32
+ #
33
+ # { "foo" => ["1", "3"], "bar" => ["2", "4"] }
34
+ #
35
+ # This gives us enough data to be able to guess the type of
36
+ # a column.
37
+ def samples(num_rows)
38
+ data =
39
+ File.read(csv)
40
+ .encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
41
+
42
+ csv = CSV.new(data, headers: true)
43
+ sample_data = csv.each.take(num_rows)
44
+ csv.close
45
+
46
+ sample_data.map(&:to_a).flatten(1).each_with_object({}) do |row, data|
47
+ column = row[0]
48
+ value = row[1]
49
+
50
+ data[column] ||= []
51
+ data[column] << value
52
+ end
53
+ end
54
+ end
55
+ end
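The transposition done by `Schema#samples` can be reproduced on an in-memory string. This is a minimal sketch that assumes the CSV text is already loaded. `column_samples` is a hypothetical helper name, and because no converters are configured the sampled values come back as strings.

```ruby
require "csv"

# Reads up to num_rows data rows and groups the values by column header.
def column_samples(csv_text, num_rows)
  rows = CSV.new(csv_text, headers: true).each.take(num_rows)

  rows.each_with_object({}) do |row, samples|
    # CSV::Row#each yields [header, value] pairs.
    row.each do |header, value|
      (samples[header] ||= []) << value
    end
  end
end

p column_samples("foo,bar\n1,2\n3,4\n", 50)
```

Sampling only the first rows keeps schema detection cheap, at the cost of missing values further down the file that do not fit the guessed type.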
data/lib/incsv/types/currency.rb ADDED
@@ -0,0 +1,23 @@
1
+ module InCSV
2
+ module Types
3
+ class Currency < ColumnType
4
+ MATCH_EXPRESSION = /\A(\$|£)([0-9,\.]+)\z/
5
+
6
+ def self.for_database
7
+ "DECIMAL(10,2)"
8
+ end
9
+
10
+ def match?
11
+ value.strip.match(MATCH_EXPRESSION)
12
+ end
13
+
14
+ def self.clean_value(value)
15
+ return unless value
16
+
17
+ value.strip.match(MATCH_EXPRESSION) do |match|
18
+ BigDecimal(match[2].delete(","))
19
+ end
20
+ end
21
+ end
22
+ end
23
+ end
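The cleaning step in `Currency.clean_value` explains the `0.38E2`-style price shown in the README console output: that is the default string form of a `BigDecimal`. The sketch below repeats the same pattern and conversion outside the class. `clean_currency` is a hypothetical helper name.

```ruby
require "bigdecimal"

# Same pattern as Currency::MATCH_EXPRESSION in this release.
CURRENCY = /\A(\$|£)([0-9,\.]+)\z/

# Strips the currency symbol and thousands separators, returning a
# BigDecimal, or nil when the value is nil or does not look like currency.
def clean_currency(value)
  return unless value

  match = CURRENCY.match(value.strip)
  BigDecimal(match[2].delete(",")) if match
end

price = clean_currency("£38")
puts price.to_s       # default form, scientific-style notation
puts price.to_s("F")  # conventional floating-point notation
```

Passing `"F"` to `BigDecimal#to_s` is one way to get a conventional representation when printing values at the console.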
data/lib/incsv/types/date.rb ADDED
@@ -0,0 +1,9 @@
1
+ module InCSV
2
+ module Types
3
+ class Date < ColumnType
4
+ def match?
5
+ value.strip.match(/\A[0-9]{4}-[0-9]{2}-[0-9]{2}\z/)
6
+ end
7
+ end
8
+ end
9
+ end
data/lib/incsv/types/string.rb ADDED
@@ -0,0 +1,9 @@
1
+ module InCSV
2
+ module Types
3
+ class String < ColumnType
4
+ def match?
5
+ true
6
+ end
7
+ end
8
+ end
9
+ end
data/lib/incsv/types.rb ADDED
@@ -0,0 +1,5 @@
1
+ require "incsv/column_type"
2
+
3
+ require "incsv/types/date"
4
+ require "incsv/types/currency"
5
+ require "incsv/types/string"
data/lib/incsv/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module InCSV
2
- VERSION = "0.1.0"
2
+ VERSION = "0.2.0"
3
3
  end
data/lib/incsv.rb CHANGED
@@ -1,5 +1,6 @@
1
1
  require "incsv/version"
2
2
 
3
- module InCSV
4
- # Your code goes here...
5
- end
3
+ require "incsv/schema"
4
+ require "incsv/types"
5
+ require "incsv/column"
6
+ require "incsv/database"
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: incsv
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Rob Miller
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2016-02-17 00:00:00.000000000 Z
11
+ date: 2016-02-22 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -66,6 +66,20 @@ dependencies:
66
66
  - - "~>"
67
67
  - !ruby/object:Gem::Version
68
68
  version: 0.19.1
69
+ - !ruby/object:Gem::Dependency
70
+ name: pry
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '0.10'
76
+ type: :runtime
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '0.10'
69
83
  - !ruby/object:Gem::Dependency
70
84
  name: sqlite3
71
85
  requirement: !ruby/object:Gem::Requirement
@@ -98,7 +112,8 @@ description: Loads a CSV file into an SQLite database automatically, dropping yo
98
112
  into a Ruby shell that allows you to explore the data within.
99
113
  email:
100
114
  - rob@bigfish.co.uk
101
- executables: []
115
+ executables:
116
+ - incsv
102
117
  extensions: []
103
118
  extra_rdoc_files: []
104
119
  files:
@@ -112,8 +127,17 @@ files:
112
127
  - Rakefile
113
128
  - bin/console
114
129
  - bin/setup
130
+ - exe/incsv
115
131
  - incsv.gemspec
116
132
  - lib/incsv.rb
133
+ - lib/incsv/column.rb
134
+ - lib/incsv/column_type.rb
135
+ - lib/incsv/database.rb
136
+ - lib/incsv/schema.rb
137
+ - lib/incsv/types.rb
138
+ - lib/incsv/types/currency.rb
139
+ - lib/incsv/types/date.rb
140
+ - lib/incsv/types/string.rb
117
141
  - lib/incsv/version.rb
118
142
  homepage: https://github.com/robmiller/incsv
119
143
  licenses: