incsv 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/README.md +122 -1
- data/exe/incsv +94 -0
- data/incsv.gemspec +1 -0
- data/lib/incsv/column.rb +25 -0
- data/lib/incsv/column_type.rb +30 -0
- data/lib/incsv/database.rb +89 -0
- data/lib/incsv/schema.rb +55 -0
- data/lib/incsv/types/currency.rb +23 -0
- data/lib/incsv/types/date.rb +9 -0
- data/lib/incsv/types/string.rb +9 -0
- data/lib/incsv/types.rb +5 -0
- data/lib/incsv/version.rb +1 -1
- data/lib/incsv.rb +4 -3
- metadata +27 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 2a64c9c204b84b53e240994132a99746b64a3d8a
|
4
|
+
data.tar.gz: e302031882b7e9b9aac37108e79025c593062c31
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 22d2a9fb3bcfc0206b96af378d5b7dffdc0b79501c1eac19b9482451c691ca1e9bbdee925b22971e68bedf4cc33c64b4d2f26c46e13d089fe48f566100accb50
|
7
|
+
data.tar.gz: 270ca3f95c76700bc24f410ec90b39141d40fb683d32557b1fc0b72e5a99853fb22dc7fe5a81db335087e966d3c85dac8870a72325f34cbf01ad966d68f9dfcf
|
data/.gitignore
CHANGED
data/README.md
CHANGED
@@ -18,7 +18,128 @@ incsv can be installed via RubyGems:
|
|
18
18
|
|
19
19
|
## Usage
|
20
20
|
|
21
|
-
|
21
|
+
### The quick version
|
22
|
+
|
23
|
+
The following command will drop you into a [REPL][] prompt:
|
24
|
+
|
25
|
+
$ incsv console path/to/file.csv
|
26
|
+
|
27
|
+
A Sequel connection to the database is stored in a variable called
|
28
|
+
`@db`. The name of the table is based on the filename of the CSV; so, if
|
29
|
+
your CSV file is called `products.csv`, then data will be imported into
|
30
|
+
a database table called `products`.
|
31
|
+
|
32
|
+
A quick example:
|
33
|
+
|
34
|
+
> @db[:products].select(:name).reverse_order(:price).take(5)
|
35
|
+
=> [{:name=>"Makeshift battery"},
|
36
|
+
{:name=>"clothing iron"},
|
37
|
+
{:name=>"toy alien"},
|
38
|
+
{:name=>"enhanced targeting card"},
|
39
|
+
{:name=>"Giddyup Buttercup"}]
|
40
|
+
|
41
|
+
[repl]: https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop
|
42
|
+
|
43
|
+
### The less-quick version
|
44
|
+
|
45
|
+
To use incsv, you essentially just need to point it at a CSV file. It’ll
|
46
|
+
then take care of parsing the CSV, figuring out the nature of the data
|
47
|
+
within it, creating a database and a table, and importing the data.
|
48
|
+
|
49
|
+
To perform all of these steps and be given an interactive console once
|
50
|
+
they’re done, you can use the `console` command.
|
51
|
+
|
52
|
+
Let’s imagine we have a CSV file that contains some product information:
|
53
|
+
|
54
|
+
$ head -3 products.csv
|
55
|
+
name,date_added,price
|
56
|
+
"Acid",2013-03-24,£38
|
57
|
+
"Abraxo cleaner",2016-09-25,£21
|
58
|
+
|
59
|
+
Here we can see that we have three columns: the product name, which is
|
60
|
+
just a string; the date the product was added, which is an
|
61
|
+
ISO-8601–formatted date; and the price, which is a currency value in
|
62
|
+
pounds sterling.
|
63
|
+
|
64
|
+
In my sample data there are 515 products (plus a header row):
|
65
|
+
|
66
|
+
$ wc -l products.csv
|
67
|
+
516
|
68
|
+
|
69
|
+
In order to query this data, we can pass the CSV file to incsv:
|
70
|
+
|
71
|
+
$ incsv console products.csv
|
72
|
+
Found database at products.db
|
73
|
+
Connection is in @db
|
74
|
+
|
75
|
+
Primary table name is products
|
76
|
+
Columns: _incsv_id, name, date_added, price
|
77
|
+
|
78
|
+
First row:
|
79
|
+
_incsv_id, name, date_added, price
|
80
|
+
1, Acid, 2013-03-24, 0.38E2
|
81
|
+
|
82
|
+
Not sure what to do next? Try this:
|
83
|
+
@db[:products].count
|
84
|
+
>
|
85
|
+
|
86
|
+
It tells us some information about the file, and about the assumptions
|
87
|
+
it has made about the file. We can see that it’s imported the contents
|
88
|
+
of the file into a table called `products`, and that it’s used the
|
89
|
+
column names from the CSV to name the columns in the database table.
|
90
|
+
|
91
|
+
It also shows us the first row, where you might have noticed that the
|
92
|
+
price is in a slightly odd representation. That’s because incsv will
|
93
|
+
look at what type of data seems to be stored in your CSV before
|
94
|
+
importing it. In this case, it knows that the `date_added` column
|
95
|
+
contains a date, and that the `price` column contains a currency value.
|
96
|
+
In the former case, that means converting it into an actual SQL date. In
|
97
|
+
the latter case, this means converting it to `BigDecimal` format (and
|
98
|
+
storing it in the database as `DECIMAL(10, 2)`), so that we don’t either
|
99
|
+
lose any precision by storing the value as a float, or lose the ability
|
100
|
+
to do numerical calculations by storing it as a string.
|
101
|
+
|
102
|
+
It then suggests a query for us to run, which might generally be the
|
103
|
+
first thing that you’d want to know about the dataset: how many rows
|
104
|
+
are there? We can run it and see:
|
105
|
+
|
106
|
+
> @db[:products].count
|
107
|
+
=> 515
|
108
|
+
|
109
|
+
Excellent! It’s imported every one of the products that were in the CSV.
|
110
|
+
|
111
|
+
From this point on we can do any kind of analysis of the data that we
|
112
|
+
like; we have all the power of SQLite and Sequel at our fingertips. For
|
113
|
+
example, to get the number of products added each year:
|
114
|
+
|
115
|
+
> @db[:products].group_and_count{strftime("%Y", date_added).as(year)}.all
|
116
|
+
=> [{:year=>"2013", :count=>132}, {:year=>"2014", :count=>123}, {:year=>"2015", :count=>131}, {:year=>"2016", :count=>129}]
|
117
|
+
|
118
|
+
Or to get the total value of products added today:
|
119
|
+
|
120
|
+
> @db[:products].select{sum(price).as(total_cost)}.where(date_added: Date.today).first
|
121
|
+
=> {:total_cost=>40}
|
122
|
+
|
123
|
+
We can also do processing in Ruby, if there’s anything that’s difficult
|
124
|
+
in pure SQL. Imagine wanting to convert the product names to
|
125
|
+
URL-friendly “slugs”. This is pretty easy in Ruby. Let’s try it out on
|
126
|
+
the top 10 most expensive products:
|
127
|
+
|
128
|
+
> @db[:products].select(:name).reverse_order(:price).limit(10).each do |product|
|
129
|
+
* puts product[:name].gsub(/\s/, "-").squeeze("-").downcase.gsub(/[^a-z0-9\-]/, "")
|
130
|
+
* end
|
131
|
+
makeshift-battery
|
132
|
+
clothing-iron
|
133
|
+
toy-alien
|
134
|
+
enhanced-targeting-card
|
135
|
+
giddyup-buttercup
|
136
|
+
mole-rat-teeth
|
137
|
+
empty-teal-rounded-vase
|
138
|
+
pre-war-money
|
139
|
+
bowling-ball
|
140
|
+
toothbrush
|
141
|
+
|
142
|
+
Hopefully this illustrates what you can do with incsv!
|
22
143
|
|
23
144
|
## Development
|
24
145
|
|
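The README's reasoning for storing prices as `BigDecimal` rather than as floats can be checked with a few lines of plain Ruby. This is only an illustrative sketch using the standard `bigdecimal` library; the constant names are made up for the example and are not part of incsv.

```ruby
require "bigdecimal"

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
FLOAT_TOTAL = 0.1 + 0.2

# BigDecimal does decimal arithmetic, so the sum is exact.
DECIMAL_TOTAL = BigDecimal("0.1") + BigDecimal("0.2")

puts FLOAT_TOTAL == 0.3                  # false
puts DECIMAL_TOTAL == BigDecimal("0.3")  # true

# The "slightly odd representation" in the first row is BigDecimal's
# default string form (the exponent letter's case varies by Ruby version).
puts BigDecimal("38").to_s.downcase      # 0.38e2
```

The last line is why the console prints the price of the first product as `0.38E2` rather than `38`.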
data/exe/incsv
ADDED
@@ -0,0 +1,94 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
|
3
|
+
$LOAD_PATH.unshift File.expand_path('../../lib', __FILE__)
|
4
|
+
|
5
|
+
require "thor"
|
6
|
+
require "pry"
|
7
|
+
|
8
|
+
require "incsv"
|
9
|
+
|
10
|
+
module InCSV
|
11
|
+
class Console
|
12
|
+
def initialize(db)
|
13
|
+
@db = db
|
14
|
+
end
|
15
|
+
|
16
|
+
def get_binding
|
17
|
+
binding
|
18
|
+
end
|
19
|
+
end
|
20
|
+
|
21
|
+
class CLI < Thor
|
22
|
+
desc "create CSV_FILE", "Creates a database file with the appropriate schema for the given CSV file, but doesn't import any data."
|
23
|
+
method_option :force, type: :boolean, default: false
|
24
|
+
def create(csv_file)
|
25
|
+
database = Database.new(csv_file)
|
26
|
+
|
27
|
+
if database.exists? && database.table_created? && !options.force?
|
28
|
+
$stderr.puts "Database already exists."
|
29
|
+
exit 41
|
30
|
+
end
|
31
|
+
|
32
|
+
database.create_table
|
33
|
+
puts "Database created successfully in #{database.db_path}"
|
34
|
+
rescue StandardError => e
|
35
|
+
$stderr.puts "Database failed to create."
|
36
|
+
$stderr.puts "#{e.message}"
|
37
|
+
exit 40
|
38
|
+
end
|
39
|
+
|
40
|
+
desc "import CSV_FILE", "Creates a database file with the appropriate schema for the given CSV file, and then imports the data within the file."
|
41
|
+
|
42
|
+
method_option :force, type: :boolean, default: false
|
43
|
+
def import(csv_file)
|
44
|
+
database = Database.new(csv_file)
|
45
|
+
create(csv_file)
|
46
|
+
database.import
|
47
|
+
|
48
|
+
puts "Data imported."
|
49
|
+
puts
|
50
|
+
puts "Command to query:"
|
51
|
+
puts "$ sqlite3 #{database.db_path}"
|
52
|
+
rescue StandardError => e
|
53
|
+
$stderr.puts "Import failed."
|
54
|
+
$stderr.puts "#{e.message}"
|
55
|
+
exit 50
|
56
|
+
end
|
57
|
+
|
58
|
+
desc "console CSV_FILE", "Opens a query console for the given CSV file, creating a database file and importing the data if necessary."
|
59
|
+
def console(csv_file)
|
60
|
+
database = Database.new(csv_file)
|
61
|
+
|
62
|
+
unless database.table_created? && database.imported?
|
63
|
+
database.create_table
|
64
|
+
database.import
|
65
|
+
end
|
66
|
+
|
67
|
+
console = Console.new(database.db)
|
68
|
+
|
69
|
+
puts "Found database at #{database.db_path}"
|
70
|
+
puts "Connection is in @db"
|
71
|
+
puts
|
72
|
+
puts "Primary table name is #{database.table_name}"
|
73
|
+
puts "Columns: #{database.db[database.table_name].columns.join(", ")}"
|
74
|
+
|
75
|
+
first_row = database.db[database.table_name].first
|
76
|
+
puts
|
77
|
+
puts "First row:"
|
78
|
+
puts first_row.keys.join(", ")
|
79
|
+
puts first_row.values.join(", ")
|
80
|
+
|
81
|
+
puts
|
82
|
+
puts "Not sure what to do next? Try this:"
|
83
|
+
puts "@db[:#{database.table_name}].count"
|
84
|
+
|
85
|
+
console.get_binding.pry(quiet: true, prompt: [proc { "> " }, proc { "* " }])
|
86
|
+
rescue StandardError => e
|
87
|
+
$stderr.puts "Failed to start console."
|
88
|
+
$stderr.puts "#{e.message}"
|
89
|
+
exit 60
|
90
|
+
end
|
91
|
+
end
|
92
|
+
end
|
93
|
+
|
94
|
+
InCSV::CLI.start
|
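The `Console` class above exists so that the REPL session starts inside an object whose `@db` instance variable is already set. A minimal sketch of that binding technique without Pry, assuming a stand-in value in place of a real Sequel connection (`ConsoleContext` is a hypothetical name used only for this illustration):

```ruby
# Hypothetical stand-in for InCSV::Console: holds a value in @db and
# exposes a Binding whose `self` is this object.
class ConsoleContext
  def initialize(db)
    @db = db
  end

  def get_binding
    binding
  end
end

context = ConsoleContext.new(:fake_connection)

# Code evaluated against the binding sees the object's instance variables,
# which is why `@db` is available at the console prompt.
puts context.get_binding.eval("@db").inspect  # :fake_connection
```

Pry's `Binding#pry` starts a session in the same context, so anything typed at the prompt resolves `@db` the same way.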
data/incsv.gemspec
CHANGED
@@ -24,6 +24,7 @@ Gem::Specification.new do |spec|
|
|
24
24
|
spec.add_development_dependency "rspec", "~> 3.0"
|
25
25
|
|
26
26
|
spec.add_runtime_dependency "thor", "~> 0.19.1"
|
27
|
+
spec.add_runtime_dependency "pry", "~> 0.10"
|
27
28
|
spec.add_runtime_dependency "sqlite3", "~> 1.3"
|
28
29
|
spec.add_runtime_dependency "sequel", "~> 4.31"
|
29
30
|
end
|
data/lib/incsv/column.rb
ADDED
@@ -0,0 +1,25 @@
|
|
1
|
+
require "bigdecimal"
|
2
|
+
|
3
|
+
module InCSV
|
4
|
+
class Column
|
5
|
+
def initialize(name, values)
|
6
|
+
@name = name
|
7
|
+
@values = values
|
8
|
+
end
|
9
|
+
|
10
|
+
attr_reader :name
|
11
|
+
|
12
|
+
def type
|
13
|
+
Types.constants.select do |column_type|
|
14
|
+
column_type = Types.const_get(column_type)
|
15
|
+
if values.all? { |value| value.nil? || column_type.new(value).match? }
|
16
|
+
return column_type
|
17
|
+
end
|
18
|
+
end
|
19
|
+
end
|
20
|
+
|
21
|
+
private
|
22
|
+
|
23
|
+
attr_accessor :values
|
24
|
+
end
|
25
|
+
end
|
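`Column#type` tries each type in turn and picks the first one that every non-nil sample value matches. The idea can be sketched in isolation as below. The detector table is an assumption made for this example: the currency pattern is copied from `types/currency.rb`, but the date pattern and the catch-all string matcher are guesses, since the contents of `date.rb` and `string.rb` are not shown in this diff.

```ruby
# Hypothetical detectors, checked in insertion order; the last is a catch-all.
DETECTORS = {
  currency: ->(v) { v.strip.match?(/\A(\$|£)([0-9,\.]+)\z/) },
  date:     ->(v) { v.strip.match?(/\A\d{4}-\d{2}-\d{2}\z/) }, # assumed pattern
  string:   ->(_v) { true }
}.freeze

# Returns the name of the first detector that all non-nil values satisfy.
def guess_type(values)
  DETECTORS.each do |name, matcher|
    return name if values.all? { |v| v.nil? || matcher.call(v) }
  end
  nil
end

puts guess_type(["£38", "£21"]).inspect      # :currency
puts guess_type(["2013-03-24", nil]).inspect # :date
puts guess_type(["Acid", "£21"]).inspect     # :string
```

One consequence of this approach is that a column whose samples are all nil matches the first detector tried, so the order of the detectors matters.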
data/lib/incsv/column_type.rb
ADDED
@@ -0,0 +1,30 @@
|
|
1
|
+
module InCSV
|
2
|
+
class ColumnType
|
3
|
+
def self.name
|
4
|
+
self.to_s.sub(/.*::/, "").downcase.to_sym
|
5
|
+
end
|
6
|
+
|
7
|
+
def self.for_database
|
8
|
+
self.to_s.sub(/.*::/, "").downcase.to_sym
|
9
|
+
end
|
10
|
+
|
11
|
+
def initialize(value)
|
12
|
+
@value = value
|
13
|
+
end
|
14
|
+
|
15
|
+
def match?
|
16
|
+
false
|
17
|
+
end
|
18
|
+
|
19
|
+
def clean_value
|
20
|
+
self.class.clean_value(@value)
|
21
|
+
end
|
22
|
+
|
23
|
+
def self.clean_value(value)
|
24
|
+
value
|
25
|
+
end
|
26
|
+
|
27
|
+
private
|
28
|
+
attr_reader :value
|
29
|
+
end
|
30
|
+
end
|
data/lib/incsv/database.rb
ADDED
@@ -0,0 +1,89 @@
|
|
1
|
+
require "sequel"
|
2
|
+
|
3
|
+
require "pathname"
|
4
|
+
|
5
|
+
module InCSV
|
6
|
+
class Database
|
7
|
+
def initialize(csv)
|
8
|
+
@csv = csv
|
9
|
+
|
10
|
+
@db = Sequel.sqlite(db_path)
|
11
|
+
# require "logger"
|
12
|
+
# @db.loggers << Logger.new($stdout)
|
13
|
+
end
|
14
|
+
|
15
|
+
attr_reader :db
|
16
|
+
|
17
|
+
def table_created?
|
18
|
+
@db.table_exists?(table_name)
|
19
|
+
end
|
20
|
+
|
21
|
+
def imported?
|
22
|
+
table_created? && @db[table_name].count > 0
|
23
|
+
end
|
24
|
+
|
25
|
+
def exists?
|
26
|
+
File.exist?(db_path)
|
27
|
+
end
|
28
|
+
|
29
|
+
def db_path
|
30
|
+
path = Pathname(csv)
|
31
|
+
(path.dirname + (path.basename(".csv").to_s + ".db")).to_s
|
32
|
+
end
|
33
|
+
|
34
|
+
def table_name
|
35
|
+
@table_name ||= begin
|
36
|
+
File.basename(csv, ".csv").downcase.gsub(/[^a-z_]/, "").to_sym
|
37
|
+
end
|
38
|
+
end
|
39
|
+
|
40
|
+
def create_table
|
41
|
+
@db.create_table!(table_name) do
|
42
|
+
primary_key :_incsv_id
|
43
|
+
end
|
44
|
+
|
45
|
+
schema.columns.each do |c|
|
46
|
+
@db.alter_table(table_name) do
|
47
|
+
add_column c.name, c.type.for_database
|
48
|
+
end
|
49
|
+
end
|
50
|
+
end
|
51
|
+
|
52
|
+
def import
|
53
|
+
return if imported?
|
54
|
+
|
55
|
+
create_table unless table_created?
|
56
|
+
|
57
|
+
columns = schema.columns
|
58
|
+
column_names = columns.map(&:name)
|
59
|
+
|
60
|
+
chunks(200) do |chunk|
|
61
|
+
rows = chunk.map do |row|
|
62
|
+
row.to_hash.values.each_with_index.map do |column, n|
|
63
|
+
columns[n].type.clean_value(column)
|
64
|
+
end
|
65
|
+
end
|
66
|
+
|
67
|
+
@db[table_name].import(column_names, rows)
|
68
|
+
end
|
69
|
+
end
|
70
|
+
|
71
|
+
private
|
72
|
+
|
73
|
+
attr_reader :csv
|
74
|
+
|
75
|
+
def schema
|
76
|
+
@schema ||= Schema.new(csv)
|
77
|
+
end
|
78
|
+
|
79
|
+
def chunks(size = 200, &block)
|
80
|
+
data =
|
81
|
+
File.read(csv)
|
82
|
+
.encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
|
83
|
+
|
84
|
+
csv = CSV.new(data, headers: true)
|
85
|
+
csv.each_slice(size, &block)
|
86
|
+
csv.close
|
87
|
+
end
|
88
|
+
end
|
89
|
+
end
|
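The way `Database` derives the database path and table name from the CSV filename can be tried out on its own. The sketch below lifts the two expressions from `db_path` and `table_name` into standalone helper methods; the helper names are invented for this example.

```ruby
require "pathname"

# Same expression as Database#db_path: swap the .csv extension for .db,
# keeping the file in the same directory.
def db_path_for(csv)
  path = Pathname(csv)
  (path.dirname + (path.basename(".csv").to_s + ".db")).to_s
end

# Same expression as Database#table_name: lowercase the basename and drop
# everything except a-z and underscores.
def table_name_for(csv)
  File.basename(csv, ".csv").downcase.gsub(/[^a-z_]/, "").to_sym
end

puts db_path_for("path/to/products.csv")            # path/to/products.db
puts db_path_for("products.csv")                    # products.db
puts table_name_for("products.csv").inspect         # :products
puts table_name_for("data/Sales-2016.csv").inspect  # :sales
```

Note that digits and hyphens are stripped from the table name, so `Sales-2016.csv` and `Sales-2017.csv` in different directories would both map to a table called `sales`.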
data/lib/incsv/schema.rb
ADDED
@@ -0,0 +1,55 @@
|
|
1
|
+
require "csv"
|
2
|
+
|
3
|
+
module InCSV
|
4
|
+
class Schema
|
5
|
+
def initialize(csv)
|
6
|
+
@csv = csv
|
7
|
+
end
|
8
|
+
|
9
|
+
def columns
|
10
|
+
@columns ||= parsed_columns
|
11
|
+
end
|
12
|
+
|
13
|
+
private
|
14
|
+
|
15
|
+
attr_reader :csv
|
16
|
+
|
17
|
+
def parsed_columns
|
18
|
+
samples(50).map do |name, values|
|
19
|
+
Column.new(name, values)
|
20
|
+
end
|
21
|
+
end
|
22
|
+
|
23
|
+
# Returns the first `num_rows` rows of data, transposed into a hash.
|
24
|
+
#
|
25
|
+
# For example, the following CSV data:
|
26
|
+
#
|
27
|
+
# foo,bar
|
28
|
+
# 1,2
|
29
|
+
# 3,4
|
30
|
+
#
|
31
|
+
# Would become:
|
32
|
+
#
|
33
|
+
# { "foo" => [1, 3], "bar" => [2, 4] }
|
34
|
+
#
|
35
|
+
# This gives us enough data to be able to guess the type of
|
36
|
+
# a column.
|
37
|
+
def samples(num_rows)
|
38
|
+
data =
|
39
|
+
File.read(csv)
|
40
|
+
.encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
|
41
|
+
|
42
|
+
csv = CSV.new(data, headers: true)
|
43
|
+
sample_data = csv.each.take(num_rows)
|
44
|
+
csv.close
|
45
|
+
|
46
|
+
sample_data.map(&:to_a).flatten(1).each_with_object({}) do |row, data|
|
47
|
+
column = row[0]
|
48
|
+
value = row[1]
|
49
|
+
|
50
|
+
data[column] ||= []
|
51
|
+
data[column] << value
|
52
|
+
end
|
53
|
+
end
|
54
|
+
end
|
55
|
+
end
|
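The transposition that `samples` performs, from rows to a hash of column name to values, can be sketched with the standard `csv` library alone. This is a simplified variant, not the gem's code: it parses a string rather than reading a file, and `column_samples` is a hypothetical name.

```ruby
require "csv"

# Takes the first `num_rows` data rows and groups the values by header.
def column_samples(csv_text, num_rows)
  rows = CSV.parse(csv_text, headers: true).first(num_rows)

  rows.each_with_object({}) do |row, samples|
    row.each do |header, value|
      (samples[header] ||= []) << value
    end
  end
end

p column_samples("foo,bar\n1,2\n3,4\n", 50)
# {"foo"=>["1", "3"], "bar"=>["2", "4"]}
```

Without converters the CSV library yields every field as a string, so the sampled values are `"1"` and `"3"` rather than the integers shown in the comment above `samples`. That suits type detection, since the matchers work on the raw text.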
data/lib/incsv/types/currency.rb
ADDED
@@ -0,0 +1,23 @@
|
|
1
|
+
module InCSV
|
2
|
+
module Types
|
3
|
+
class Currency < ColumnType
|
4
|
+
MATCH_EXPRESSION = /\A(\$|£)([0-9,\.]+)\z/
|
5
|
+
|
6
|
+
def self.for_database
|
7
|
+
"DECIMAL(10,2)"
|
8
|
+
end
|
9
|
+
|
10
|
+
def match?
|
11
|
+
value.strip.match(MATCH_EXPRESSION)
|
12
|
+
end
|
13
|
+
|
14
|
+
def self.clean_value(value)
|
15
|
+
return unless value
|
16
|
+
|
17
|
+
value.strip.match(MATCH_EXPRESSION) do |match|
|
18
|
+
BigDecimal(match[2].delete(","))
|
19
|
+
end
|
20
|
+
end
|
21
|
+
end
|
22
|
+
end
|
23
|
+
end
|
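To see what `Currency.clean_value` does to individual values, the same steps can be run outside the class. The sketch below reuses the match expression from the file above in a standalone method; `clean_currency` and `CURRENCY_PATTERN` are names chosen for this example only.

```ruby
require "bigdecimal"

# Same pattern as Currency::MATCH_EXPRESSION: a $ or £ sign followed by
# digits, commas and dots.
CURRENCY_PATTERN = /\A(\$|£)([0-9,\.]+)\z/

# Strips the currency symbol and thousands separators, returning a
# BigDecimal, or nil when the value is missing or does not match.
def clean_currency(value)
  return unless value

  match = value.strip.match(CURRENCY_PATTERN)
  match && BigDecimal(match[2].delete(","))
end

puts clean_currency("£1,234.50") == BigDecimal("1234.5")  # true
puts clean_currency(" $38 ") == BigDecimal("38")          # true
puts clean_currency("n/a").inspect                        # nil
puts clean_currency(nil).inspect                          # nil
```

Because the symbol is discarded, the stored number no longer records which currency it was in; a column mixing `$` and `£` values would still be treated as a single currency column.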
data/lib/incsv/types.rb
ADDED
data/lib/incsv/version.rb
CHANGED
data/lib/incsv.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: incsv
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.2.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Rob Miller
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2016-02-
|
11
|
+
date: 2016-02-22 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|
@@ -66,6 +66,20 @@ dependencies:
|
|
66
66
|
- - "~>"
|
67
67
|
- !ruby/object:Gem::Version
|
68
68
|
version: 0.19.1
|
69
|
+
- !ruby/object:Gem::Dependency
|
70
|
+
name: pry
|
71
|
+
requirement: !ruby/object:Gem::Requirement
|
72
|
+
requirements:
|
73
|
+
- - "~>"
|
74
|
+
- !ruby/object:Gem::Version
|
75
|
+
version: '0.10'
|
76
|
+
type: :runtime
|
77
|
+
prerelease: false
|
78
|
+
version_requirements: !ruby/object:Gem::Requirement
|
79
|
+
requirements:
|
80
|
+
- - "~>"
|
81
|
+
- !ruby/object:Gem::Version
|
82
|
+
version: '0.10'
|
69
83
|
- !ruby/object:Gem::Dependency
|
70
84
|
name: sqlite3
|
71
85
|
requirement: !ruby/object:Gem::Requirement
|
@@ -98,7 +112,8 @@ description: Loads a CSV file into an SQLite database automatically, dropping yo
|
|
98
112
|
into a Ruby shell that allows you to explore the data within.
|
99
113
|
email:
|
100
114
|
- rob@bigfish.co.uk
|
101
|
-
executables:
|
115
|
+
executables:
|
116
|
+
- incsv
|
102
117
|
extensions: []
|
103
118
|
extra_rdoc_files: []
|
104
119
|
files:
|
@@ -112,8 +127,17 @@ files:
|
|
112
127
|
- Rakefile
|
113
128
|
- bin/console
|
114
129
|
- bin/setup
|
130
|
+
- exe/incsv
|
115
131
|
- incsv.gemspec
|
116
132
|
- lib/incsv.rb
|
133
|
+
- lib/incsv/column.rb
|
134
|
+
- lib/incsv/column_type.rb
|
135
|
+
- lib/incsv/database.rb
|
136
|
+
- lib/incsv/schema.rb
|
137
|
+
- lib/incsv/types.rb
|
138
|
+
- lib/incsv/types/currency.rb
|
139
|
+
- lib/incsv/types/date.rb
|
140
|
+
- lib/incsv/types/string.rb
|
117
141
|
- lib/incsv/version.rb
|
118
142
|
homepage: https://github.com/robmiller/incsv
|
119
143
|
licenses:
|