rstore 0.2.0

data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2011 Stefan Rohlfing
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,253 @@
1
+ # RStore
2
+
3
+ ### A library for easy batch storage of csv data into a database
4
+
5
+ Uses the CSV standard library for parsing, *Nokogiri* for URL handling, and *Sequel* ORM for database management.
6
+
7
+ ## Special Features
8
+
9
+ * **Batch processing** of csv files
10
+ * Fetches data from different sources: **files, directories, URLs**
11
+ * **Customizable** using additional options (also see section *Available Options*)
12
+ * **Validation of field values**. At the moment validation of the following types is supported:
13
+ * `String`, `Integer`, `Float`, `Date`, `DateTime`, `Time`, and `Boolean`
14
+ * **Descriptive error messages** helping you to find any invalid data quickly.
15
+ * Only define your database and table classes once, then just `require` them when needed.
16
+ * **Safe and transparent data storage**:
17
+ * Using database transactions: Either the data from all files is stored or none (also see section *Database Requirements*)
18
+ * To avoid double entry of data, the `run` method can only be run once on a single instance of `RStore::CSV`.
19
+
20
+
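The field validation listed above can be illustrated with a small sketch. This is not RStore's actual implementation: `CONVERTERS` and `convert_field` are hypothetical names, and the accepted boolean spellings are only assumed from the sample data further down in the README.

```ruby
require 'date'

# Hypothetical converters, one per column type (not RStore's real code).
CONVERTERS = {
  string:  ->(v) { v },
  integer: ->(v) { Integer(v, 10) },
  float:   ->(v) { Float(v) },
  date:    ->(v) { Date.parse(v) },
  boolean: ->(v) {
    case v.downcase
    when 'true', '1'  then true
    when 'false', '0' then false
    else raise ArgumentError, "not a boolean: #{v.inspect}"
    end
  }
}.freeze

# Convert one raw csv field. An empty field (nil) passes through unchanged.
# Invalid data raises ArgumentError naming the row and column.
def convert_field(value, type, row:, column:)
  return nil if value.nil?
  CONVERTERS.fetch(type).call(value)
rescue ArgumentError => e
  raise ArgumentError, "row #{row}, column #{column}: #{e.message}"
end

convert_field('4', :integer, row: 4, column: :quantity)
convert_field('2011-2-4', :date, row: 1, column: :created_at)
```

Prefixing the error with row and column is one simple way to get the kind of descriptive message the feature list mentions.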
21
+ ## Database Requirements
22
+
23
+ 1. Expects the database table to have an additional column storing an auto-incrementing primary key.
24
+ 2. **Requires the database to support transactions**:
25
+ Most database platforms support transactions natively.
26
+ In MySQL, you'll need to be running `InnoDB` or `BDB` table types rather than the more common `MyISAM`.
27
+ If you are using MySQL and the table has not been created yet, RStore::CSV will take care of using the
28
+ correct table type upon creation.
29
+
30
+
31
+ ## Installation
32
+
33
+ ``` batch
34
+ gem install 'rstore'
35
+ ```
36
+
37
+ ## Public API Documentation
38
+
39
+ Detailed documentation of the public API can be found [here](http://rubydoc.info/github/bytesource/rstore).
40
+
41
+
42
+ ## Sample Usage
43
+
44
+ Sample csv file
45
+
46
+ > "product","quantity","price","created_at","min_demand","max_demand","on_stock"
47
+ > "toy1","1","1.12","2011-2-4","1:30","1:30am","true"
48
+ > "toy2","2","2.22","2012/2/4","2:30","2:30pm","false"
49
+ > "toy3","3","3.33","2013/2/4","3:30","3:30 a.m.","True"
50
+ > "toy4","4",,,"4:30","4:30 p.m.","False"
51
+ > "toy4","5","5.55","2015-2-4","5:30","5:30AM","1"
52
+ > "toy5","6","6.66","2016/2/4","6:30","6:30 P.M.","0"
53
+ > "toy6","7","7.77",,,,"false"
54
+
55
+
56
+ 1) Load gem
57
+
58
+ ``` ruby
59
+
60
+ require 'rstore/csv'
61
+
62
+ ```
63
+ 2) Store database information in a subclass of `RStore::BaseDB`
64
+ Naming convention: name => NameDB
65
+
66
+ ``` ruby
67
+
68
+ class CompanyDB < RStore::BaseDB
69
+
70
+ # Same as Sequel.connect, except that you don't need to
71
+ # provide the :database key.
72
+ info(:adapter => 'mysql',
73
+ :host => 'localhost',
74
+ :user => 'root',
75
+ :password => 'xxx')
76
+
77
+ end
78
+
79
+ ```
80
+
81
+ 3) Store table information in a subclass of `RStore::BaseTable`
82
+ Naming convention: name => NameTable
83
+
84
+ ``` ruby
85
+
86
+ class ProductsTable < RStore::BaseTable
87
+
88
+ # Specify the database table the same way
89
+ # you do in Sequel
90
+ create do
91
+ primary_key :id, :allow_null => false
92
+ String :product
93
+ Integer :quantity
94
+ Float :price
95
+ Date :created_at
96
+ DateTime :min_demand
97
+ Time :max_demand
98
+ Boolean :on_stock, :allow_null => false, :default => false
99
+ end
100
+
101
+ end
102
+
103
+ ```
104
+
105
+ **Note**:
106
+ You can put the database and table class definitions in separate files
107
+ and `require` them when needed.
108
+
109
+
110
+ 4) Enter csv data into the database
111
+ The `from` method accepts a path to a file or directory as well as a URL.
112
+ The `to` method accepts a string of the form *db_name.table_name*
113
+
114
+ ```ruby
115
+ RStore::CSV.new do
116
+ from '../easter/children', :recursive => true # select a directory or
117
+ from '../christmas/children/toys.csv' # file, or
118
+ from 'www.example.com/sweets.csv', :selector => 'pre div.line' # URL
119
+ to 'company.products' # provide database and table name
120
+ run # run the program
121
+ end
122
+
123
+ ```
124
+ ### Additional Features
125
+ ---
126
+
127
+ You can change and reset the default options (see section *Available Options* below for details)
128
+
129
+ ``` ruby
130
+ # Search directories recursively and handle the first row of a file as data by default
131
+ RStore::CSV.change_default_options(:recursive => true, :has_headers => false)
132
+
133
+ RStore::CSV.new do
134
+ from 'dir1'
135
+ from 'dir2'
136
+ from 'dir3'
137
+ to 'company.products'
138
+ run
139
+ end
140
+
141
+ # Restore default options
142
+ RStore::CSV.reset_default_options
143
+
144
+ ```
145
+
146
+ There is also a convenience method enabling you to use
147
+ all of [Sequel's query methods](http://sequel.rubyforge.org/rdoc/files/doc/querying_rdoc.html).
148
+
149
+ ``` ruby
150
+ RStore::CSV.query('company.products') do |table| # table = Sequel::Dataset object
151
+ table.all # fetch everything
152
+ table.all[3] # fetch row number 4 (see output below)
153
+ table.filter(:id => 2).update(:on_stock => true) # update entry
154
+ table.filter(:id => 3).delete # delete entry
155
+ end
156
+
157
+ ```
158
+
159
+ *)
160
+ Output of `table.all[3]`
161
+
162
+ ``` ruby
163
+ # {:product => "toy4",
164
+ # :quantity => 4,
165
+ # :price => nil,
166
+ # :created_at => nil,
167
+ # :min_demand => <DateTime: 2011-10-25T04:30:00+00:00 (39293755/16,0/1,2299161)>,
168
+ # :max_demand => <DateTime: 2011-10-25T16:30:00+00:00 (39293763/16,0/1,2299161)>,
169
+ # :on_stock => false}
170
+
171
+ ```
172
+
173
+ Access all of Sequel's functionality by using the convenience methods
174
+ `BaseDB.connect`, `BaseTable.name`, and `BaseTable.table_info`:
175
+
176
+ ``` ruby
177
+
178
+ DB = CompanyDB.connect # Open connection to 'company' database
179
+ name = ProductsTable.name # Table name, :products, used as an argument to the following methods.
180
+ layout = ProductsTable.table_info # The Proc that was passed to ProductsTable.create
181
+
182
+ DB.create_table(name, &layout) # Create table
183
+
184
+ DB.alter_table name do # Alter table
185
+ drop_column :created_at
186
+ add_column :entry_date, :date
187
+ end
188
+
189
+ DB.drop_table(name) # Drop table
190
+
191
+ ```
192
+
193
+
194
+ ## Available Options
195
+
196
+ The method `from` accepts two kinds of options, file options and parse options:
197
+
198
+ ### File Options
199
+ File options are used for fetching csv data from a source. The following options are recognized:
200
+
201
+ * **:has_headers**, default: `true`
202
+ * When set to false, the first line of a file is processed as data, otherwise it is discarded.
203
+ * **:recursive**, default: `false`
204
+ * When set to true and a directory is given, recursively search for files. Non-csv files are skipped.
205
+ * **:selector**, default: `""`
206
+ * CSS selector, mandatory when the source is a URL. For more details please see the section *Further Reading* below
207
+
208
+
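As a rough illustration of what `:recursive` means for a directory source, the sketch below lists csv files with and without descending into subdirectories. `csv_files` is a hypothetical helper, not part of RStore, and the real gem may collect files differently.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of RStore): list the csv files under +dir+.
# With recursive: false only the top level is searched; other file types
# never match the pattern and are therefore skipped.
def csv_files(dir, recursive: false)
  pattern = recursive ? File.join(dir, '**', '*.csv') : File.join(dir, '*.csv')
  Dir.glob(pattern).sort
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'toys'))
  File.write(File.join(dir, 'a.csv'), "x\n")
  File.write(File.join(dir, 'notes.txt'), "skip me\n")
  File.write(File.join(dir, 'toys', 'b.csv'), "y\n")

  p csv_files(dir).map { |f| File.basename(f) }                  # top level only
  p csv_files(dir, recursive: true).map { |f| File.basename(f) } # includes toys/b.csv
end
```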
209
+ ### Parse Options
210
+ Parse options are arguments to `CSV::parse`. The following options are recognized:
211
+
212
+ * **:col_sep**, default: `","`
213
+ * The String placed between each field.
214
+ * **:row_sep**, default: `:auto`
215
+ * The String appended to the end of each row.
216
+ * **:quote_char**, default: `'"'`
217
+ * The character used to quote fields.
218
+ * **:field_size_limit**, default: `nil`
219
+ * The maximum size CSV will read ahead looking for the closing quote for a field.
220
+ * **:skip_blanks**, default: `false`
221
+ * When set to a true value, CSV will skip over any rows with no content.
222
+
223
+ For more information on the parse options, please see section *Further Reading* below.
224
+
225
+
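The parse options listed above are ordinary options of Ruby's CSV standard library, so their effect can be tried without RStore. The sketch below is only an illustration: `parse_csv` is a hypothetical wrapper, its `has_headers` keyword merely mimics the file option of the same name, and the sample data is made up.

```ruby
require 'csv'

# Hypothetical wrapper (not RStore's code): parse +text+ with the given
# CSV options and optionally discard the header row.
def parse_csv(text, has_headers: true, **parse_options)
  rows = CSV.parse(text, **parse_options)
  has_headers ? rows.drop(1) : rows
end

# Semicolon-separated sample containing one blank row.
data = "product;quantity\n\ntoy1;1\ntoy2;2\n"

p parse_csv(data, col_sep: ';', skip_blanks: true)
p parse_csv(data, col_sep: ';', skip_blanks: true, has_headers: false)
```

With `skip_blanks: true` the empty second line never shows up as a row, and with `has_headers: false` the first line is kept as data.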
226
+ ## Further Reading
227
+
228
+ * Sequel
229
+ * [Cheat sheet][sequel_cheat]
230
+ * [Connecting to a database][sequel_connect]
231
+ * [Querying][sequel_query]
232
+ * CSV
233
+ * [Parse options documentation][csv_options]
234
+ * [Common Format and MIME Type][csv_standard] for Comma-Separated Values (CSV) Files
235
+ * Nokogiri
236
+ * [Project Site][nokogiri_home]
237
+
238
+ [sequel_cheat]: http://sequel.rubyforge.org/rdoc/files/doc/cheat_sheet_rdoc.html
239
+ [sequel_connect]: http://sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html
240
+ [sequel_query]: http://sequel.rubyforge.org/rdoc/files/doc/querying_rdoc.html
241
+ [csv_options]: http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html#method-c-new
242
+ [csv_standard]: http://www.ietf.org/rfc/rfc4180.txt
243
+ [nokogiri_home]: http://nokogiri.org/
244
+
245
+
246
+ ## Feedback
247
+
248
+ Any suggestions or criticism are highly welcome! Whatever your feedback, it will help me make this gem better!
249
+
250
+
251
+
252
+
253
+
data/Rakefile ADDED
@@ -0,0 +1,40 @@
1
+ # require 'spec/rake/spectask' # deprecated
2
+ require 'rspec/core/rake_task'
3
+ require 'rake/gempackagetask'
4
+ require 'rdoc/task'
5
+
6
+ # Build gem: rake gem
7
+ # Push gem: rake push
8
+
9
+ task :default => [ :spec, :gem ]
10
+
11
+ RSpec::Core::RakeTask.new do
12
+
13
+ message = <<-MESSAGE
14
+
15
+ ====================================
16
+ | NOTE:
17
+ | Make sure to provide the correct connection info for database 'PlastronicsDB'
18
+ | in file 'csv_spec.rb' (line 12).
19
+ ====================================
20
+
21
+ MESSAGE
22
+
23
+ puts message
24
+
25
+ :spec
26
+ end
27
+
28
+ gem_spec = eval(File.read('rstore.gemspec'))
29
+
30
+ Rake::GemPackageTask.new( gem_spec ) do |t|
31
+ t.need_zip = true
32
+ end
33
+
34
+ #RDoc::Task.new do |rdoc|
35
+ #
36
+ #end
37
+
38
+ task :push => :gem do |t|
39
+ sh "gem push pkg/#{gem_spec.name}-#{gem_spec.version}.gem"
40
+ end
data/lib/rstore.rb ADDED
@@ -0,0 +1,5 @@
1
+ # encoding: utf-8
2
+
3
+ module RStore
4
+ require 'rstore/csv'
5
+ end
@@ -0,0 +1,119 @@
1
+ # encoding: utf-8
2
+
3
+ require 'sequel'
4
+
5
+ module RStore
6
+ class BaseDB
7
+
8
+ class << self
9
+ # A Hash holding subclasses of {RStore::BaseDB}
10
+ # @return [Hash{Symbol=>BaseDB}] All subclasses of {RStore::BaseDB} defined in the current namespace.
11
+ # Subclasses are added automatically via _self.inherited_.
12
+ # @example
13
+ # class CompanyDB < RStore::BaseDB
14
+ # info #...
15
+ # end
16
+ #
17
+ # class MyDB < RStore::BaseDB
18
+ # info #...
19
+ # end
20
+ #
21
+ # RStore::BaseDB.db_classes
22
+ # #=> {:company=>CompanyDB, :my=>MyDB}
23
+ attr_reader :db_classes # self = #<Class:RStore::BaseDB>
24
+ end
25
+
26
+ @db_classes = Hash.new # self = RStore::BaseDB
27
+
28
+
29
+ def self.inherited subclass
30
+ BaseDB.db_classes[subclass.name] = subclass
31
+ end
32
+
33
+
34
+ # Define the database connection.
35
+ # @note To be called when defining a _subclass_ of {RStore::BaseDB}
36
+ # Accepts the same _one_ _arity_ parameters as [Sequel.connect](http://sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html)
37
+ # @overload info(options)
38
+ # @param [String, Hash] connection_info Either a connection string such as _postgres://user:password@localhost/blog_, or a `Hash` with the following options:
39
+ # @option options [String] :adapter The SQL database used, such as _mysql_ or _postgres_
40
+ # @option options [String] :host Example: 'localhost'
41
+ # @option options [String] :user
42
+ # @option options [String] :password
43
+ # @option options [String] :database The database name. You don't need to provide this option, as its value will be inferred from the class name.
44
+ # @return [void]
45
+ # @example
46
+ # # Using a connection string
47
+ # class CompanyDB < RStore::BaseDB
48
+ # info('postgres://user:password@localhost/blog')
49
+ # end
50
+ #
51
+ # # Using an options hash
52
+ # class CompanyDB < RStore::BaseDB
53
+ # info(adapter: 'mysql',
54
+ # host: 'localhost',
55
+ # user: 'root',
56
+ # password: 'xxx')
57
+ # end
58
+ def self.info hash_or_string
59
+ class << self # self = #<Class:CompanyDB>
60
+ attr_reader :connection_info
61
+ end
62
+ # self = CompanyDB
63
+ @connection_info = hash_or_string.is_a?(Hash) ? hash_or_string.merge(:database => self.name.to_s): hash_or_string
64
+ end
65
+
66
+
67
+ # Uses the connection info from {.info} to connect to the database.
68
+ # @note To be called when defining a _subclass_ of {RStore::BaseDB}
69
+ # @yieldparam [Sequel::Database] db The opened Sequel {http://sequel.rubyforge.org/rdoc/classes/Sequel/Database.html Database} object, which is closed when the block exits.
70
+ # @return [void]
71
+ # @example
72
+ # class CompanyDB < RStore::BaseDB
73
+ # info(adapter: 'mysql',
74
+ # host: 'localhost',
75
+ # user: 'root',
76
+ # password: 'xxx')
77
+ # end
78
+ #
79
+ # class DataTable < RStore::BaseTable
80
+ # create do
81
+ # primary_key :id, :allow_null => false
82
+ # String :col1
83
+ # Integer :col2
84
+ # Float :col3
85
+ # end
86
+ # end
87
+ #
88
+ # name = DataTable.name
89
+ #
90
+ # #Either
91
+ # DB = CompanyDB.connect
92
+ # DB.drop_table(name)
93
+ # DB.disconnect
94
+ #
95
+ # #Or
96
+ # CompanyDB.connect do |db|
97
+ # db.drop_table(name)
98
+ # end
99
+ def self.connect &block
100
+ Sequel.connect(@connection_info, &block)
101
+ end
102
+
103
+
104
+
105
+ # @return [Symbol] The lower-case class name without the _DB_ postfix
106
+ # @example
107
+ # class CompanyDB < RStore::BaseDB
108
+ # info('postgres://user:password@localhost/blog')
109
+ # end
110
+ #
111
+ # CompanyDB.name
112
+ # #=> :company
113
+ def self.name
114
+ super.gsub(/DB\z/,'').downcase.to_sym
115
+ end
116
+
117
+ end
118
+ end
119
+
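The registry pattern used in this file (a class-level instance variable filled by the `inherited` hook) can be tried in isolation. The sketch below is a simplified stand-in, not `RStore::BaseDB` itself: `Registry` and `key` are hypothetical names, there is no Sequel dependency, and the key is derived in a separate method instead of overriding `name`.

```ruby
# Hypothetical base class (not RStore::BaseDB): a class-level instance
# variable acts as a registry, and the +inherited+ hook fills it whenever
# a subclass is defined.
class Registry
  @classes = {}

  class << self
    attr_reader :classes

    def inherited(subclass)
      super
      Registry.classes[subclass.key] = subclass
    end

    # "CompanyDB" -> :company. Uses the non-destructive +sub+, anchored at
    # the end of the name, so class names without the postfix do not fail.
    def key
      name.split('::').last.sub(/DB\z/, '').downcase.to_sym
    end
  end
end

class CompanyDB < Registry; end
class MyDB < Registry; end

p Registry.classes
```

The non-destructive `sub` matters here because `gsub!` returns `nil` when nothing is replaced and cannot modify a frozen string.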
@@ -0,0 +1,92 @@
1
+ # encoding: utf-8
2
+
3
+ module RStore
4
+ class BaseTable
5
+
6
+ class << self
7
+ # A Hash holding subclasses of {RStore::BaseTable}
8
+ # @return [Hash{Symbol=>BaseTable}] All subclasses of {RStore::BaseTable} defined in the current namespace.
9
+ # Subclasses are added automatically via _self.inherited_.
10
+ # @example
11
+ # class ProductsTable < RStore::BaseTable
12
+ # create #...
13
+ # end
14
+ #
15
+ # class DataTable < RStore::BaseTable
16
+ # create #...
17
+ # end
18
+ #
19
+ # RStore::BaseTable.table_classes
20
+ # #=> {:products=>ProductsTable, :data=>DataTable}
21
+ attr_reader :table_classes
22
+ end
23
+
24
+ @table_classes = Hash.new
25
+
26
+ def self.inherited subclass
27
+ BaseTable.table_classes[subclass.name] = subclass
28
+ end
29
+
30
+
31
+ # Define the table layout.
32
+ # @note To be called when defining a _subclass_ of {RStore::BaseTable}
33
+ # @note You need to define an extra column for an auto-incrementing primary key.
34
+ # @yield The same block as {http://sequel.rubyforge.org/rdoc/classes/Sequel/Database.html#method-i-create_table Sequel::Database.create_table}.
35
+ # @return [void]
36
+ # @example
37
+ # class ProductsTable < RStore::BaseTable
38
+ #
39
+ # create do
40
+ # primary_key :id, :allow_null => false
41
+ # String :product
42
+ # Integer :quantity
43
+ # Float :price
44
+ # Date :created_at
45
+ # DateTime :min_demand
46
+ # Time :max_demand
47
+ # Boolean :on_stock, :allow_null => false, :default => false
48
+ # end
49
+ # end
50
+ def self.create &block
51
+ class << self
52
+ attr_reader :table_info
53
+ end
54
+
55
+ @table_info = block
56
+ end
57
+
58
+
59
+ # @return [Symbol] The lower-case class name without the _DB_ postfix
60
+ # @example
61
+ # class CompanyDB < RStore::BaseDB
62
+ # info('postgres://user:password@localhost/blog')
63
+ # end
64
+ #
65
+ # CompanyDB.name
66
+ # #=> :company
67
+
68
+
69
+ # @return [Symbol] The lower-case class name without the _Table_ postfix
70
+ # @example
71
+ # class ProductsTable < RStore::BaseTable
72
+ #
73
+ # create do
74
+ # primary_key :id, :allow_null => false
75
+ # String :product
76
+ # Integer :quantity
77
+ # Float :price
78
+ # Date :created_at
79
+ # DateTime :min_demand
80
+ # Time :max_demand
81
+ # Boolean :on_stock, :allow_null => false, :default => false
82
+ # end
83
+ # end
84
+ #
85
+ # ProductsTable.name
86
+ # #=> :products
87
+ def self.name
88
+ super.gsub(/Table\z/,'').downcase.to_sym
89
+ end
90
+ end
91
+ end
92
+
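`BaseTable.create` only stores the block; the block is evaluated later, when it is handed to something that understands the column DSL (in RStore that is Sequel's `create_table`). The sketch below shows the mechanism without Sequel. `ColumnRecorder`, `SketchTable`, and `ProductsSketch` are hypothetical stand-ins, and the recorder supports only a few column types.

```ruby
# Hypothetical stand-in for a schema generator (not Sequel's real class):
# it records every column declared inside the block.
class ColumnRecorder
  attr_reader :columns

  def initialize
    @columns = []
  end

  def primary_key(name, _opts = {})
    @columns << [name, :primary_key]
  end

  # Methods named after Ruby classes work inside the block because a
  # capitalized name followed by an argument is parsed as a method call.
  %w[String Integer Float].each do |type|
    define_method(type) do |name, _opts = nil|
      @columns << [name, type.downcase.to_sym]
    end
  end
end

# Hypothetical counterpart of BaseTable: stores the block for later use.
class SketchTable
  class << self
    attr_reader :table_info

    def create(&block)
      @table_info = block
    end
  end
end

class ProductsSketch < SketchTable
  create do
    primary_key :id
    String  :product
    Integer :quantity
    Float   :price
  end
end

# Replay the stored block against the recorder.
recorder = ColumnRecorder.new
recorder.instance_eval(&ProductsSketch.table_info)
p recorder.columns
```

Keeping the layout as a `Proc` is what lets the same definition be passed on unchanged, as in `DB.create_table(name, &layout)` from the README.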