tableschema 0.3.1 → 0.4.0

Files changed (49)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +21 -0
  3. data/.travis.yml +15 -1
  4. data/README.md +164 -129
  5. data/Rakefile +10 -1
  6. data/bin/console +2 -6
  7. data/{etc/schemas → lib/profiles}/geojson.json +0 -1
  8. data/lib/profiles/table-schema.json +1625 -0
  9. data/lib/profiles/topojson.json +311 -0
  10. data/lib/tableschema.rb +5 -3
  11. data/lib/tableschema/constraints/constraints.rb +12 -24
  12. data/lib/tableschema/constraints/enum.rb +6 -2
  13. data/lib/tableschema/constraints/max_length.rb +6 -2
  14. data/lib/tableschema/constraints/maximum.rb +12 -2
  15. data/lib/tableschema/constraints/min_length.rb +6 -2
  16. data/lib/tableschema/constraints/minimum.rb +12 -2
  17. data/lib/tableschema/constraints/pattern.rb +9 -2
  18. data/lib/tableschema/constraints/required.rb +6 -15
  19. data/lib/tableschema/constraints/unique.rb +12 -0
  20. data/lib/tableschema/defaults.rb +9 -0
  21. data/lib/tableschema/exceptions.rb +15 -2
  22. data/lib/tableschema/field.rb +39 -20
  23. data/lib/tableschema/helpers.rb +32 -15
  24. data/lib/tableschema/infer.rb +31 -28
  25. data/lib/tableschema/model.rb +57 -34
  26. data/lib/tableschema/schema.rb +40 -6
  27. data/lib/tableschema/table.rb +75 -26
  28. data/lib/tableschema/types/any.rb +1 -0
  29. data/lib/tableschema/types/array.rb +2 -1
  30. data/lib/tableschema/types/base.rb +9 -21
  31. data/lib/tableschema/types/date.rb +1 -0
  32. data/lib/tableschema/types/datetime.rb +1 -0
  33. data/lib/tableschema/types/duration.rb +31 -0
  34. data/lib/tableschema/types/geojson.rb +27 -5
  35. data/lib/tableschema/types/geopoint.rb +4 -3
  36. data/lib/tableschema/types/integer.rb +1 -0
  37. data/lib/tableschema/types/number.rb +40 -25
  38. data/lib/tableschema/types/object.rb +2 -1
  39. data/lib/tableschema/types/string.rb +8 -0
  40. data/lib/tableschema/types/time.rb +1 -0
  41. data/lib/tableschema/types/year.rb +34 -0
  42. data/lib/tableschema/types/yearmonth.rb +52 -0
  43. data/lib/tableschema/validate.rb +45 -29
  44. data/lib/tableschema/version.rb +1 -1
  45. data/tableschema.gemspec +2 -1
  46. metadata +31 -12
  47. data/etc/schemas/json-table-schema.json +0 -102
  48. data/lib/tableschema/data.rb +0 -60
  49. data/lib/tableschema/types/null.rb +0 -37
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 846c3cf9cf67190ece4602c3ecc26124789d6bfe
-   data.tar.gz: 6c2f2101cb63bad02b025281be54d85da91dcdcf
+   metadata.gz: be0ea32c71fc75dd1acca11a181b3fe8a7b69e33
+   data.tar.gz: e73959a568fd604b31fbe72376b9f2987095602c
  SHA512:
-   metadata.gz: 90f9e6af27235f1cf4e8509c1133c33840a695c2cb000ca70ab1bff5bf8e8e0cea8aec311fc704a58fbb13ea9cab7b58fb4758d73fa0bbb8f79cd4ea4ecdb6b0
-   data.tar.gz: abb5bbab98a9431458b4962ede074149e16da506d7366f9857dc4cc6b85f876a2fd7e0fa0990f8da71c7d81566d8f7c298e971cba55b9a611bea4c683fb12d3c
+   metadata.gz: 2795a5b5d62696987588e9dfd7970c5539b68385d8e20eba1f92249645a7a72d338c5a44e96514531e562f7fdec0a40b49c3fd095665a027a4763e21e43a0fb9
+   data.tar.gz: e145c763e64f6384cadf6d8b3fdfeae01b562c50a1243def427036f81d2f6f39c95dd22837cdddc0844e2e4287b2aae71d24b57101af63b62908090093abcf11
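As the hunk above shows, `checksums.yaml` holds SHA1 and SHA512 digests of the two archives packaged inside the `.gem` (`metadata.gz` and `data.tar.gz`), so every rebuilt release changes all four values. A minimal sketch of producing entries in the same shape with Ruby's standard `digest` and `yaml` libraries, assuming the archive bytes are already in memory; `gem_checksums` is a hypothetical helper, not part of RubyGems:

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper: maps archive names to their raw bytes and returns
# a Hash shaped like checksums.yaml (algorithm => { archive => hex digest }).
def gem_checksums(archives)
  {
    'SHA1' => archives.transform_values { |bytes| Digest::SHA1.hexdigest(bytes) },
    'SHA512' => archives.transform_values { |bytes| Digest::SHA512.hexdigest(bytes) }
  }
end

checksums = gem_checksums(
  'metadata.gz' => 'example metadata bytes',
  'data.tar.gz' => 'example data bytes'
)
puts checksums.to_yaml
```

Because the digests cover the compressed archives, even a change that only touches metadata (such as a new version number) alters the `metadata.gz` digests.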
data/.rubocop.yml ADDED
@@ -0,0 +1,21 @@
+ AllCops:
+   DisabledByDefault: true
+   Exclude:
+     - 'lib/tableschema/exceptions.rb'
+
+ Security:
+   Enabled: true
+
+ Lint:
+   Enabled: true
+
+ Style/HashSyntax:
+   Enabled: true
+   EnforcedStyle: ruby19_no_mixed_keys
+
+ Style/MutableConstant:
+   Enabled: true
+
+ Metrics/CyclomaticComplexity:
+   Max: 10
+   Severity: error
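The two `Style` cops enabled above constrain everyday Ruby code. `Style/MutableConstant` flags a constant assigned a mutable literal (an Array, Hash or String) that is not frozen, and `Style/HashSyntax` with `EnforcedStyle: ruby19_no_mixed_keys` asks for `key: value` syntax for symbol keys while forbidding a mix of `key:` and `=>` in the same literal. A short sketch of code that should satisfy both cops; the constant and variable names are illustrative only:

```ruby
# Style/MutableConstant: freeze mutable literals assigned to constants.
SUPPORTED_TYPES = %w[string integer number].freeze

# Style/HashSyntax (ruby19_no_mixed_keys): symbol keys use `key: value`...
field = { name: 'id', type: 'integer' }

# ...while a literal with String keys uses `=>` for every key, never a mix.
legacy_field = { 'name' => 'id', 'type' => 'integer' }

puts SUPPORTED_TYPES.frozen?
puts field[:type]
puts legacy_field['name']
```

This matches the switch from `"name" => "id"` to `name: 'id'` throughout the README examples in this release.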
data/.travis.yml CHANGED
@@ -1,9 +1,23 @@
  ---
  language: ruby
+
  rvm:
  - 2.3.1
  - 2.4.1
- before_install: gem install bundler -v 1.11.2
+
+ before_install:
+   gem install bundler -v 1.11.2
+
+ install:
+ - bundle
+ - gem install rubocop
+
+ script:
+ - rake spec
+
+ after_success:
+ - rubocop
+
  deploy:
    provider: rubygems
    api_key:
data/README.md CHANGED
@@ -33,24 +33,25 @@ Since version 0.3 the library was renamed `tableschema` and has a gem with the s
  The gem `jsontableschema` is no longer maintained. Here are the steps to transition your code to `tableschema`:
 
  1. Replace
- ```ruby
- gem 'jsontableschema'
- ```
- with
 
- ```ruby
- gem 'tableschema', '0.3.0'
- ```
+ ```ruby
+ gem 'jsontableschema'
+ ```
+ with
+
+ ```ruby
+ gem 'tableschema', '0.3.0'
+ ```
 
  2. Replace module name `JsonTableSchema` with module name `TableSchema`. For example:
 
- ```ruby
- JsonTableSchema::Table.infer_schema(csv)
- ```
- with
- ```ruby
- TableSchema::Table.infer_schema(csv)
- ```
+ ```ruby
+ JsonTableSchema::Table.infer_schema(csv)
+ ```
+ with
+ ```ruby
+ TableSchema::Table.infer_schema(csv)
+ ```
 
  ## Usage
 
@@ -60,27 +61,40 @@ Validate and cast data from a CSV as described by a schema.
 
  ```ruby
  schema = {
- "fields": [
+ fields: [
  {
- "name" => "id",
- "title" => "Identifier",
- "type" => "integer"
+ name: 'id',
+ title: 'Identifier',
+ type: 'integer'
  },
  {
- "name" => "title",
- "title" => "Title",
- "type" => "string"
+ name: 'title',
+ title: 'Title',
+ type: 'string'
  }
  ]
- } # Can also be a URL or a path
+ }
 
- csv = 'https://github.com/frictionlessdata/tableschema-rb/raw/master/spec/fixtures/simple_data.csv' # Can also be a url or array of arrays
+ csv = 'https://github.com/frictionlessdata/tableschema-rb/raw/master/spec/fixtures/simple_data.csv'
 
  table = TableSchema::Table.new(csv, schema)
- table.rows
+
+ # Iterate through rows
+ table.iter{ |row| print row }
+ # [1, "foo"]
+ # [2, "bar"]
+ # [3, "baz"]
+
+ # Read the entire CSV in memory
+ table.read
  #=> [[1,'foo'],[2,'bar'],[3,'baz']]
  ```
 
+ Both `iter` and `read` take the optional parameters:
+ - `row_limit`: integer, default `nil` - stop at this many rows
+ - `cast`: boolean, default `true` - cast values for each row
+ - `keyed`: boolean, default: `false` - return the rows as Hashes with headers as keys
+
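The `row_limit` and `keyed` options listed above amount to truncating the row list and pairing each value with its column header. A minimal plain-Ruby sketch of that transformation, assuming the headers and already-cast rows are available as arrays; `keyed_rows` is a hypothetical helper for illustration, not the gem's implementation:

```ruby
# Hypothetical helper mirroring the documented `row_limit` and `keyed` options.
def keyed_rows(headers, rows, row_limit: nil, keyed: false)
  # A nil row_limit means "no limit".
  limited = row_limit ? rows.first(row_limit) : rows
  return limited unless keyed

  # Pair each value with its header to build one Hash per row.
  limited.map { |row| headers.zip(row).to_h }
end

headers = %w[id title]
rows = [[1, 'foo'], [2, 'bar'], [3, 'baz']]

keyed_rows(headers, rows, keyed: true)
# => [{ 'id' => 1, 'title' => 'foo' }, { 'id' => 2, 'title' => 'bar' }, { 'id' => 3, 'title' => 'baz' }]
keyed_rows(headers, rows, row_limit: 2)
# => [[1, 'foo'], [2, 'bar']]
```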
  ### Infer a schema
 
  If you don't have a schema for a CSV, and want to generate one, you can infer a schema like so:
@@ -90,95 +104,69 @@ csv = 'https://github.com/frictionlessdata/tableschema-rb/raw/master/spec/fixtur
 
  table = TableSchema::Table.infer_schema(csv)
  table.schema
- #=> {"fields"=>[{"name"=>"id", "title"=>"", "description"=>"", "type"=>"string", "format"=>"default"}, {"name"=>"title", "title"=>"", "description"=>"", "type"=>"string", "format"=>"default"}]}
+ #=> {:fields=>[{:name=>"id", :title=>"", :description=>"", :type=>"integer", :format=>"default", :constraints=>{}}, {:name=>"title", :title=>"", :description=>"", :type=>"string", :format=>"default", :constraints=>{}}]}
  ```
 
- ### Validate a schema
+ ### Build a Schema
 
- To validate that a schema meets the JSON Table Schema spec, you can pass a schema to the initializer like so:
+ You can also build a schema from scratch or modify an existing one:
 
  ```ruby
- schema_hash = {
- "fields" => [
- {
- "name" => "id"
- },
- {
- "name" => "height"
- }
- ]
- }
-
- schema = TableSchema::Schema.new(schema_hash)
- schema.valid?
- #=> true
+ schema = TableSchema::Schema.new({
+ fields: [],
+ })
+
+ # Add a field
+ schema.add_field({
+ name: 'id',
+ type: 'string',
+ constraints: {
+ required: true,
+ }
+ })
+
+ # Remove a field
+ schema.remove_field('id')
  ```
 
- You can also pass a file path or URL to the initializer:
+ `add_field` will ignore the updates if the updated version of the the schema fails [validation](#validate-a-schema).
+ If you wish to prevent an invalid schema from being created or updated by raising validation errors, you can pass the `strict: true` argument to the Schema initializer:
 
  ```ruby
- schema = TableSchema::Schema.new('http://example.org/schema.json')
- schema.valid?
- #=> true
+ schema = TableSchema::Schema.new(schema_hash, strict: true)
  ```
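The difference between the default behaviour of `add_field` and `strict: true` can be pictured with a small stand-in class: the update is applied to a copy, the copy is validated, and the result is then kept, silently dropped, or rejected with an exception. This is a hypothetical sketch, not the gem's `TableSchema::Schema`; `MiniSchema`, `InvalidSchemaError` and the single validation rule (field names must be unique) are assumptions made for illustration:

```ruby
# Hypothetical stand-ins for illustration only; not TableSchema::Schema.
class InvalidSchemaError < StandardError; end

class MiniSchema
  attr_reader :fields

  def initialize(fields: [], strict: false)
    @fields = fields
    @strict = strict
  end

  # Validate a copy first; keep it only if it is valid.
  def add_field(descriptor)
    candidate = @fields + [descriptor]
    errors = validate(candidate)
    if errors.empty?
      @fields = candidate
    elsif @strict
      raise InvalidSchemaError, errors.join('; ')
    end
    # In lenient mode an invalid update is dropped and the old fields remain.
    @fields
  end

  private

  # Single illustrative rule: field names must be unique.
  def validate(fields)
    names = fields.map { |f| f[:name] }
    names.select { |n| names.count(n) > 1 }.uniq.map { |n| "duplicate field name `#{n}`" }
  end
end

lenient = MiniSchema.new
lenient.add_field({ name: 'id', type: 'string' })
lenient.add_field({ name: 'id', type: 'integer' }) # invalid, silently ignored
lenient.fields.size # => 1

strict = MiniSchema.new(strict: true)
strict.add_field({ name: 'id', type: 'string' })
begin
  strict.add_field({ name: 'id', type: 'integer' })
rescue InvalidSchemaError => e
  puts e.message # prints: duplicate field name `id`
end
```

Validating a copy before assigning it keeps the schema unchanged in both modes when an update fails; only the reporting differs.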
 
- If the schema is invalid, you can access the errors via the `messages` attribute
+ There are multiple methods to inspect a schema:
 
  ```ruby
  schema_hash = {
- "fields" => [
+ fields: [
  {
- "name"=>"id",
- "title"=>"Identifier",
- "type"=>"integer"
- },
- {
- "name"=>"title",
- "title"=>"Title",
- "type"=>"string"
- }
- ],
- "primaryKey"=>"identifier"
- }
-
- schema.valid?
- #=> false
- schema.messages
- #=> ["The JSON Table Schema primaryKey value `identifier` is not found in any of the schema's field names"]
- ```
-
- ## Schema Model
-
- You can also access the schema via a Ruby model, with some useful methods for interaction:
-
- ```ruby
- schema_hash = {
- "fields" => [
- {
- "name" => "id",
- "type" => "string",
- "constraints" => {
- "required" => true,
- }
+ name: 'id',
+ type: 'string',
+ constraints: {
+ required: true,
  },
- {
- "name" => "height",
- "type" => "number"
- }
+ },
+ {
+ name: 'height',
+ type: 'number',
+ },
+ {
+ name: 'state',
+ },
  ],
- "primaryKey" => "id",
- "foreignKeys" => [
+ primaryKey: 'id',
+ foreignKeys: [
  {
- "fields" => "state",
- "reference" => {
- "datapackage" => "http://data.okfn.org/data/mydatapackage/",
- "resource" => "the-resource",
- "fields" => "state_id"
- }
- }
+ fields: 'state',
+ reference: {
+ resource: 'the-resource',
+ fields: 'state_id',
+ },
+ },
  ]
  }
-
  schema = TableSchema::Schema.new(schema_hash)
 
  schema.headers
@@ -186,79 +174,126 @@ schema.headers
  schema.required_headers
  #=> ["id"]
  schema.fields
- #=> [{"name"=>"id", "constraints"=>{"required"=>true}, "type"=>"string", "format"=>"default"}, {"name"=>"height", "type"=>"number", "format"=>"default"}]
+ #=> [{:name=>"id", :type=>"string", :constraints=>{:required=>true}, :format=>"default"}, {:name=>"height", :type=>"number", :format=>"default", :constraints=>{}}]
  schema.primary_keys
  #=> ["id"]
  schema.foreign_keys
- #=> [{"fields" => "state", "reference" => { "datapackage" => "http://data.okfn.org/data/mydatapackage/", "resource" => "the-resource", "fields" => "state_id" } } ]
+ # => [{:fields=>"state", :reference=>{:resource=>"the-resource", :fields=>"state_id"}}]
  schema.get_field('id')
- #=> {"name"=>"id", "constraints"=>{"required"=>true}, "type"=>"string", "format"=>"default"}
+ # => {:name=>"id", :type=>"string", :constraints=>{:required=>true}, :format=>"default"}
  schema.has_field?('foo')
  #=> false
  schema.get_type('id')
  #=> 'string'
  schema.get_fields_by_type('string')
- #=> [{"name"=>"id", "constraints"=>{"required"=>true}, "type"=>"string", "format"=>"default"}, {"name"=>"height", "type"=>"string", "format"=>"default"}]
+ # => [{:name=>"id", :type=>"string", :constraints=>{:required=>true}, :format=>"default"}, {:name=>"state", :type=>"string", :format=>"default", :constraints=>{}}]
  schema.get_constraints('id')
- #=> {"required" => true}
- schema.cast_row(['string', '10.0'])
- #=> ['string', 10.0]
- schema.cast([['foo', '12.0'],['bar', '10.0']])
- #=> [['foo', 12.0],['bar', 10.0]]
+ # => {:required=>true}
+ ```
+
+ #### Cast row
+
+ To check if a given set of values complies with the schema, you can use `cast_row`:
+
+ ```
+ schema.cast_row(['string', '10.0', 'State'])
+ #=> ['string', 10.0, 'State']
+ ```
+
+ By default the converter will fail on the first error it finds. However, by passing `fail_fast: false` as the second argument the errors will be collected into an `exception.errors` attribute for you to review later. For example:
+
+ ```ruby
+ row = [3, 'nan', 'State']
+
+ schema.cast_row(row)
+ #=> TableSchema::InvalidCast: 3 is not a string
+ begin
+ schema.cast_row(row, fail_fast: false)
+ rescue TableSchema::MultipleInvalid => exception
+ exception.errors
+ end
+ #=> #<Set: {#<TableSchema::InvalidCast: 3 is not a string>,
+ #<TableSchema::InvalidCast: nan is not a number>}>
  ```
 
- When casting a row (using `cast_row`), or a number of rows (using `cast`), by default the converter will fail on the first error it finds. If you pass `false` as the second argument, the errors will be collected into a `errors` attribute for you to review later. For example:
+ ### Validate a schema
+
+ To make sure a schema complies with [Table Schema spec](https://specs.frictionlessdata.io/table-schema), we validate each custom schema against the
+ official [Table Schema schema](https://specs.frictionlessdata.io/schemas/table-schema.json):
 
  ```ruby
  schema_hash = {
- "fields" => [
- {
- "name" => "id",
- "type" => "string",
- "constraints" => {
- "required" => true,
- }
- },
- {
- "name" => "height",
- "type" => "number"
- }
+ fields: [
+ { name: 'id' },
  ]
  }
-
  schema = TableSchema::Schema.new(schema_hash)
+ schema.validate
+ #=> true
+ ```
 
- rows = [
- ['foo', 'notanumber'],
- ['bar', 'notanumber'],
- ['wrong column count']
- ]
+ If the schema is invalid, you can access the errors via the `errors` attribute
 
- schema.cast(rows)
- #=> TableSchema::InvalidCast: notanumber is not a number
- schema.cast(rows, false)
- #=> TableSchema::MultipleInvalid
+ ```ruby
+ schema_hash = {
+ fields: [
+ {
+ name: 'id',
+ title: 'Identifier',
+ type: 'integer'
+ },
+ {
+ name: 'title',
+ title: 'Title',
+ type: 'string'
+ }
+ ],
+ primaryKey: 'identifier'
+ }
+
+ schema = TableSchema::Schema.new(schema_hash)
+ schema.validate
+ #=> false
  schema.errors
- #=> [#<TableSchema::InvalidCast: notanumber is not a number>, #<TableSchema::InvalidCast: notanumber is not a number>, #<TableSchema::ConversionError: The number of items to convert (1) does not match the number of headers in the schema (2)>]
+ #=> #<Set: {"The TableSchema primaryKey value `identifier` is not found in any of the schema's field names"}>
+
+ # Raise error if validation fails
+ schema.validate!
+ #=> TableSchema::SchemaException: The TableSchema primaryKey value `identifier` is not found in any of the schema's field names
  ```
 
  ## Field
 
+ Data values can be cast to native Ruby objects with a Field instance. This allows formats and constraints to be defined for the field in the [field descriptor](https://specs.frictionlessdata.io/table-schema/#field-descriptors):
+
  ```ruby
  # Init field
- field = TableSchema::Field.new({'type': 'number'})
+ field = TableSchema::Field.new({
+ name: 'over_1700',
+ type: 'number',
+ constraints: {
+ minimum: '1700',
+ },
+ })
 
  # Cast a value
  field.cast_value('12345')
  #=> 12345.0
  ```
 
- Data values can be cast to native Ruby objects with a Field instance. Type instances can be initialized with f[ield descriptors](http://dataprotocols.org/json-table-schema/#field-descriptors). This allows formats and constraints to be defined.
+ Casting a value will check the value is of the expected `type`, is in the correct `format`, and complies with any `constraints` imposed in the descriptor.
 
- Casting a value will check the value is of the expected type, is in the correct format, and complies with any constraints imposed by a schema. E.g. a date value (in ISO 8601 format) can be cast with a DateType instance. Values that can't be cast will raise an `InvalidCast` exception.
+ Value that can't be cast will raise an `InvalidCast` exception.
 
  Casting a value that doesn't meet the constraints will raise a `ConstraintError` exception.
 
+ ```ruby
+ field.cast_value('nan')
+ #=> TableSchema::InvalidCast: nan is not a number
+ field.cast_value('1200')
+ #=> TableSchema::ConstraintError: The field `over_1700` must not be less than 1700
+ ```
+
  ## Development
 
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
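The `fail_fast` option of `cast_row` and the `minimum` constraint from the Field example fit together as "cast each value, check its constraints, then either stop or keep going". A plain-Ruby sketch of that flow follows. Everything in it is a hypothetical stand-in: the exception classes only borrow the README's names (`InvalidCast`, `ConstraintError`, `MultipleInvalid`), casting is simplified to `string` and `number` fields, and the gem's real rules are likely broader. With the default `fail_fast: true` the first error is raised immediately; with `fail_fast: false` every error is gathered into a `Set` and raised together:

```ruby
require 'set'

# Hypothetical stand-ins that only mimic the README's exception names.
class InvalidCast < StandardError; end
class ConstraintError < StandardError; end

class MultipleInvalid < StandardError
  attr_reader :errors

  def initialize(errors)
    @errors = errors
    super("#{errors.size} values could not be cast")
  end
end

# Simplified casting for `string` and `number`, plus an optional `minimum`.
def cast_value(field, value)
  cast =
    case field.fetch(:type, 'string')
    when 'string'
      raise InvalidCast, "#{value} is not a string" unless value.is_a?(String)
      value
    when 'number'
      begin
        Float(value)
      rescue ArgumentError, TypeError
        raise InvalidCast, "#{value} is not a number"
      end
    else
      value
    end
  minimum = field.dig(:constraints, :minimum)
  if minimum && cast < Float(minimum)
    raise ConstraintError, "The field `#{field[:name]}` must not be less than #{minimum}"
  end
  cast
end

# Raise on the first error, or collect every error when fail_fast is false.
def cast_row(fields, row, fail_fast: true)
  errors = Set.new
  result = fields.zip(row).map do |field, value|
    begin
      cast_value(field, value)
    rescue InvalidCast, ConstraintError => e
      raise if fail_fast
      errors << e
      nil
    end
  end
  raise MultipleInvalid.new(errors) unless errors.empty?
  result
end

fields = [
  { name: 'id', type: 'string' },
  { name: 'height', type: 'number', constraints: { minimum: '1700' } }
]

cast_row(fields, ['a1', '1800']) # => ["a1", 1800.0]

begin
  cast_row(fields, [3, 'notanumber'], fail_fast: false)
rescue MultipleInvalid => e
  # prints "3 is not a string" and "notanumber is not a number" on separate lines
  puts e.errors.map(&:message)
end
```

Collecting into a `Set` rather than raising inside the loop is what lets a caller see every bad value in a row from a single call.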