tableschema 0.3.1 → 0.4.0

Files changed (49)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +21 -0
  3. data/.travis.yml +15 -1
  4. data/README.md +164 -129
  5. data/Rakefile +10 -1
  6. data/bin/console +2 -6
  7. data/{etc/schemas → lib/profiles}/geojson.json +0 -1
  8. data/lib/profiles/table-schema.json +1625 -0
  9. data/lib/profiles/topojson.json +311 -0
  10. data/lib/tableschema.rb +5 -3
  11. data/lib/tableschema/constraints/constraints.rb +12 -24
  12. data/lib/tableschema/constraints/enum.rb +6 -2
  13. data/lib/tableschema/constraints/max_length.rb +6 -2
  14. data/lib/tableschema/constraints/maximum.rb +12 -2
  15. data/lib/tableschema/constraints/min_length.rb +6 -2
  16. data/lib/tableschema/constraints/minimum.rb +12 -2
  17. data/lib/tableschema/constraints/pattern.rb +9 -2
  18. data/lib/tableschema/constraints/required.rb +6 -15
  19. data/lib/tableschema/constraints/unique.rb +12 -0
  20. data/lib/tableschema/defaults.rb +9 -0
  21. data/lib/tableschema/exceptions.rb +15 -2
  22. data/lib/tableschema/field.rb +39 -20
  23. data/lib/tableschema/helpers.rb +32 -15
  24. data/lib/tableschema/infer.rb +31 -28
  25. data/lib/tableschema/model.rb +57 -34
  26. data/lib/tableschema/schema.rb +40 -6
  27. data/lib/tableschema/table.rb +75 -26
  28. data/lib/tableschema/types/any.rb +1 -0
  29. data/lib/tableschema/types/array.rb +2 -1
  30. data/lib/tableschema/types/base.rb +9 -21
  31. data/lib/tableschema/types/date.rb +1 -0
  32. data/lib/tableschema/types/datetime.rb +1 -0
  33. data/lib/tableschema/types/duration.rb +31 -0
  34. data/lib/tableschema/types/geojson.rb +27 -5
  35. data/lib/tableschema/types/geopoint.rb +4 -3
  36. data/lib/tableschema/types/integer.rb +1 -0
  37. data/lib/tableschema/types/number.rb +40 -25
  38. data/lib/tableschema/types/object.rb +2 -1
  39. data/lib/tableschema/types/string.rb +8 -0
  40. data/lib/tableschema/types/time.rb +1 -0
  41. data/lib/tableschema/types/year.rb +34 -0
  42. data/lib/tableschema/types/yearmonth.rb +52 -0
  43. data/lib/tableschema/validate.rb +45 -29
  44. data/lib/tableschema/version.rb +1 -1
  45. data/tableschema.gemspec +2 -1
  46. metadata +31 -12
  47. data/etc/schemas/json-table-schema.json +0 -102
  48. data/lib/tableschema/data.rb +0 -60
  49. data/lib/tableschema/types/null.rb +0 -37
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 846c3cf9cf67190ece4602c3ecc26124789d6bfe
- data.tar.gz: 6c2f2101cb63bad02b025281be54d85da91dcdcf
+ metadata.gz: be0ea32c71fc75dd1acca11a181b3fe8a7b69e33
+ data.tar.gz: e73959a568fd604b31fbe72376b9f2987095602c
  SHA512:
- metadata.gz: 90f9e6af27235f1cf4e8509c1133c33840a695c2cb000ca70ab1bff5bf8e8e0cea8aec311fc704a58fbb13ea9cab7b58fb4758d73fa0bbb8f79cd4ea4ecdb6b0
- data.tar.gz: abb5bbab98a9431458b4962ede074149e16da506d7366f9857dc4cc6b85f876a2fd7e0fa0990f8da71c7d81566d8f7c298e971cba55b9a611bea4c683fb12d3c
+ metadata.gz: 2795a5b5d62696987588e9dfd7970c5539b68385d8e20eba1f92249645a7a72d338c5a44e96514531e562f7fdec0a40b49c3fd095665a027a4763e21e43a0fb9
+ data.tar.gz: e145c763e64f6384cadf6d8b3fdfeae01b562c50a1243def427036f81d2f6f39c95dd22837cdddc0844e2e4287b2aae71d24b57101af63b62908090093abcf11
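Note: the digests in checksums.yaml can be checked against the packaged files by hand. The following is a minimal sketch, not part of tableschema or RubyGems. The helper name `verify_checksums` is hypothetical, and it assumes the .gem archive has already been unpacked into one directory with `checksums.yaml` decompressed next to `metadata.gz` and `data.tar.gz`.

```ruby
require 'digest'
require 'yaml'

# Digest classes for the two sections that appear in checksums.yaml.
DIGESTS = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }.freeze

# Hypothetical helper: returns a Hash of file name => true/false, where true
# means the digest computed from the file equals the recorded digest.
# Assumes `dir` holds checksums.yaml plus the files it lists.
def verify_checksums(dir, algorithm: 'SHA512')
  recorded = YAML.safe_load(File.read(File.join(dir, 'checksums.yaml')))
  digest_class = DIGESTS.fetch(algorithm)

  recorded.fetch(algorithm).map do |name, expected|
    actual = digest_class.file(File.join(dir, name)).hexdigest
    [name, actual == expected]
  end.to_h
end
```

Calling `verify_checksums('unpacked_gem')` (the directory name is illustrative) would report each listed file separately, so a mismatch in `data.tar.gz` does not hide a match in `metadata.gz`.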
data/.rubocop.yml ADDED
@@ -0,0 +1,21 @@
+ AllCops:
+ DisabledByDefault: true
+ Exclude:
+ - 'lib/tableschema/exceptions.rb'
+
+ Security:
+ Enabled: true
+
+ Lint:
+ Enabled: true
+
+ Style/HashSyntax:
+ Enabled: true
+ EnforcedStyle: ruby19_no_mixed_keys
+
+ Style/MutableConstant:
+ Enabled: true
+
+ Metrics/CyclomaticComplexity:
+ Max: 10
+ Severity: error
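Note: the `Style/HashSyntax` setting above likely explains why the README examples in this release move from `"name" => "id"` to `name: 'id'`. The sketch below illustrates the hash forms involved. The variable names are illustrative only, and the comments on what the cop accepts reflect my reading of the RuboCop `ruby19_no_mixed_keys` style rather than anything stated in this diff.

```ruby
# Hash rockets with only symbol keys: the form ruby19_no_mixed_keys is
# understood to flag.
rocket_symbols = { :name => 'id', :type => 'integer' }

# Ruby 1.9 shorthand with only symbol keys: the form used in the new README.
shorthand = { name: 'id', type: 'integer' }

# String keys still need hash rockets; these are different keys from the
# symbols above.
string_keys = { 'name' => 'id', 'type' => 'integer' }

# A quoted label such as "fields": produces a Symbol key, not a String key.
# The 0.3.1 README mixed this form with "name" => "id" in the same example.
quoted_label = { "fields": [] }

puts rocket_symbols == shorthand   # same keys and values
puts string_keys == shorthand      # String keys are not Symbol keys
puts quoted_label.keys.inspect
```

The practical consequence for callers is that a descriptor written with string keys and one written with symbol keys are different hashes in plain Ruby; whether the library normalizes them is not shown in this part of the diff.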
data/.travis.yml CHANGED
@@ -1,9 +1,23 @@
  ---
  language: ruby
+
  rvm:
  - 2.3.1
  - 2.4.1
- before_install: gem install bundler -v 1.11.2
+
+ before_install:
+ gem install bundler -v 1.11.2
+
+ install:
+ - bundle
+ - gem install rubocop
+
+ script:
+ - rake spec
+
+ after_success:
+ - rubocop
+
  deploy:
  provider: rubygems
  api_key:
data/README.md CHANGED
@@ -33,24 +33,25 @@ Since version 0.3 the library was renamed `tableschema` and has a gem with the s
  The gem `jsontableschema` is no longer maintained. Here are the steps to transition your code to `tableschema`:

  1. Replace
- ```ruby
- gem 'jsontableschema'
- ```
- with

- ```ruby
- gem 'tableschema', '0.3.0'
- ```
+ ```ruby
+ gem 'jsontableschema'
+ ```
+ with
+
+ ```ruby
+ gem 'tableschema', '0.3.0'
+ ```

  2. Replace module name `JsonTableSchema` with module name `TableSchema`. For example:

- ```ruby
- JsonTableSchema::Table.infer_schema(csv)
- ```
- with
- ```ruby
- TableSchema::Table.infer_schema(csv)
- ```
+ ```ruby
+ JsonTableSchema::Table.infer_schema(csv)
+ ```
+ with
+ ```ruby
+ TableSchema::Table.infer_schema(csv)
+ ```

  ## Usage

@@ -60,27 +61,40 @@ Validate and cast data from a CSV as described by a schema.

  ```ruby
  schema = {
- "fields": [
+ fields: [
  {
- "name" => "id",
- "title" => "Identifier",
- "type" => "integer"
+ name: 'id',
+ title: 'Identifier',
+ type: 'integer'
  },
  {
- "name" => "title",
- "title" => "Title",
- "type" => "string"
+ name: 'title',
+ title: 'Title',
+ type: 'string'
  }
  ]
- } # Can also be a URL or a path
+ }

- csv = 'https://github.com/frictionlessdata/tableschema-rb/raw/master/spec/fixtures/simple_data.csv' # Can also be a url or array of arrays
+ csv = 'https://github.com/frictionlessdata/tableschema-rb/raw/master/spec/fixtures/simple_data.csv'

  table = TableSchema::Table.new(csv, schema)
- table.rows
+
+ # Iterate through rows
+ table.iter{ |row| print row }
+ # [1, "foo"]
+ # [2, "bar"]
+ # [3, "baz"]
+
+ # Read the entire CSV in memory
+ table.read
  #=> [[1,'foo'],[2,'bar'],[3,'baz']]
  ```

+ Both `iter` and `read` take the optional parameters:
+ - `row_limit`: integer, default `nil` - stop at this many rows
+ - `cast`: boolean, default `true` - cast values for each row
+ - `keyed`: boolean, default: `false` - return the rows as Hashes with headers as keys
+
  ### Infer a schema

  If you don't have a schema for a CSV, and want to generate one, you can infer a schema like so:
@@ -90,95 +104,69 @@ csv = 'https://github.com/frictionlessdata/tableschema-rb/raw/master/spec/fixtur

  table = TableSchema::Table.infer_schema(csv)
  table.schema
- #=> {"fields"=>[{"name"=>"id", "title"=>"", "description"=>"", "type"=>"string", "format"=>"default"}, {"name"=>"title", "title"=>"", "description"=>"", "type"=>"string", "format"=>"default"}]}
+ #=> {:fields=>[{:name=>"id", :title=>"", :description=>"", :type=>"integer", :format=>"default", :constraints=>{}}, {:name=>"title", :title=>"", :description=>"", :type=>"string", :format=>"default", :constraints=>{}}]}
  ```

- ### Validate a schema
+ ### Build a Schema

- To validate that a schema meets the JSON Table Schema spec, you can pass a schema to the initializer like so:
+ You can also build a schema from scratch or modify an existing one:

  ```ruby
- schema_hash = {
- "fields" => [
- {
- "name" => "id"
- },
- {
- "name" => "height"
- }
- ]
- }
-
- schema = TableSchema::Schema.new(schema_hash)
- schema.valid?
- #=> true
+ schema = TableSchema::Schema.new({
+ fields: [],
+ })
+
+ # Add a field
+ schema.add_field({
+ name: 'id',
+ type: 'string',
+ constraints: {
+ required: true,
+ }
+ })
+
+ # Remove a field
+ schema.remove_field('id')
  ```

- You can also pass a file path or URL to the initializer:
+ `add_field` will ignore the updates if the updated version of the the schema fails [validation](#validate-a-schema).
+ If you wish to prevent an invalid schema from being created or updated by raising validation errors, you can pass the `strict: true` argument to the Schema initializer:

  ```ruby
- schema = TableSchema::Schema.new('http://example.org/schema.json')
- schema.valid?
- #=> true
+ schema = TableSchema::Schema.new(schema_hash, strict: true)
  ```

- If the schema is invalid, you can access the errors via the `messages` attribute
+ There are multiple methods to inspect a schema:

  ```ruby
  schema_hash = {
- "fields" => [
+ fields: [
  {
- "name"=>"id",
- "title"=>"Identifier",
- "type"=>"integer"
- },
- {
- "name"=>"title",
- "title"=>"Title",
- "type"=>"string"
- }
- ],
- "primaryKey"=>"identifier"
- }
-
- schema.valid?
- #=> false
- schema.messages
- #=> ["The JSON Table Schema primaryKey value `identifier` is not found in any of the schema's field names"]
- ```
-
- ## Schema Model
-
- You can also access the schema via a Ruby model, with some useful methods for interaction:
-
- ```ruby
- schema_hash = {
- "fields" => [
- {
- "name" => "id",
- "type" => "string",
- "constraints" => {
- "required" => true,
- }
+ name: 'id',
+ type: 'string',
+ constraints: {
+ required: true,
  },
- {
- "name" => "height",
- "type" => "number"
- }
+ },
+ {
+ name: 'height',
+ type: 'number',
+ },
+ {
+ name: 'state',
+ },
  ],
- "primaryKey" => "id",
- "foreignKeys" => [
+ primaryKey: 'id',
+ foreignKeys: [
  {
- "fields" => "state",
- "reference" => {
- "datapackage" => "http://data.okfn.org/data/mydatapackage/",
- "resource" => "the-resource",
- "fields" => "state_id"
- }
- }
+ fields: 'state',
+ reference: {
+ resource: 'the-resource',
+ fields: 'state_id',
+ },
+ },
  ]
  }
-
  schema = TableSchema::Schema.new(schema_hash)

  schema.headers
@@ -186,79 +174,126 @@ schema.headers
  schema.required_headers
  #=> ["id"]
  schema.fields
- #=> [{"name"=>"id", "constraints"=>{"required"=>true}, "type"=>"string", "format"=>"default"}, {"name"=>"height", "type"=>"number", "format"=>"default"}]
+ #=> [{:name=>"id", :type=>"string", :constraints=>{:required=>true}, :format=>"default"}, {:name=>"height", :type=>"number", :format=>"default", :constraints=>{}}]
  schema.primary_keys
  #=> ["id"]
  schema.foreign_keys
- #=> [{"fields" => "state", "reference" => { "datapackage" => "http://data.okfn.org/data/mydatapackage/", "resource" => "the-resource", "fields" => "state_id" } } ]
+ # => [{:fields=>"state", :reference=>{:resource=>"the-resource", :fields=>"state_id"}}]
  schema.get_field('id')
- #=> {"name"=>"id", "constraints"=>{"required"=>true}, "type"=>"string", "format"=>"default"}
+ # => {:name=>"id", :type=>"string", :constraints=>{:required=>true}, :format=>"default"}
  schema.has_field?('foo')
  #=> false
  schema.get_type('id')
  #=> 'string'
  schema.get_fields_by_type('string')
- #=> [{"name"=>"id", "constraints"=>{"required"=>true}, "type"=>"string", "format"=>"default"}, {"name"=>"height", "type"=>"string", "format"=>"default"}]
+ # => [{:name=>"id", :type=>"string", :constraints=>{:required=>true}, :format=>"default"}, {:name=>"state", :type=>"string", :format=>"default", :constraints=>{}}]
  schema.get_constraints('id')
- #=> {"required" => true}
- schema.cast_row(['string', '10.0'])
- #=> ['string', 10.0]
- schema.cast([['foo', '12.0'],['bar', '10.0']])
- #=> [['foo', 12.0],['bar', 10.0]]
+ # => {:required=>true}
+ ```
+
+ #### Cast row
+
+ To check if a given set of values complies with the schema, you can use `cast_row`:
+
+ ```
+ schema.cast_row(['string', '10.0', 'State'])
+ #=> ['string', 10.0, 'State']
+ ```
+
+ By default the converter will fail on the first error it finds. However, by passing `fail_fast: false` as the second argument the errors will be collected into an `exception.errors` attribute for you to review later. For example:
+
+ ```ruby
+ row = [3, 'nan', 'State']
+
+ schema.cast_row(row)
+ #=> TableSchema::InvalidCast: 3 is not a string
+ begin
+ schema.cast_row(row, fail_fast: false)
+ rescue TableSchema::MultipleInvalid => exception
+ exception.errors
+ end
+ #=> #<Set: {#<TableSchema::InvalidCast: 3 is not a string>,
+ #<TableSchema::InvalidCast: nan is not a number>}>
  ```

- When casting a row (using `cast_row`), or a number of rows (using `cast`), by default the converter will fail on the first error it finds. If you pass `false` as the second argument, the errors will be collected into a `errors` attribute for you to review later. For example:
+ ### Validate a schema
+
+ To make sure a schema complies with [Table Schema spec](https://specs.frictionlessdata.io/table-schema), we validate each custom schema against the
+ official [Table Schema schema](https://specs.frictionlessdata.io/schemas/table-schema.json):

  ```ruby
  schema_hash = {
- "fields" => [
- {
- "name" => "id",
- "type" => "string",
- "constraints" => {
- "required" => true,
- }
- },
- {
- "name" => "height",
- "type" => "number"
- }
+ fields: [
+ { name: 'id' },
  ]
  }
-
  schema = TableSchema::Schema.new(schema_hash)
+ schema.validate
+ #=> true
+ ```

- rows = [
- ['foo', 'notanumber'],
- ['bar', 'notanumber'],
- ['wrong column count']
- ]
+ If the schema is invalid, you can access the errors via the `errors` attribute

- schema.cast(rows)
- #=> TableSchema::InvalidCast: notanumber is not a number
- schema.cast(rows, false)
- #=> TableSchema::MultipleInvalid
+ ```ruby
+ schema_hash = {
+ fields: [
+ {
+ name: 'id',
+ title: 'Identifier',
+ type: 'integer'
+ },
+ {
+ name: 'title',
+ title: 'Title',
+ type: 'string'
+ }
+ ],
+ primaryKey: 'identifier'
+ }
+
+ schema = TableSchema::Schema.new(schema_hash)
+ schema.validate
+ #=> false
  schema.errors
- #=> [#<TableSchema::InvalidCast: notanumber is not a number>, #<TableSchema::InvalidCast: notanumber is not a number>, #<TableSchema::ConversionError: The number of items to convert (1) does not match the number of headers in the schema (2)>]
+ #=> #<Set: {"The TableSchema primaryKey value `identifier` is not found in any of the schema's field names"}>
+
+ # Raise error if validation fails
+ schema.validate!
+ #=> TableSchema::SchemaException: The TableSchema primaryKey value `identifier` is not found in any of the schema's field names
  ```

  ## Field

+ Data values can be cast to native Ruby objects with a Field instance. This allows formats and constraints to be defined for the field in the [field descriptor](https://specs.frictionlessdata.io/table-schema/#field-descriptors):
+
  ```ruby
  # Init field
- field = TableSchema::Field.new({'type': 'number'})
+ field = TableSchema::Field.new({
+ name: 'over_1700',
+ type: 'number',
+ constraints: {
+ minimum: '1700',
+ },
+ })

  # Cast a value
  field.cast_value('12345')
  #=> 12345.0
  ```

- Data values can be cast to native Ruby objects with a Field instance. Type instances can be initialized with f[ield descriptors](http://dataprotocols.org/json-table-schema/#field-descriptors). This allows formats and constraints to be defined.
+ Casting a value will check the value is of the expected `type`, is in the correct `format`, and complies with any `constraints` imposed in the descriptor.

- Casting a value will check the value is of the expected type, is in the correct format, and complies with any constraints imposed by a schema. E.g. a date value (in ISO 8601 format) can be cast with a DateType instance. Values that can't be cast will raise an `InvalidCast` exception.
+ Value that can't be cast will raise an `InvalidCast` exception.

  Casting a value that doesn't meet the constraints will raise a `ConstraintError` exception.

+ ```ruby
+ field.cast_value('nan')
+ #=> TableSchema::InvalidCast: nan is not a number
+ field.cast_value('1200')
+ #=> TableSchema::ConstraintError: The field `over_1700` must not be less than 1700
+ ```
+
  ## Development

  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.