py-devo 0.2.0__tar.gz → 0.2.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30)
  1. py_devo-0.2.2/PKG-INFO +778 -0
  2. py_devo-0.2.2/README.md +764 -0
  3. py_devo-0.2.2/py_devo.egg-info/PKG-INFO +778 -0
  4. {py_devo-0.2.0 → py_devo-0.2.2}/pyproject.toml +2 -2
  5. py_devo-0.2.0/PKG-INFO +0 -167
  6. py_devo-0.2.0/README.md +0 -153
  7. py_devo-0.2.0/py_devo.egg-info/PKG-INFO +0 -167
  8. {py_devo-0.2.0 → py_devo-0.2.2}/LICENSE +0 -0
  9. {py_devo-0.2.0 → py_devo-0.2.2}/devo/__init__.py +0 -0
  10. {py_devo-0.2.0 → py_devo-0.2.2}/devo/_infer.py +0 -0
  11. {py_devo-0.2.0 → py_devo-0.2.2}/devo/_parser.py +0 -0
  12. {py_devo-0.2.0 → py_devo-0.2.2}/devo/_report.py +0 -0
  13. {py_devo-0.2.0 → py_devo-0.2.2}/devo/_schema.py +0 -0
  14. {py_devo-0.2.0 → py_devo-0.2.2}/devo/cli.py +0 -0
  15. {py_devo-0.2.0 → py_devo-0.2.2}/devo/enrich.py +0 -0
  16. {py_devo-0.2.0 → py_devo-0.2.2}/devo/exceptions.py +0 -0
  17. {py_devo-0.2.0 → py_devo-0.2.2}/devo/validate.py +0 -0
  18. {py_devo-0.2.0 → py_devo-0.2.2}/devo/webui.py +0 -0
  19. {py_devo-0.2.0 → py_devo-0.2.2}/py_devo.egg-info/SOURCES.txt +0 -0
  20. {py_devo-0.2.0 → py_devo-0.2.2}/py_devo.egg-info/dependency_links.txt +0 -0
  21. {py_devo-0.2.0 → py_devo-0.2.2}/py_devo.egg-info/entry_points.txt +0 -0
  22. {py_devo-0.2.0 → py_devo-0.2.2}/py_devo.egg-info/requires.txt +0 -0
  23. {py_devo-0.2.0 → py_devo-0.2.2}/py_devo.egg-info/top_level.txt +0 -0
  24. {py_devo-0.2.0 → py_devo-0.2.2}/setup.cfg +0 -0
  25. {py_devo-0.2.0 → py_devo-0.2.2}/tests/test_cli.py +0 -0
  26. {py_devo-0.2.0 → py_devo-0.2.2}/tests/test_enrich.py +0 -0
  27. {py_devo-0.2.0 → py_devo-0.2.2}/tests/test_infer.py +0 -0
  28. {py_devo-0.2.0 → py_devo-0.2.2}/tests/test_parser.py +0 -0
  29. {py_devo-0.2.0 → py_devo-0.2.2}/tests/test_syntax_only.py +0 -0
  30. {py_devo-0.2.0 → py_devo-0.2.2}/tests/test_validate.py +0 -0
py_devo-0.2.2/PKG-INFO ADDED
@@ -0,0 +1,778 @@
Metadata-Version: 2.4
Name: py-devo
Version: 0.2.2
Summary: DEVO — CSV to iCSV enrichment and Frictionless validation
License-Expression: MIT
Project-URL: Source, https://github.com/chasenunez/devo
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: frictionless>=4.0.0
Provides-Extra: webui
Requires-Dist: flask>=2.0.0; extra == "webui"
Dynamic: license-file

# DEVO

DEVO takes a plain CSV, infers column types and statistics, and produces three output files:

| Output file | What it is |
|---|---|
| `data.icsv` | Self-documenting [iCSV](https://envidat.github.io/iCSV/) with embedded metadata |
| `data_schema.json` | [Frictionless Table Schema](https://specs.frictionlessdata.io/table-schema/) for data validation |
| `data_DEVO_report.txt` | Human-readable validation report (**start here**) |

Before uploading, confirm:

- [ ] `Valid: YES` in the report
- [ ] All `# types` entries match the real-world meaning of each column
- [ ] `# min` and `# max` values are physically plausible
- [ ] `# missing_count` values match your expectations
- [ ] No `[WARN]` lines in TYPE CONSISTENCY
- [ ] `# description` fields are filled in (if required by your data archive)
- [ ] The `.icsv` file and its `_schema.json` are both included in your upload

---

## Contents

1. [Installation](#1-installation)
2. [The Three Commands](#2-the-three-commands)
3. [Tutorial: From Messy CSV to Upload-Ready iCSV](#3-tutorial-from-messy-csv-to-upload-ready-icsv)
4. [Understanding the Validation Report](#4-understanding-the-validation-report)
5. [Understanding the iCSV Format](#5-understanding-the-icsv-format)
6. [Common Errors and How to Fix Them](#6-common-errors-and-how-to-fix-them)
7. [CLI Reference](#7-cli-reference)
8. [Python API](#8-python-api)

---

## 1. Installation

```bash
pip install py-devo
```

Requires Python 3.9 or later. The `frictionless` package is installed automatically.

To install from a local clone:

```bash
pip install -e .
```

Verify the installation:

```bash
devo --help
```

---

## 2. The Three Commands

```
devo run data.csv        # enrich → validate → report (most common)
devo enrich data.csv     # CSV → iCSV + schema only (no validation)
devo validate data.icsv  # validate an iCSV against its schema
```

All three write their outputs to `DEVO_output/` by default. Use `--out MY_DIR` to write elsewhere.

**Exit codes:** `0` = everything passed, `1` = validation found data errors, `2` = usage or file error.

---

## 3. Tutorial: From Messy CSV to Upload-Ready iCSV

This tutorial walks through a realistic scenario: environmental sensor data with two common problems. You will enrich the file, read the output to spot the problems, fix the source CSV, and confirm the corrected file is ready for upload.

### Step 1: The raw data (with errors)

Save the following as `sensor_data.csv`:

```csv
station_id,observation_date,temperature_c,humidity_pct,wind_speed_ms
S001,2024-01-15,21.4,65,3.2
S002,2024-01-15,MISSING,72,N/A
S003,2024-01-15,19.8,168,5.1
S004,2024-01-15,23.1,71,2.8
S005,2024-01-16,20.0,71,4.0
```

Two problems are hidden in this file:

- **Station S002, `temperature_c`**: the value `MISSING` is a custom nodata sentinel that DEVO does not recognise by default. DEVO treats it as a real string value, which forces the entire column's inferred type to `string` instead of `number`.
- **Station S003, `humidity_pct`**: the value `168` is a data-entry error; relative humidity cannot exceed 100%. DEVO does not catch impossible domain values on its own, but the iCSV exposes the inflated maximum so you can spot it.

(Note: `N/A` in `wind_speed_ms` is fine; it is a recognised nodata sentinel and is handled correctly.)
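To see how one unrecognised sentinel spoils a numeric column, the sketch below scans a CSV for columns that are *mostly* numeric but also contain values that are neither numbers nor known sentinels. It is a stdlib-only illustration, not DEVO's inference code. The `SENTINELS` set is copied from the default list in section 6 of this README, and `find_suspect_values` is a hypothetical helper.

```python
import csv
import io

# Default nodata sentinels as listed in this README; DEVO's real matching may differ.
SENTINELS = {"", "NA", "N/A", "na", "n/a", "NULL", "null", "nan", "NaN",
             "-999", "-999.0", "-999.000000"}


def is_number(value):
    """True if the text parses as a float (integers included)."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def find_suspect_values(csv_text, extra_sentinels=()):
    """Return {column: [(data_row, value), ...]} for mostly numeric columns.

    A column is reported only if it holds at least one number AND at least one
    value that is neither a number nor a sentinel. data_row starts at 1 for the
    first row after the header.
    """
    sentinels = SENTINELS | set(extra_sentinels)
    has_numbers = set()
    offenders = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for data_row, record in enumerate(reader, start=1):
        for column, raw in record.items():
            if column is None:  # surplus cells in a ragged row
                continue
            value = (raw or "").strip()
            if value in sentinels:
                continue
            if is_number(value):
                has_numbers.add(column)
            else:
                offenders.setdefault(column, []).append((data_row, value))
    return {col: vals for col, vals in offenders.items() if col in has_numbers}


RAW_CSV = """station_id,observation_date,temperature_c,humidity_pct,wind_speed_ms
S001,2024-01-15,21.4,65,3.2
S002,2024-01-15,MISSING,72,N/A
S003,2024-01-15,19.8,168,5.1
S004,2024-01-15,23.1,71,2.8
S005,2024-01-16,20.0,71,4.0
"""

print(find_suspect_values(RAW_CSV))               # {'temperature_c': [(2, 'MISSING')]}
print(find_suspect_values(RAW_CSV, {"MISSING"}))  # {}
```

The humidity value `168` is not flagged because it is a valid number. Range problems like this one only show up in the `min`/`max` statistics.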

---

### Step 2: First run

```bash
devo run sensor_data.csv
```

Terminal output:

```
[OK] Enriched: DEVO_output/sensor_data.icsv
[OK] Report: DEVO_output/sensor_data_DEVO_report.txt
```

The command exits with code `0` (success) because DEVO builds the schema from the data itself, so on a first run the data fits the schema by construction. Errors appear in the report only when the data contradicts the schema. To find hidden problems, you have to read the outputs.

---

### Step 3: Read the validation report

Open `DEVO_output/sensor_data_DEVO_report.txt`:

```
DEVO Validation Report
======================
File: sensor_data.icsv
Date: 2024-01-20T10:35:22Z
Valid: YES

METADATA
----------------------------------------
[OK] All required metadata present.

TYPE CONSISTENCY
----------------------------------------
[OK] station_id: declared=string, inferred=string
[OK] observation_date: declared=datetime, inferred=datetime
[OK] temperature_c: declared=string, inferred=string
[OK] humidity_pct: declared=integer, inferred=integer
[OK] wind_speed_ms: declared=number, inferred=number

DATA VALIDATION
----------------------------------------
[PASS] No data errors found.
```

**The report says `Valid: YES`.** But look at `temperature_c`: it is declared and inferred as `string`. Temperature readings should be numbers. The report is technically correct (the declared type matches the inferred type), but the inferred type is wrong because DEVO did not know that `MISSING` should be treated as a nodata sentinel.

The report alone is not enough. You also need to read the iCSV.

---

### Step 4: Read the iCSV to spot the problems

Open `DEVO_output/sensor_data.icsv`. The `# [FIELDS]` section is the most important part to review:

```
# [FIELDS]
# fields = station_id|observation_date|temperature_c|humidity_pct|wind_speed_ms
# types = string|datetime|string|integer|number
# min = |2024-01-15T00:00:00||65|2.8
# max = |2024-01-16T00:00:00||168|5.1
# missing_count = 0|0|0|0|1
# description = ||||
```

Scan each column from left to right:

| Column | Type | Min | Max | Missing | Problem? |
|---|---|---|---|---|---|
| `station_id` | string | — | — | 0 | No |
| `observation_date` | datetime | 2024-01-15 | 2024-01-16 | 0 | No |
| `temperature_c` | **string** | — | — | **0** | **Yes: should be number; `MISSING` not recognised** |
| `humidity_pct` | integer | 65 | **168** | 0 | **Yes: max of 168 is physically impossible** |
| `wind_speed_ms` | number | 2.8 | 5.1 | 1 | No |

Two red flags:

1. `temperature_c` has type `string` and `missing_count` `0`: the column contains a nodata value (`MISSING`) that was treated as a real string.
2. `humidity_pct` has a max of `168`. Relative humidity cannot exceed 100, so this is a data-entry error.
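With five columns the scan is quick to do by eye. For wider files, a few lines of Python can do the first pass. This sketch is not part of DEVO. It assumes the `# key = v1|v2|...` layout shown above, with `|` as the delimiter. The `BOUNDS` dictionary of plausible ranges is a hypothetical example; replace it with limits that fit your own domain.

```python
def parse_fields(icsv_text, delimiter="|"):
    """Return {column: {line_key: value}} from the '# [FIELDS]' section."""
    lines = {}
    in_fields = False
    for line in icsv_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("# ["):  # a section marker such as '# [DATA]'
            in_fields = stripped == "# [FIELDS]"
            continue
        if in_fields and stripped.startswith("#") and "=" in stripped:
            key, _, values = stripped.lstrip("#").partition("=")
            lines[key.strip()] = [v.strip() for v in values.split(delimiter)]
    names = lines.pop("fields", [])
    return {
        name: {key: (vals[i] if i < len(vals) else "") for key, vals in lines.items()}
        for i, name in enumerate(names)
    }


def red_flags(fields, bounds):
    """List (column, reason) pairs for columns that are expected to be numeric."""
    flags = []
    for column, (low, high) in bounds.items():
        info = fields.get(column, {})
        if info.get("types") == "string":
            flags.append((column, "typed as string"))
            continue
        for key in ("min", "max"):
            text = info.get(key, "")
            if text and not low <= float(text) <= high:
                flags.append((column, f"{key}={text} outside [{low}, {high}]"))
    return flags


FIELDS_BLOCK = """# [FIELDS]
# fields = station_id|observation_date|temperature_c|humidity_pct|wind_speed_ms
# types = string|datetime|string|integer|number
# min = |2024-01-15T00:00:00||65|2.8
# max = |2024-01-16T00:00:00||168|5.1
# missing_count = 0|0|0|0|1
# description = ||||
"""

# Hypothetical plausibility limits; choose ranges that fit your own domain.
BOUNDS = {"humidity_pct": (0, 100), "temperature_c": (-60, 60)}

for column, reason in red_flags(parse_fields(FIELDS_BLOCK), BOUNDS):
    print(f"{column}: {reason}")
# humidity_pct: max=168 outside [0, 100]
# temperature_c: typed as string
```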

---

### Step 5: Fix the errors

#### Fix 1: The unrecognised nodata sentinel

The cleanest fix is to replace `MISSING` in the source CSV with a sentinel DEVO already recognises. `N/A`, `NA`, `null`, and an empty cell are all understood automatically.

Change the `temperature_c` value for station S002 from `MISSING` to `N/A` (or leave the cell blank).

If you cannot change the source data and `MISSING` will always appear in your files, pass `--nodata MISSING` on the command line. DEVO will then treat `MISSING` the same way it treats `N/A`:

```bash
devo run sensor_data.csv --nodata MISSING
```
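If you would rather fix the source file than pass `--nodata`, the substitution is easy to script. This minimal sketch uses only the standard library. It assumes a comma-delimited UTF-8 file and replaces only cells whose entire (trimmed) content equals the sentinel. `replace_sentinel` and the output file name are hypothetical.

```python
import csv


def replace_sentinel(src_path, dst_path, old="MISSING", new="N/A"):
    """Copy a CSV, replacing cells equal to `old` (after trimming) with `new`.

    The header row is copied unchanged. Returns the number of cells replaced.
    """
    replaced = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator="\n")
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
        for row in reader:
            fixed = [new if cell.strip() == old else cell for cell in row]
            replaced += sum(1 for before, after in zip(row, fixed) if before != after)
            writer.writerow(fixed)
    return replaced


# Example (hypothetical file names):
# count = replace_sentinel("sensor_data.csv", "sensor_data_fixed.csv")
# print(f"Replaced {count} cell(s)")
```

Writing to a new file means you can compare the two files before you replace the original.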

#### Fix 2: The impossible humidity value

Station S003 has `humidity_pct = 168`. Investigate the source; it is likely a typo for `68`. Correct it in the CSV.

---

### Step 6: Re-run on the corrected file

After making both corrections, `sensor_data.csv` should look like this:

```csv
station_id,observation_date,temperature_c,humidity_pct,wind_speed_ms
S001,2024-01-15,21.4,65,3.2
S002,2024-01-15,N/A,72,N/A
S003,2024-01-15,19.8,68,5.1
S004,2024-01-15,23.1,71,2.8
S005,2024-01-16,20.0,71,4.0
```

Run DEVO again:

```bash
devo run sensor_data.csv
```

Terminal output:

```
[OK] Enriched: DEVO_output/sensor_data.icsv
[OK] Report: DEVO_output/sensor_data_DEVO_report.txt
```

Validation report:

```
DEVO Validation Report
======================
File: sensor_data.icsv
Date: 2024-01-20T10:40:15Z
Valid: YES

METADATA
----------------------------------------
[OK] All required metadata present.

TYPE CONSISTENCY
----------------------------------------
[OK] station_id: declared=string, inferred=string
[OK] observation_date: declared=datetime, inferred=datetime
[OK] temperature_c: declared=number, inferred=number
[OK] humidity_pct: declared=integer, inferred=integer
[OK] wind_speed_ms: declared=number, inferred=number

DATA VALIDATION
----------------------------------------
[PASS] No data errors found.
```

The `# [FIELDS]` section of the iCSV now shows correct types and a plausible maximum for humidity:

```
# [FIELDS]
# fields = station_id|observation_date|temperature_c|humidity_pct|wind_speed_ms
# types = string|datetime|number|integer|number
# min = |2024-01-15T00:00:00|19.8|65|2.8
# max = |2024-01-16T00:00:00|23.1|72|5.1
# missing_count = 0|0|1|0|1
# description = ||||
```

---

### Step 7: How to know the file is ready for upload

A file is ready for upload when all of the following are true:

- [ ] **Report says `Valid: YES`**
- [ ] **All column types in `# types` are correct** for the data: numbers are `integer` or `number`, dates are `datetime`, free text is `string`
- [ ] **`# min` and `# max` values are physically plausible**, with no impossible extremes like `humidity_pct = 168`
- [ ] **`# missing_count` matches your expectation.** If a column should have no gaps and shows `missing_count = 5`, investigate before uploading.
- [ ] **No `[WARN]` lines in TYPE CONSISTENCY.** A warning means the declared type does not match what DEVO sees in the data (see [Common Errors](#6-common-errors-and-how-to-fix-them)).

Once all boxes are checked, submit the `.icsv` file and its accompanying `_schema.json`.

---

## 4. Understanding the Validation Report

After a short header, the report has three sections: METADATA, TYPE CONSISTENCY, and DATA VALIDATION.

### Report header

```
DEVO Validation Report
======================
File: sensor_data.icsv
Date: 2024-01-20T10:40:15Z
Valid: YES
```

`Valid: YES` means **both** the metadata check and the Frictionless data check passed. Type consistency warnings (`[WARN]`) do not make the file invalid; they are advisory. `Valid: NO` means at least one `[FAIL]` was found in METADATA or DATA VALIDATION.
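Because the report uses fixed markers, a script can read the verdict and count the tagged lines, which is useful for the upload checklist. This sketch assumes the layout shown in this section: a `Valid:` header line, and `[OK]`, `[PASS]`, `[WARN]` or `[FAIL]` at the start of result lines. `summarise_report` is a hypothetical helper, not a DEVO function, and the sample report is made up.

```python
import re
from collections import Counter

# Result tags at the start of a line, as they appear in the report.
TAG = re.compile(r"^\s*\[(OK|PASS|WARN|FAIL)\]")


def summarise_report(text):
    """Return (is_valid, tag_counts) from the text of a DEVO report.

    is_valid is None if no 'Valid:' line is found.
    """
    is_valid = None
    counts = Counter()
    for line in text.splitlines():
        if line.startswith("Valid:"):
            is_valid = line.split(":", 1)[1].strip() == "YES"
        match = TAG.match(line)
        if match:
            counts[match.group(1)] += 1
    return is_valid, counts


# Made-up sample: valid overall, but with one advisory warning.
REPORT = """DEVO Validation Report
======================
File: sensor_data.icsv
Date: 2024-01-20T10:40:15Z
Valid: YES

METADATA
----------------------------------------
[OK] All required metadata present.

TYPE CONSISTENCY
----------------------------------------
[OK] temperature_c: declared=number, inferred=number
[WARN] humidity_pct: declared=integer, inferred=number

DATA VALIDATION
----------------------------------------
[PASS] No data errors found.
"""

valid, counts = summarise_report(REPORT)
print(valid, counts["WARN"])  # True 1
# Valid: YES alone is not enough; the upload checklist also requires no [WARN] lines.
ready = valid and counts["WARN"] == 0
print(ready)  # False
```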

---

### METADATA section

Checks that the required iCSV metadata keys are present.

```
METADATA
----------------------------------------
[OK] All required metadata present.
```

Or, if there are problems:

```
METADATA
----------------------------------------
[FAIL] Missing required metadata key: field_delimiter
[WARN] Spatial columns detected but 'geometry' metadata key is missing
[WARN] Spatial columns detected but 'srid' metadata key is missing
```

| Message | Meaning | Effect on `Valid` |
|---|---|---|
| `[OK] All required metadata present.` | Everything is in order | — |
| `[FAIL] Missing required metadata key: field_delimiter` | The `field_delimiter` key is absent from `# [METADATA]` | Sets `Valid: NO` |
| `[WARN] Spatial columns detected but 'geometry' metadata key is missing` | Columns named `lat`/`lon`/`geometry` found but `geometry` key is not declared | Advisory only |
| `[WARN] Spatial columns detected but 'srid' metadata key is missing` | Lat/lon columns found but no coordinate reference system declared | Advisory only |

`[FAIL]` in METADATA sets the overall result to `Valid: NO`. `[WARN]` in METADATA does not.

---

### TYPE CONSISTENCY section

DEVO re-infers each column's type from the actual data rows and compares it to the type declared in `# [FIELDS]`. This catches cases where the declared type was manually edited to be stricter than what the data actually contains.

```
TYPE CONSISTENCY
----------------------------------------
[OK] temperature_c: declared=number, inferred=number
[WARN] humidity_pct: declared=integer, inferred=number
Inferred type is wider than declared. Data may not satisfy 'integer' constraints.
```

| Result | Meaning |
|---|---|
| `[OK]` | Inferred type is equal to or narrower than declared (e.g., inferred `integer` satisfies declared `number`) |
| `[WARN]` | Inferred type is **wider** than declared (e.g., inferred `number` does not satisfy declared `integer`; floats exist but integers are expected) |

Type hierarchy (narrowest to widest): `integer` → `number` → `string`, and `datetime` → `string`.

`[WARN]` in TYPE CONSISTENCY is advisory and does **not** set `Valid: NO`. However, it usually means the data has values that will fail Frictionless validation. Check the DATA VALIDATION section for accompanying `[FAIL]` lines.
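The hierarchy can be expressed as a lookup that lists which declared types each inferred type satisfies. The sketch below encodes only the rules stated in this section and is not taken from DEVO's source. Treating unrelated pairs (such as `integer` vs `datetime`) as a warning is an assumption, and `consistency_tag` is a hypothetical name.

```python
# For each inferred type, the declared types it satisfies (itself or anything wider).
# Hierarchy from this section: integer -> number -> string, and datetime -> string.
SATISFIES = {
    "integer": {"integer", "number", "string"},
    "number": {"number", "string"},
    "datetime": {"datetime", "string"},
    "string": {"string"},
}


def consistency_tag(declared, inferred):
    """Return 'OK' if data of the inferred type fits the declared type, else 'WARN'."""
    allowed = SATISFIES.get(inferred, {inferred})
    return "OK" if declared in allowed else "WARN"


for declared, inferred in [("number", "integer"), ("integer", "number"),
                           ("number", "string"), ("string", "datetime")]:
    print(f"declared={declared}, inferred={inferred}: {consistency_tag(declared, inferred)}")
# declared=number, inferred=integer: OK
# declared=integer, inferred=number: WARN
# declared=number, inferred=string: WARN
# declared=string, inferred=datetime: OK
```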

---

### DATA VALIDATION section

Frictionless validates the actual data rows against the schema JSON. This catches type mismatches, out-of-range values, and required-field violations.

```
DATA VALIDATION
----------------------------------------
[PASS] No data errors found.
```

Or, when errors are found:

```
DATA VALIDATION
----------------------------------------
[FAIL] 3 error(s) found:
Row 2, Col temperature_c [type-error]: type is "number/default" and value "MISSING" is not valid
Row 3, Col humidity_pct [constraint-error]: constraint "maximum is 100" is not satisfied for value "168"
Row 4, Col station_id [required-error]: constraint "required is True" is not satisfied for value ""
```

Each error line shows:

- **Row number**: the row in the data section (row 1 is the header, so row 2 is the first data row)
- **Column name**: which field failed
- **Error code**: the Frictionless error type (see table below)
- **Message**: the specific constraint that was violated

| Error code | What it means | How to fix |
|---|---|---|
| `type-error` | A value cannot be parsed as the declared type | Correct the value in the source CSV, or adjust the type in the schema if the declaration is wrong |
| `constraint-error` | A value falls outside a `minimum`, `maximum`, or other constraint | Correct the value in the source CSV, or update the schema constraint if it was set too tightly |
| `required-error` | A required field has a blank or missing value | Fill in the missing value, or mark the field as not required in the schema |

If there are more than 50 errors, the report shows only the first 50 and notes the total count. Fix the listed errors first, then re-run DEVO to check whether any errors remain.
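When the report lists many errors, grouping them by column and error code shows whether you are dealing with one systematic problem or many scattered ones. This sketch assumes the exact line format shown above (`Row N, Col NAME [code]: message`). The sample lines are made up, and `group_errors` is a hypothetical helper. If DEVO's wording differs, adjust the regular expression.

```python
import re
from collections import defaultdict

# Matches lines such as: Row 2, Col temperature_c [type-error]: type is ...
ERROR_LINE = re.compile(r"^\s*Row (\d+), Col (.+?) \[([a-z-]+)\]: (.*)$")


def group_errors(report_text):
    """Return {(column, error_code): [row numbers]} from DATA VALIDATION lines."""
    grouped = defaultdict(list)
    for line in report_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            row, column, code, _message = match.groups()
            grouped[(column, code)].append(int(row))
    return dict(grouped)


SAMPLE = """DATA VALIDATION
----------------------------------------
[FAIL] 3 error(s) found:
Row 2, Col temperature_c [type-error]: type is "number/default" and value "MISSING" is not valid
Row 6, Col temperature_c [type-error]: type is "number/default" and value "ND" is not valid
Row 3, Col humidity_pct [constraint-error]: constraint "maximum is 100" is not satisfied for value "168"
"""

for (column, code), rows in group_errors(SAMPLE).items():
    print(f"{column} [{code}]: rows {rows}")
# temperature_c [type-error]: rows [2, 6]
# humidity_pct [constraint-error]: rows [3]
```

A cluster of `type-error` lines in one column, as here, usually points to a single unrecognised sentinel rather than many separate typos.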

---

## 5. Understanding the iCSV Format

An iCSV file is a plain-text CSV with a structured comment header. Comment lines begin with `#`. The file has three named sections.

### `# [METADATA]` section

Key/value pairs describing the file as a whole.

```
# iCSV 1.0 UTF-8
# [METADATA]
# iCSV_version = 1.0
# field_delimiter = |
# rows = 5
# columns = 5
# creation_date = 2024-01-20T10:40:15.123456Z
# nodata = N/A
# generator = DEVO
```

**`field_delimiter`** is the character that separates values in the `# [FIELDS]` lines and in the `# [DATA]` section. When the input CSV is comma-delimited, DEVO writes the iCSV with `|` (pipe) instead, to avoid ambiguity with the `,` separator in metadata lines. This key is **required**; if it is absent, the METADATA check reports `[FAIL]`.

**`nodata`** records the missing-value sentinel that occurs most often in the data. DEVO detects it automatically; you can override it with `--nodata VALUE`.

**`geometry`** and **`srid`** are written automatically when DEVO detects spatial columns (columns named `lat`/`latitude`, `lon`/`lng`/`longitude`, or `geometry`).
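The METADATA checks described in section 4 amount to reading the `key = value` lines and testing which keys are present. The sketch below reproduces that logic in simplified form, with three assumptions. It treats only `field_delimiter` as required, because that is the one required key this README names. It raises both spatial warnings whenever any spatial column name from the list above is present. It reuses the message wording from section 4. Function names are hypothetical, and DEVO's actual rules may be more detailed.

```python
SPATIAL_NAMES = {"lat", "latitude", "lon", "lng", "longitude", "geometry"}


def read_metadata(icsv_text):
    """Return the key/value pairs in the '# [METADATA]' section as a dict."""
    meta = {}
    in_metadata = False
    for line in icsv_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("# ["):  # section marker
            in_metadata = stripped == "# [METADATA]"
            continue
        if in_metadata and stripped.startswith("#") and "=" in stripped:
            key, _, value = stripped.lstrip("#").partition("=")
            meta[key.strip()] = value.strip()
    return meta


def metadata_messages(meta, column_names):
    """Return report-style METADATA lines for a parsed header (simplified rules)."""
    messages = []
    if "field_delimiter" not in meta:
        messages.append("[FAIL] Missing required metadata key: field_delimiter")
    if SPATIAL_NAMES & {name.lower() for name in column_names}:
        for key in ("geometry", "srid"):
            if key not in meta:
                messages.append(
                    f"[WARN] Spatial columns detected but '{key}' metadata key is missing")
    return messages or ["[OK] All required metadata present."]


HEADER = """# iCSV 1.0 UTF-8
# [METADATA]
# iCSV_version = 1.0
# field_delimiter = |
# rows = 5
# nodata = N/A
# [FIELDS]
# fields = station_id|lat|lon
"""

meta = read_metadata(HEADER)
print(meta["field_delimiter"])  # |
for message in metadata_messages(meta, ["station_id", "lat", "lon"]):
    print(message)
```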

---

### `# [FIELDS]` section

Per-column metadata. Each line lists one value per column, separated by the `field_delimiter` (`|` here) and in the same column order as `# [DATA]`.

```
# [FIELDS]
# fields = station_id|observation_date|temperature_c|humidity_pct|wind_speed_ms
# types = string|datetime|number|integer|number
# min = |2024-01-15T00:00:00|19.8|65|2.8
# max = |2024-01-16T00:00:00|23.1|72|5.1
# missing_count = 0|0|1|0|1
# description = ||||
```

| Field line | What to look for |
|---|---|
| `types` | Confirm every column has the type you expect. `string` for a column that should be numeric is a red flag. |
| `min` / `max` | Verify the range makes sense for your domain. A humidity maximum of 168 is physically impossible and indicates a data-entry error. String and all-missing columns have no min/max (blank). |
| `missing_count` | A `0` on a column that should have gaps means your nodata sentinel was not recognised. A high count on a column that should be complete is worth investigating. |
| `description` | Blank by default. Fill these in by hand before uploading if your archive requires field descriptions. |

DEVO assigns one of four **Frictionless types**: `string`, `integer`, `number`, or `datetime`. It infers them in this order of preference: `integer` → `number` → `datetime` → `string`.
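One way to read that preference order is "use the narrowest type that every non-missing value satisfies". The sketch below implements that reading with the standard library. It is an illustration, not DEVO's code. It assumes ISO 8601 dates (parsed with `datetime.fromisoformat`) and the default sentinel list from section 6. `infer_type` is a hypothetical name, and returning `string` for an all-missing column is a placeholder choice for this sketch.

```python
from datetime import datetime

# Default sentinels as listed in this README (assumed; DEVO's list may change).
DEFAULT_NODATA = {"", "NA", "N/A", "na", "n/a", "NULL", "null", "nan", "NaN",
                  "-999", "-999.0", "-999.000000"}


def _parses(parser, value):
    """True if parser(value) succeeds without raising ValueError."""
    try:
        parser(value)
    except ValueError:
        return False
    return True


# Tried in order of preference: the first type that fits every value wins.
CANDIDATES = [
    ("integer", int),
    ("number", float),
    ("datetime", datetime.fromisoformat),
]


def infer_type(values, nodata=DEFAULT_NODATA):
    """Return 'integer', 'number', 'datetime' or 'string' for a column of text values."""
    present = [v.strip() for v in values if v.strip() not in nodata]
    if not present:
        return "string"  # all-missing column: a placeholder choice for this sketch
    for name, parser in CANDIDATES:
        if all(_parses(parser, v) for v in present):
            return name
    return "string"


print(infer_type(["21.4", "MISSING", "19.8"]))                               # string
print(infer_type(["21.4", "MISSING", "19.8"], DEFAULT_NODATA | {"MISSING"}))  # number
print(infer_type(["65", "72", "68"]))                                        # integer
print(infer_type(["2024-01-15", "2024-01-16"]))                              # datetime
```

The first two calls reproduce the tutorial: one `MISSING` pushes `temperature_c` to `string`, and declaring it as nodata brings the column back to `number`.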

---

### `# [DATA]` section

The data rows, written with the `field_delimiter` as the separator. The first row after `# [DATA]` is the column header.

```
# [DATA]
station_id|observation_date|temperature_c|humidity_pct|wind_speed_ms
S001|2024-01-15|21.4|65|3.2
S002|2024-01-15|N/A|72|N/A
...
```

You can edit values in `# [DATA]` directly, but if you do, re-run `devo validate` afterwards to confirm the edited file still passes.

---

## 6. Common Errors and How to Fix Them

### A numeric column is typed as `string`

**Symptom:** `# types` shows `string` for a column that holds measurements or counts. `min` and `max` are blank for that column.

**Cause:** At least one value in the column is not a number and is not a recognised nodata sentinel. Common culprits: custom sentinels like `MISSING`, `ND`, `NM`, `-`, `na`, `none`; stray text like `error` or `N/M`; unit suffixes like `21.4°C`.

**Fix options:**

1. Replace the non-numeric values with a standard sentinel (`N/A`, `NA`, `null`, or an empty cell) in the source CSV, then re-run.
2. If you cannot change the source, tell DEVO about the custom sentinel:
   ```bash
   devo run data.csv --nodata MISSING
   ```
3. If the column genuinely has mixed text (e.g., a notes field), `string` may be correct; no action is needed.

---

### `[WARN]` in TYPE CONSISTENCY

**Symptom:**

```
[WARN] temperature_c: declared=number, inferred=string
Inferred type is wider than declared. Data may not satisfy 'number' constraints.
```

**Cause:** The type declared in `# [FIELDS]` (usually set during enrichment or edited manually) is stricter than what the actual data rows contain. The most common cause is editing the iCSV type from `string` to `number` without also fixing the values that caused the original `string` inference.

**Fix:** Look for non-numeric, non-sentinel values in that column's data rows. Either:

- Replace them with a recognised sentinel and re-run `devo run` on the corrected source CSV, or
- Revert the type in `# [FIELDS]` to `string` if the column really contains mixed content.

---

### `[FAIL] type-error` in DATA VALIDATION

**Symptom:**

```
[FAIL] 1 error(s) found:
Row 2, Col temperature_c [type-error]: type is "number/default" and value "MISSING" is not valid
```

**Cause:** A value in the data cannot be parsed as the declared type in the schema JSON. This often occurs together with a TYPE CONSISTENCY `[WARN]` and typically means the schema says one type (e.g., `number`) while the data contains incompatible values (e.g., the string `MISSING`).

**Fix:** Correct the value in the source data and re-run. If the value is a nodata sentinel, use `--nodata VALUE` so it is excluded from type inference and added to the schema's `missingValues` list.

---

### `[FAIL] constraint-error` in DATA VALIDATION

**Symptom:**

```
[FAIL] 1 error(s) found:
Row 3, Col humidity_pct [constraint-error]: constraint "maximum is 72" is not satisfied for value "168"
```

**Cause:** A value violates a `minimum` or `maximum` constraint in the schema. The schema constraints are derived from the data at enrichment time; if you later add or correct rows that push values outside the original range, validation will fail.

**Fix options:**

1. Correct the outlier in the source CSV (e.g., change `168` to `68`) and re-run `devo run`.
2. If the new range is legitimate, re-run `devo enrich` to rebuild the schema from the updated data, then `devo validate` to confirm.

---

### `[FAIL] required-error` in DATA VALIDATION

**Symptom:**

```
[FAIL] 1 error(s) found:
Row 4, Col station_id [required-error]: constraint "required is True" is not satisfied for value ""
```

**Cause:** A field was declared `required: true` in the schema (because it had no missing values at enrichment time), but a later row has an empty or missing value for that field.

**Fix options:**

1. Fill in the missing value in the source CSV and re-run.
2. If blanks are legitimate in that column, re-run `devo enrich` on source data that includes the blank values. DEVO will then set `required: false` in the schema and report a non-zero `missing_count`.

---

### `[FAIL] Missing required metadata key: field_delimiter`

**Symptom:**

```
METADATA
----------------------------------------
[FAIL] Missing required metadata key: field_delimiter
```

The report header shows `Valid: NO`.

**Cause:** The iCSV's `# [METADATA]` section is missing the `field_delimiter` key. This should not occur in iCSV files generated by DEVO, but can happen in hand-authored files.

**Fix:** Add `# field_delimiter = |` (or your actual delimiter) to the `# [METADATA]` section of the iCSV file.

---

### `[ERROR] Column name(s) contain the iCSV delimiter`

**Symptom (terminal):**

```
[ERROR] Column name(s) contain the iCSV delimiter '|': ['flow|rate'].
Rename the columns or force a different delimiter with --delimiter.
```

**Cause:** A column header in the source CSV contains the pipe character `|`. DEVO uses `|` as the iCSV field delimiter, so a pipe inside a column name is ambiguous.

**Fix options:**

1. Rename the column in the source CSV (e.g., `flow|rate` → `flow_rate`).
2. Force a different delimiter that does not appear in your column names:
   ```bash
   devo run data.csv --delimiter ":"
   ```
   Valid iCSV delimiters are: `, | / \ : ;`
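When processing many files, you can choose a safe delimiter before calling DEVO by checking each header against the valid iCSV delimiters listed above. In this sketch, trying `|` first (DEVO's default) and then the others in a fixed order is an assumption, and `pick_delimiter` is a hypothetical helper.

```python
import csv
import io

# Valid iCSV delimiters from the list above, with DEVO's default '|' tried first.
CANDIDATES = ["|", ";", ":", "/", "\\", ","]


def pick_delimiter(header_line, input_delimiter=","):
    """Return the first candidate that appears in no column name, or None."""
    names = next(csv.reader(io.StringIO(header_line), delimiter=input_delimiter), [])
    for candidate in CANDIDATES:
        if not any(candidate in name for name in names):
            return candidate
    return None


print(pick_delimiter("station_id,flow|rate,temp_c"))  # ;
print(pick_delimiter("station_id,flow_rate,temp_c"))  # |
```

The returned character is what you would pass in option 2 above.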

---

### `[ERROR] No schema provided and none found`

**Symptom (terminal):**

```
[ERROR] No schema provided and none found alongside data.icsv.
Run 'devo enrich' first or pass --schema.
```

**Cause:** `devo validate` expects a schema JSON file in the same directory as the iCSV, named `<stem>_schema.json`. If the schema file is missing or in a different location, validation cannot run.

**Fix options:**

1. Run `devo enrich data.csv` first to generate the schema, then `devo validate`.
2. Point to an existing schema explicitly:
   ```bash
   devo validate data.icsv --schema /path/to/data_schema.json
   ```
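If a script needs to check for the schema before calling `devo validate`, the naming rule described above can be reproduced with `pathlib`. This sketch mirrors only the "`<stem>_schema.json` in the same directory" rule stated here. `expected_schema_path` and `find_schema` are hypothetical helpers.

```python
from pathlib import Path


def expected_schema_path(icsv_path):
    """Return the '<stem>_schema.json' path next to the given iCSV."""
    path = Path(icsv_path)
    return path.with_name(f"{path.stem}_schema.json")


def find_schema(icsv_path):
    """Return the schema path if the file exists, otherwise None."""
    candidate = expected_schema_path(icsv_path)
    return candidate if candidate.is_file() else None


print(expected_schema_path("DEVO_output/sensor_data.icsv"))
# On POSIX systems this prints DEVO_output/sensor_data_schema.json

# Hypothetical usage before validating:
# schema = find_schema("DEVO_output/sensor_data.icsv")
# if schema is None:
#     print("No schema found: run 'devo enrich' first or pass --schema.")
```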

---

### `[ERROR] data.icsv is already an iCSV file`

**Symptom (terminal):**

```
[ERROR] data.icsv is already an iCSV file.
Use 'devo validate' to validate it, or 'devo run' which handles both.
```

**Cause:** You ran `devo enrich` on a `.icsv` file.

**Fix:** Use `devo validate data.icsv` to validate it, or `devo run data.icsv` (which detects the `.icsv` format and skips enrichment automatically).

---

### Nodata sentinels DEVO recognises automatically

The following values are treated as missing by default; no `--nodata` flag needed:

```
(empty cell)  NA  N/A  na  n/a  NULL  null  nan  NaN  -999  -999.0  -999.000000
```

Any other sentinel, such as `MISSING`, `ND`, `NM`, `none`, `-`, or `9999`, must be declared with `--nodata VALUE`.

---

## 7. CLI Reference

### `devo run`: enrich then validate (most common)

```bash
devo run INPUT [--out DIR] [--delimiter CHAR] [--nodata VALUE] [--app PROFILE]
```

If `INPUT` is a `.csv`, DEVO enriches it first, then validates. If `INPUT` is already a `.icsv`, enrichment is skipped.

| Flag | Default | Description |
|---|---|---|
| `--out DIR` | `DEVO_output` | Directory for all output files |
| `--delimiter CHAR` | auto-detected | Force a specific input delimiter (CSV files only) |
| `--nodata VALUE` | auto-detected | Declare a custom missing-value sentinel |
| `--app PROFILE` | (none) | Set the `application_profile` metadata key |

---

### `devo enrich`: CSV → iCSV + schema

```bash
devo enrich INPUT.csv [--out DIR] [--delimiter CHAR] [--nodata VALUE] [--app PROFILE]
```

Writes `INPUT.icsv` and `INPUT_schema.json` to the `--out` directory. Does not validate.

---

### `devo validate`: iCSV → validation report

```bash
devo validate INPUT.icsv [--out DIR] [--schema PATH]
```

| Flag | Default | Description |
|---|---|---|
| `--out DIR` | `DEVO_output` | Directory for the report |
| `--schema PATH` | auto-discover | Path to the schema JSON; defaults to `INPUT_schema.json` in the same directory |

---

### Exit codes

| Code | Meaning |
|---|---|
| `0` | Success: validation passed (or enrichment completed without errors) |
| `1` | Validation failed: data errors found; read the report |
| `2` | Usage or runtime error: bad arguments, missing file, etc. |
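Because the exit codes are fixed, a calling script can branch on them instead of parsing terminal output. This sketch uses `subprocess.run` and assumes the `devo` executable is on `PATH`; if it is not, `subprocess.run` raises `FileNotFoundError`. `run_devo` and `describe_exit` are hypothetical names.

```python
import subprocess

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "success",
    1: "validation failed: read the report",
    2: "usage or runtime error",
}


def describe_exit(code):
    """Translate a devo exit code into a short label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_devo(input_path, out_dir="DEVO_output", nodata=None):
    """Run `devo run` and return (exit_code, label). Requires devo on PATH."""
    cmd = ["devo", "run", str(input_path), "--out", out_dir]
    if nodata is not None:
        cmd += ["--nodata", nodata]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, describe_exit(result.returncode)


# Hypothetical usage:
# code, label = run_devo("sensor_data.csv", nodata="MISSING")
# if code == 1:
#     print("Open the report in DEVO_output/ before uploading.")
```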

---

## 8. Python API

For scripted or batch use cases:

```python
from pathlib import Path

from devo.enrich import ICSVEnricher
from devo.validate import validate_icsv

# Step 1: Enrich CSV → iCSV + schema
icsv_path, schema_path = ICSVEnricher().make_icsv(
    "sensor_data.csv",
    outdir="DEVO_output",
    nodata_override="MISSING",    # optional: custom sentinel
    application_profile="MyApp",  # optional: profile name
)

# Step 2: Validate
report_path, is_valid = validate_icsv(
    icsv_path,
    schema_path=schema_path,
    outdir="DEVO_output",
)

print(f"Valid: {is_valid}")
print(f"Report: {report_path}")

if not is_valid:
    # Print the full report for details
    print(Path(report_path).read_text(encoding="utf-8"))
```

### Error handling

```python
from devo.enrich import ICSVEnricher
from devo.exceptions import DEVOError, EnrichError, ParseError, ValidationError
from devo.validate import validate_icsv

try:
    icsv_path, schema_path = ICSVEnricher().make_icsv("data.csv", "out")
    report_path, is_valid = validate_icsv(icsv_path, schema_path=schema_path)
except EnrichError as e:
    print(f"Enrichment failed: {e}")
except ParseError as e:
    print(f"Could not parse iCSV header: {e}")
except ValidationError as e:
    print(f"Validation infrastructure error: {e}")
except DEVOError as e:
    # Any other DEVO-specific failure
    print(f"DEVO error: {e}")
except FileNotFoundError as e:
    # Python's built-in exception, not a DEVOError subclass
    print(f"File not found: {e}")
```

| Exception | When it is raised |
|---|---|
| `EnrichError` | Input CSV cannot be read, is already an iCSV, or has column names that contain the output delimiter |
| `ParseError` | An iCSV file is missing its `# [METADATA]` section or cannot be opened |
| `ValidationError` | The `frictionless` package is not installed |
| `FileNotFoundError` | The input file or schema file does not exist |

`EnrichError`, `ParseError`, and `ValidationError` inherit from `DEVOError`, so `except DEVOError` catches any DEVO-specific failure. `FileNotFoundError` is Python's built-in exception and is not a `DEVOError`, so it must be caught separately.

---

### Batch processing example

```python
from pathlib import Path

from devo.enrich import ICSVEnricher
from devo.exceptions import DEVOError
from devo.validate import validate_icsv

enricher = ICSVEnricher()
results = []

for csv_file in Path("incoming").glob("*.csv"):
    try:
        icsv, schema = enricher.make_icsv(str(csv_file), outdir="DEVO_output")
        report, valid = validate_icsv(icsv, schema_path=schema)
        results.append((csv_file.name, valid, report))
    except DEVOError as e:
        results.append((csv_file.name, False, str(e)))

# `valid` reflects `Valid: YES` only; the manual upload checklist still applies.
for name, valid, info in results:
    status = "READY" if valid else "NEEDS REVIEW"
    print(f"{status} {name} → {info}")
```

---

## License

MIT. See `LICENSE`.