csv-detective 0.8.1.dev1416__py3-none-any.whl → 0.8.1.dev1440__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (32)
  1. {csv_detective-0.8.1.dev1416.data → csv_detective-0.8.1.dev1440.data}/data/share/csv_detective/CHANGELOG.md +2 -1
  2. csv_detective-0.8.1.dev1440.data/data/share/csv_detective/LICENSE +21 -0
  3. {csv_detective-0.8.1.dev1416.data → csv_detective-0.8.1.dev1440.data}/data/share/csv_detective/README.md +6 -39
  4. csv_detective-0.8.1.dev1440.dist-info/METADATA +267 -0
  5. {csv_detective-0.8.1.dev1416.dist-info → csv_detective-0.8.1.dev1440.dist-info}/RECORD +9 -29
  6. csv_detective-0.8.1.dev1440.dist-info/licenses/LICENSE +21 -0
  7. csv_detective/detect_fields/FR/other/code_csp_insee/code_csp_insee.txt +0 -498
  8. csv_detective/detect_fields/FR/other/csp_insee/csp_insee.txt +0 -571
  9. csv_detective/detect_fields/FR/other/insee_ape700/insee_ape700.txt +0 -733
  10. csv_detective/detect_fields/geo/iso_country_code_alpha2/iso_country_code_alpha2.txt +0 -495
  11. csv_detective/detect_fields/geo/iso_country_code_alpha3/iso_country_code_alpha3.txt +0 -251
  12. csv_detective/detect_fields/geo/iso_country_code_numeric/iso_country_code_numeric.txt +0 -251
  13. csv_detective/detection/columns.py +0 -89
  14. csv_detective/detection/encoding.py +0 -27
  15. csv_detective/detection/engine.py +0 -46
  16. csv_detective/detection/formats.py +0 -145
  17. csv_detective/detection/headers.py +0 -32
  18. csv_detective/detection/rows.py +0 -18
  19. csv_detective/detection/separator.py +0 -44
  20. csv_detective/detection/variables.py +0 -98
  21. csv_detective/parsing/columns.py +0 -139
  22. csv_detective/parsing/compression.py +0 -11
  23. csv_detective/parsing/csv.py +0 -55
  24. csv_detective/parsing/excel.py +0 -169
  25. csv_detective/parsing/load.py +0 -97
  26. csv_detective/parsing/text.py +0 -61
  27. csv_detective-0.8.1.dev1416.data/data/share/csv_detective/LICENSE.AGPL.txt +0 -661
  28. csv_detective-0.8.1.dev1416.dist-info/METADATA +0 -42
  29. csv_detective-0.8.1.dev1416.dist-info/licenses/LICENSE.AGPL.txt +0 -661
  30. {csv_detective-0.8.1.dev1416.dist-info → csv_detective-0.8.1.dev1440.dist-info}/WHEEL +0 -0
  31. {csv_detective-0.8.1.dev1416.dist-info → csv_detective-0.8.1.dev1440.dist-info}/entry_points.txt +0 -0
  32. {csv_detective-0.8.1.dev1416.dist-info → csv_detective-0.8.1.dev1440.dist-info}/top_level.txt +0 -0
--- a/csv_detective-0.8.1.dev1416.data/data/share/csv_detective/CHANGELOG.md
+++ b/csv_detective-0.8.1.dev1440.data/data/share/csv_detective/CHANGELOG.md
@@ -3,7 +3,8 @@
 ## Current (in progress)
 
 - Refactor label testing [#119](https://github.com/datagouv/csv-detective/pull/119)
-- Better URL detection [#120](https://github.com/datagouv/csv-detective/pull/120)
+- Refactor repo metadata and requirements [#120](https://github.com/datagouv/csv-detective/pull/120) [#122](https://github.com/datagouv/csv-detective/pull/122)
+- Better URL detection [#121](https://github.com/datagouv/csv-detective/pull/121)
 
 ## 0.8.0 (2025-05-20)
 
--- /dev/null
+++ b/csv_detective-0.8.1.dev1440.data/data/share/csv_detective/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 data.gouv.fr
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/csv_detective-0.8.1.dev1416.data/data/share/csv_detective/README.md
+++ b/csv_detective-0.8.1.dev1440.data/data/share/csv_detective/README.md
@@ -10,7 +10,7 @@ You can also directly feed the URL of a remote file (from data.gouv.fr for instance).
 
 ### Install the package
 
-You need to have python >= 3.7 installed. We recommend using a virtual environement.
+You need to have python >= 3.9 installed. We recommend using a virtual environment.
 
 ```
 pip install csv-detective
@@ -182,7 +182,7 @@ An early version of this analysis of all resources on data.gouv.fr can be found
 
 ## Release
 
-The release process uses `bumpr`.
+The release process uses `bumpx`.
 
 ```shell
 pip install -r requirements-build.txt
@@ -190,16 +190,16 @@ pip install -r requirements-build.txt
 
 ### Process
 
-1. `bumpr` will handle bumping the version according to your command (patch, minor, major)
+1. `bumpx` will handle bumping the version according to your command (patch, minor, major)
 2. It will update the CHANGELOG according to the new version being published
 3. It will push a tag with the given version to github
 4. CircleCI will pickup this tag, build the package and publish it to pypi
-5. `bumpr` will have everything ready for the next version (version, changelog...)
+5. `bumpx` will have everything ready for the next version (version, changelog...)
 
 ### Dry run
 
 ```shell
-bumpr -d -v
+bumpx -d -v
 ```
 
 ### Release
@@ -207,38 +207,5 @@ bumpr -d -v
 This will release a patch version:
 
 ```shell
-bumpr -v
-```
-
-See bumpr options for minor and major:
-
-```
-$ bumpr -h
-usage: bumpr [-h] [--version] [-v] [-c CONFIG] [-d] [-st] [-b | -pr] [-M] [-m] [-p]
-             [-s SUFFIX] [-u] [-pM] [-pm] [-pp] [-ps PREPARE_SUFFIX] [-pu]
-             [--vcs {git,hg}] [-nc] [-P] [-nP]
-             [file] [files ...]
-
-[...]
-
-optional arguments:
-  -h, --help            show this help message and exit
-  --version             show program's version number and exit
-  -v, --verbose         Verbose output
-  -c CONFIG, --config CONFIG
-                        Specify a configuration file
-  -d, --dryrun          Do not write anything and display a diff
-  -st, --skip-tests     Skip tests
-  -b, --bump            Only perform the bump
-  -pr, --prepare        Only perform the prepare
-
-bump:
-  -M, --major           Bump major version
-  -m, --minor           Bump minor version
-  -p, --patch           Bump patch version
-  -s SUFFIX, --suffix SUFFIX
-                        Set suffix
-  -u, --unsuffix        Unset suffix
-
-[...]
+bumpx -v
 ```
--- /dev/null
+++ b/csv_detective-0.8.1.dev1440.dist-info/METADATA
@@ -0,0 +1,267 @@
+Metadata-Version: 2.4
+Name: csv_detective
+Version: 0.8.1.dev1440
+Summary: Detect tabular files column content
+Home-page: https://github.com/datagouv/csv_detective
+Author: Etalab
+Author-email: opendatateam@data.gouv.fr
+License: https://spdx.org/licenses/MIT.html#licenseText
+Project-URL: Source, https://github.com/datagouv/csv_detective
+Keywords: CSV data processing encoding guess parser tabular
+Classifier: Development Status :: 2 - Pre-Alpha
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: Implementation :: CPython
+Classifier: Topic :: Scientific/Engineering :: Information Analysis
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: boto3<2,>=1.34.0
+Requires-Dist: dateparser<2,>=1.2.0
+Requires-Dist: faust-cchardet==2.1.19
+Requires-Dist: pandas<3,>=2.2.0
+Requires-Dist: python-dateutil<3,>=2.8.2
+Requires-Dist: Unidecode<2,>=1.3.6
+Requires-Dist: openpyxl==3.1.5
+Requires-Dist: xlrd==2.0.1
+Requires-Dist: odfpy==1.4.1
+Requires-Dist: requests<3,>=2.32.3
+Requires-Dist: python-magic==0.4.27
+Requires-Dist: frformat==0.4.0
+Requires-Dist: faker>=33.0.0
+Requires-Dist: rstr==3.2.2
+Provides-Extra: dev
+Requires-Dist: pytest==8.3.0; extra == "dev"
+Requires-Dist: responses==0.25.0; extra == "dev"
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: keywords
+Dynamic: license
+Dynamic: license-file
+Dynamic: project-url
+Dynamic: provides-extra
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+
+# CSV Detective
+
+This is a package to **automatically detect column content in tabular files**. The script reads either the whole file or the first few rows and performs various checks to see, for each column, whether it matches various content types. This is currently done through regex and string comparison.
+
+Currently supported file types: csv, xls, xlsx, ods.
+
+You can also directly feed the URL of a remote file (from data.gouv.fr for instance).
+
+## How To ?
+
+### Install the package
+
+You need to have python >= 3.9 installed. We recommend using a virtual environment.
+
+```
+pip install csv-detective
+```
+
+### Detect some columns
+
+Say you have a tabular file located at `file_path`. This is how you could use `csv_detective`:
+
+```
+# Import the csv_detective package
+from csv_detective import routine
+import os  # for this example only
+
+# Replace by your file path
+file_path = os.path.join('.', 'tests', 'code_postaux_v201410.csv')
+
+# Open your file and run csv_detective
+inspection_results = routine(
+    file_path,  # or file URL
+    num_rows=-1,  # -1 analyzes all rows of your file; otherwise set the number of rows you wish to analyze
+    save_results=False,  # default False; if True, saves the output in the same directory as the analyzed file, with the same name and a .json extension
+    output_profile=True,  # default False; if True, the returned dict contains a "profile" key with stats (min, max, mean, tops...) for every column of your csv
+    output_schema=True,  # default False; if True, the returned dict contains a "schema" key with a basic [tableschema](https://specs.frictionlessdata.io/table-schema/) of your file, which can be used to validate the structure of other csv files that should match it
+)
+```
+
+## So What Do You Get ?
+
+### Output
+
+The program creates a `Python` dictionary with the following information:
+
+```
+{
+    "encoding": "windows-1252",  # Detected encoding
+    "separator": ";",  # Detected CSV separator
+    "header_row_idx": 0,  # Index of the header row (i.e. how many lines to skip to reach it)
+    "headers": ['code commune INSEE', 'nom de la commune', 'code postal', "libellé d'acheminement"],  # Header row
+    "total_lines": 42,  # Number of rows (excluding header)
+    "nb_duplicates": 0,  # Number of exact duplicate rows
+    "heading_columns": 0,  # Number of heading columns
+    "trailing_columns": 0,  # Number of trailing columns
+    "categorical": ['Code commune'],  # Columns that contain fewer than 25 distinct values (arbitrary threshold)
+    "columns": {  # Reconciles the detections from the labels and the contents of each column
+        "Code commune": {
+            "python_type": "string",
+            "format": "code_commune_insee",
+            "score": 1.0
+        },
+    },
+    "columns_labels": {  # Detection from the column headers
+        "Code commune": {
+            "python_type": "string",
+            "format": "code_commune_insee",
+            "score": 0.5
+        },
+    },
+    "columns_fields": {  # Detection from the column contents
+        "Code commune": {
+            "python_type": "string",
+            "format": "code_commune_insee",
+            "score": 1.25
+        },
+    },
+    "profile": {
+        "column_name": {
+            "min": 1,  # int and float only
+            "max": 12,  # int and float only
+            "mean": 5,  # int and float only
+            "std": 5,  # int and float only
+            "tops": [  # 10 most frequent values in the column
+                "xxx",
+                "yyy",
+                "..."
+            ],
+            "nb_distinct": 67,  # number of distinct values
+            "nb_missing_values": 102  # number of empty cells in the column
+        }
+    },
+    "schema": {  # TableSchema of the file if `output_schema` was set to `True`
+        "$schema": "https://frictionlessdata.io/schemas/table-schema.json",
+        "name": "",
+        "title": "",
+        "description": "",
+        "countryCode": "FR",
+        "homepage": "",
+        "path": "https://github.com/datagouv/csv-detective",
+        "resources": [],
+        "sources": [
+            {"title": "Spécification Tableschema", "path": "https://specs.frictionlessdata.io/table-schema"},
+            {"title": "schema.data.gouv.fr", "path": "https://schema.data.gouv.fr"}
+        ],
+        "created": "2023-02-10",
+        "lastModified": "2023-02-10",
+        "version": "0.0.1",
+        "contributors": [
+            {"title": "Table schema bot", "email": "schema@data.gouv.fr", "organisation": "data.gouv.fr", "role": "author"}
+        ],
+        "fields": [
+            {
+                "name": "Code commune",
+                "description": "Le code INSEE de la commune",
+                "example": "23150",
+                "type": "string",
+                "formatFR": "code_commune_insee",
+                "constraints": {
+                    "required": False,
+                    "pattern": "^([013-9]\\d|2[AB1-9])\\d{3}$",
+                }
+            }
+        ]
+    }
+}
+```
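As an illustration of consuming this output, the documented keys can be read like any Python dict. A minimal sketch (the `inspection_results` values below are made up for the example, not real detection output):

```python
# Illustrative inspection_results dict, shaped like the documented output above.
# Values are invented for the example, not produced by the package.
inspection_results = {
    "encoding": "windows-1252",
    "separator": ";",
    "total_lines": 42,
    "columns": {
        "Code commune": {"python_type": "string", "format": "code_commune_insee", "score": 1.0},
        "code postal": {"python_type": "string", "format": "code_postal", "score": 1.2},
    },
}

# Keep every column whose detected format scored at least 1.0
confident = [
    (name, info["format"])
    for name, info in inspection_results["columns"].items()
    if info["score"] >= 1.0
]
print(confident)  # [('Code commune', 'code_commune_insee'), ('code postal', 'code_postal')]
```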
+
+The output differs slightly depending on the file format:
+- csv files have `encoding` and `separator`
+- xls, xlsx, ods files have `engine` and `sheet_name`
+
+### What Formats Can Be Detected
+
+Includes:
+
+- Communes, Départements, Régions, Pays
+- Codes Communes, Codes Postaux, Codes Departement, ISO Pays
+- Codes CSP, Description CSP, SIREN
+- E-Mails, URLs, Téléphones FR
+- Years, Dates, Jours de la Semaine FR
+- UUIDs, Mongo ObjectIds
+
+### Format detection and scoring
+For each column, 3 scores are computed for each format; the higher the score, the more likely the format:
+- the field score, based on the values contained in the column (0.0 to 1.0).
+- the label score, based on the header of the column (0.0 to 1.0).
+- the overall score, computed as `field_score * (1 + label_score/2)` (0.0 to 1.5).
+
+The overall score computation aims to give more weight to the column contents while
+still leveraging the column header.
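The scoring formula can be sketched in a few lines; `overall_score` below is a hypothetical helper written for illustration, not a function exposed by the package:

```python
def overall_score(field_score: float, label_score: float) -> float:
    """Combine content-based and header-based scores as described above.

    field_score: 0.0 to 1.0, computed from the column values.
    label_score: 0.0 to 1.0, computed from the column header.
    The result ranges from 0.0 to 1.5.
    """
    return field_score * (1 + label_score / 2)

# A column whose values fully match a format (1.0) and whose header
# partially matches it (0.5) gets 1.0 * (1 + 0.25) = 1.25.
print(overall_score(1.0, 0.5))  # 1.25
```

Note that a field score of 0.0 always yields an overall score of 0.0: a header alone can never make a format pass.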
+
+#### `limited_output` - Select the output mode for the json report
+
+This option selects the output mode of the report. To use it, pass a `limited_output` argument to the `routine` function. It has two possible values:
+
+- `limited_output` defaults to `True`: the report only contains the column formats detected above a pre-selected threshold proportion of the data, and only the format with the highest score is present in the output. This is the standard output (an example can be found above in the 'Output' section).
+- `limited_output=False`: the report contains the full list of format candidates for each input column, each with a value matching the proportion of that format found in the data. With this report, users can adjust their detection rules around a specific threshold and get a better view of detection quality for each column. Results can also easily be turned into a dataframe (column formats as columns, column names as rows) for analysis and testing.
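As a sketch of what working with a full report might look like, the snippet below filters a hypothetical `limited_output=False`-style report by a custom threshold (the report shape and scores are illustrative, not the package's exact output):

```python
# Hypothetical full report: for each column, every candidate format with its score.
full_report = {
    "Code commune": {"code_commune_insee": 1.25, "code_postal": 0.6, "int": 0.4},
    "nom de la commune": {"commune": 0.9, "string": 0.3},
}

def best_formats(report, threshold=0.5):
    """Keep, for each column, the highest-scoring format if it passes `threshold`."""
    result = {}
    for column, candidates in report.items():
        fmt, score = max(candidates.items(), key=lambda kv: kv[1])
        if score >= threshold:
            result[column] = {"format": fmt, "score": score}
    return result

print(best_formats(full_report))
# {'Code commune': {'format': 'code_commune_insee', 'score': 1.25}, 'nom de la commune': {'format': 'commune', 'score': 0.9}}
```

Raising the threshold trims less certain detections, which is the kind of tuning the full report makes possible.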
+
+## Improvement suggestions
+
+- Smarter refactors
+- Improve performance
+- Test other ways to load and process data (`pandas` alternatives)
+- Add more detection modules...
+
+Related ideas:
+
+- store column names to build a learning model based on column names (possible pre-screen)
+- normalising data based on column prediction
+- entity resolution (good luck...)
+
+## Why Could This Be of Any Use ?
+
+Organisations such as [data.gouv.fr](http://data.gouv.fr) aggregate huge amounts of un-normalised data. Performing cross-examination across datasets can be difficult. This tool could help enrich the datasets' metadata and make it easier to link them together.
+
+[`udata-hydra`](https://github.com/etalab/udata-hydra) is a crawler that checks, analyzes (using `csv-detective`) and APIfies all tabular files from [data.gouv.fr](http://data.gouv.fr).
+
+An early version of this analysis of all resources on data.gouv.fr can be found [here](https://github.com/Leobouloc/data.gouv-exploration).
+
+## Release
+
+The release process uses `bumpx`.
+
+```shell
+pip install -r requirements-build.txt
+```
+
+### Process
+
+1. `bumpx` will handle bumping the version according to your command (patch, minor, major)
+2. It will update the CHANGELOG according to the new version being published
+3. It will push a tag with the given version to github
+4. CircleCI will pick up this tag, build the package and publish it to pypi
+5. `bumpx` will have everything ready for the next version (version, changelog...)
+
+### Dry run
+
+```shell
+bumpx -d -v
+```
+
+### Release
+
+This will release a patch version:
+
+```shell
+bumpx -v
+```
--- a/csv_detective-0.8.1.dev1416.dist-info/RECORD
+++ b/csv_detective-0.8.1.dev1440.dist-info/RECORD
@@ -25,15 +25,12 @@ csv_detective/detect_fields/FR/geo/pays/__init__.py,sha256=2q5T4SmCK6ZFF1mrv7d-q
 csv_detective/detect_fields/FR/geo/region/__init__.py,sha256=JbFKDd4jAnd9yb7YqP36MoLdO1JFPm1cg60fGXt6ZvI,1074
 csv_detective/detect_fields/FR/other/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 csv_detective/detect_fields/FR/other/code_csp_insee/__init__.py,sha256=SRWJvg3Ikyjmop9iL14igTjxNGpO-QB3fpADI_bLYEY,566
-csv_detective/detect_fields/FR/other/code_csp_insee/code_csp_insee.txt,sha256=rbcjtMP6qTZ7BTU6ZegkiXKCruqY_m9Ep6ZgRabFS_E,2486
 csv_detective/detect_fields/FR/other/code_import/__init__.py,sha256=zJ9YfPa5p--uHNQFeO1gTjxDy2Um_r-MxQd29VBNjFw,243
 csv_detective/detect_fields/FR/other/code_rna/__init__.py,sha256=Z0RjMBt1--ZL7Jd1RsHAQCCbTAQk_BnlnTq8VF1o_VA,146
 csv_detective/detect_fields/FR/other/code_waldec/__init__.py,sha256=41SYNzCzUFh4trQlwG-9UC0-1Wi4fTcv8Byi_dd9Lq4,168
 csv_detective/detect_fields/FR/other/csp_insee/__init__.py,sha256=lvcaVKgOPrCaZb-Y1-wYCbLYB_CQjCJFNAzfWDwtTVE,496
-csv_detective/detect_fields/FR/other/csp_insee/csp_insee.txt,sha256=kgKaKc-5PHu5U4--ugLjpFyMNtTU9CGdZ9ANU3YAsM4,32879
 csv_detective/detect_fields/FR/other/date_fr/__init__.py,sha256=kMV52djlG0y4o0ELEZuvTv_FvooYOgTnV1aWhycFJDc,284
 csv_detective/detect_fields/FR/other/insee_ape700/__init__.py,sha256=g8pOqJPKVpQiMd78zgrjXJWYeWkYhu8r3D4IQX519HQ,519
-csv_detective/detect_fields/FR/other/insee_ape700/insee_ape700.txt,sha256=nKgslakENwgE7sPkVNHqR23iXuxF02p9-v5MC2_ntx8,4398
 csv_detective/detect_fields/FR/other/sexe/__init__.py,sha256=iYkLe3MM51GWyBX_4BTq5PWDX_EeYRbEHWKMr8oE1MQ,269
 csv_detective/detect_fields/FR/other/siren/__init__.py,sha256=ohSwUL2rXqTXPG5WDAh2SP-lp1SzFCYgo4IhJ-PXmdk,442
 csv_detective/detect_fields/FR/other/siret/__init__.py,sha256=ThEeT6rXmS0EvHW8y4A_74bILyErDGxLe9v3elHOFs8,707
@@ -44,11 +41,8 @@ csv_detective/detect_fields/FR/temp/jour_de_la_semaine/__init__.py,sha256=TRJxFS
 csv_detective/detect_fields/FR/temp/mois_de_annee/__init__.py,sha256=GuOnGw39Kz82bXId8mNzmlC4YkOrrf_F7f4g4uW_uvY,581
 csv_detective/detect_fields/geo/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 csv_detective/detect_fields/geo/iso_country_code_alpha2/__init__.py,sha256=gbuzf_9yytZnmYYABk7vK3WinSU_AnrSxRpNQ7xroa8,433
-csv_detective/detect_fields/geo/iso_country_code_alpha2/iso_country_code_alpha2.txt,sha256=YyPlDqCdz65ecf4Wes_r0P4rDSJG35niXtjc4MmctXM,1740
 csv_detective/detect_fields/geo/iso_country_code_alpha3/__init__.py,sha256=u98rn_wuAGlGh2wN-t5syLBlCkqcxCAjpbvbBN8tov8,409
-csv_detective/detect_fields/geo/iso_country_code_alpha3/iso_country_code_alpha3.txt,sha256=aYqKSohgXuBtcIBfF52f8JWYDdxL_HV_Ol1srGnWBp4,1003
 csv_detective/detect_fields/geo/iso_country_code_numeric/__init__.py,sha256=wJAynAkGZN7jKeI3xOeLXQ_irxQBb_J56pRkLDYVClY,436
-csv_detective/detect_fields/geo/iso_country_code_numeric/iso_country_code_numeric.txt,sha256=2GtEhuporsHYV-pU4q9kfXU5iOtfW5C0GYBTTKQtnnA,1004
 csv_detective/detect_fields/geo/json_geojson/__init__.py,sha256=FPHOfTrfXJs62-NgeOcNGOvwPd7I1fEVp8lTdMNfj3w,433
 csv_detective/detect_fields/geo/latitude_wgs/__init__.py,sha256=ArS6PuYEd0atZwSqNDZhXZz1TwzdiwdV8ovRYTOacpg,327
 csv_detective/detect_fields/geo/latlon_wgs/__init__.py,sha256=7_mnO9uC_kI7e2WR8xIer7Kqw8zi-v-JKaAD4zcoGbE,342
@@ -127,30 +121,16 @@ csv_detective/detect_labels/temp/date/__init__.py,sha256=w0eeZIseAmPwL4OvCWzZXbx
 csv_detective/detect_labels/temp/datetime_iso/__init__.py,sha256=d0laZNzHx-kSARs9Re8TZ11GNs99aMz6gXc72CJ6ul4,440
 csv_detective/detect_labels/temp/datetime_rfc822/__init__.py,sha256=53ysj7QgsxXwG1le3zfSJd1oaTTf-Er3jBeYi_A4F9g,458
 csv_detective/detect_labels/temp/year/__init__.py,sha256=7uWaCZY7dOG7nolW46IgBWmcu8K-9jPED-pOlMlErfo,433
-csv_detective/detection/columns.py,sha256=vfE-DKESA6J9Rfsl-a8tjgZfE21VmzArO5TrbzL0KmE,2905
-csv_detective/detection/encoding.py,sha256=tpjJEMNM_2TcLXDzn1lNQPnSRnsWYjs83tQ8jNwTj4E,973
-csv_detective/detection/engine.py,sha256=HiIrU-l9EO5Fbc2Vh8W_Uy5-dpKcQQzlxCqMuWc09LY,1530
-csv_detective/detection/formats.py,sha256=VwFazRAFJN6eaYUK7IauVU88vuUBHccESY4UD8EgGUo,5386
-csv_detective/detection/headers.py,sha256=wrVII2RQpsVmHhrO1DHf3dmiu8kbtOjBlskf41cnQmc,1172
-csv_detective/detection/rows.py,sha256=3qvsbsBcMxiqqfSYYkOgsRpX777rk22tnRHDwUA97kU,742
-csv_detective/detection/separator.py,sha256=XjeDBqhiBxVfkCPJKem9BAgJqs_hOgQltc_pxrH_-Tg,1547
-csv_detective/detection/variables.py,sha256=3qEMtjZ_zyIFXvTnFgK7ZMDx8C12uQXKfFjEj2moyJc,3558
 csv_detective/output/__init__.py,sha256=5KTevPfp_4MRxByJyOntQjToNfeG7dPQn-_13wSq7EU,1910
 csv_detective/output/dataframe.py,sha256=89iQRE59cHQyQQEsujQVIKP2YAUYpPklWkdDOqZE-wE,2183
 csv_detective/output/example.py,sha256=EdPX1iqHhIG4DsiHuYdy-J7JxOkjgUh_o2D5nrfM5fA,8649
 csv_detective/output/profile.py,sha256=B8YU541T_YPDezJGh4dkHckOShiwHSrZd9GS8jbmz7A,2919
 csv_detective/output/schema.py,sha256=ZDBWDOD8IYp7rcB0_n8l9JXGIhOQ6bTZHFWfTmnNNEQ,13480
 csv_detective/output/utils.py,sha256=HbmvCCCmFo7NJxhD_UsJIveuw-rrfhrvYckv1CJn_10,2301
-csv_detective/parsing/columns.py,sha256=zY652tZdFpwnA0vA8nfE1I-1X7kw8NVAeRfblCSYAYE,5631
-csv_detective/parsing/compression.py,sha256=Fnw5tj-PpBNI8NYsWj5gD-DUoWcVLnsVpiKm9MpxmIA,350
-csv_detective/parsing/csv.py,sha256=11mibDnJhIjykXLGZvA5ZEU5U7KgxIrbyO6BNv6jlro,1626
-csv_detective/parsing/excel.py,sha256=AslE2S1e67o8yTIAIhp-lAnJ6-XqeBBRz1-VMFqhZBM,7055
-csv_detective/parsing/load.py,sha256=u6fbGFZsL2GwPQRzhAXgt32JpUur7vbQdErREHxNJ-w,3661
-csv_detective/parsing/text.py,sha256=_TprGi0gHZlRsafizI3dqQhBehZW4BazqxmypMcAZ-o,1824
-csv_detective-0.8.1.dev1416.data/data/share/csv_detective/CHANGELOG.md,sha256=Ar1X9WX1CVoStDzDEOo5O3P0DgRtUUmo70KAYlWLJyQ,8443
-csv_detective-0.8.1.dev1416.data/data/share/csv_detective/LICENSE.AGPL.txt,sha256=2N5ReRelkdqkR9a-KP-y-shmcD5P62XoYiG-miLTAzo,34519
-csv_detective-0.8.1.dev1416.data/data/share/csv_detective/README.md,sha256=Qr8xRXc-dxQ-tdXCpCTCKp1Uliqq84r0UOlPRNuGCpI,9506
-csv_detective-0.8.1.dev1416.dist-info/licenses/LICENSE.AGPL.txt,sha256=2N5ReRelkdqkR9a-KP-y-shmcD5P62XoYiG-miLTAzo,34519
+csv_detective-0.8.1.dev1440.data/data/share/csv_detective/CHANGELOG.md,sha256=b-F0tSnDQUauOqqPJCg57dvlaLt_xsb6J6O88RiiKwY,8603
+csv_detective-0.8.1.dev1440.data/data/share/csv_detective/LICENSE,sha256=A1dQrzxyxRHRih02KwibWj1khQyF7GeA6SqdOU87Gk4,1088
+csv_detective-0.8.1.dev1440.data/data/share/csv_detective/README.md,sha256=gKLFmC8kuCCywS9eAhMak_JNriUWWNOsBKleAu5TIEY,8501
+csv_detective-0.8.1.dev1440.dist-info/licenses/LICENSE,sha256=A1dQrzxyxRHRih02KwibWj1khQyF7GeA6SqdOU87Gk4,1088
 tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 tests/test_example.py,sha256=JeHxSK0IVDcSrOhSZlNGSQv4JAc_r6mzvJM8PfmLTMw,2018
 tests/test_fields.py,sha256=d2tNvjtal6ZbO646x1GDbp_CGgp-EIcdg2SgMG72J6E,10270
@@ -158,8 +138,8 @@ tests/test_file.py,sha256=9APE1d43lQ8Dk8lwJFNUK_YekYYsQ0ae2_fgpcPE9mk,8116
 tests/test_labels.py,sha256=Nkr645bUewrj8hjNDKr67FQ6Sy_TID6f3E5Kfkl231M,464
 tests/test_structure.py,sha256=bv-tjgXohvQAxwmxzH0BynFpK2TyPjcxvtIAmIRlZmA,1393
 tests/test_validation.py,sha256=CTGonR6htxcWF9WH8MxumDD8cF45Y-G4hm94SM4lFjU,3246
-csv_detective-0.8.1.dev1416.dist-info/METADATA,sha256=aCmQVKUNFvJLzTS8DHELQme0GS9jwrHGod4JLWIGt1o,1386
-csv_detective-0.8.1.dev1416.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
-csv_detective-0.8.1.dev1416.dist-info/entry_points.txt,sha256=JjweTReFqKJmuvkegzlew2j3D5pZzfxvbEGOtGVGmaY,56
-csv_detective-0.8.1.dev1416.dist-info/top_level.txt,sha256=M0Nv646VHo-49zWjPkwo2C48UmtfddV8_9mEZeIxy8Q,20
-csv_detective-0.8.1.dev1416.dist-info/RECORD,,
+csv_detective-0.8.1.dev1440.dist-info/METADATA,sha256=4ECGBhA77ruP1PeRV0QamjdD1lfKOgoJ_RLJ8iiQ3nA,10443
+csv_detective-0.8.1.dev1440.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+csv_detective-0.8.1.dev1440.dist-info/entry_points.txt,sha256=JjweTReFqKJmuvkegzlew2j3D5pZzfxvbEGOtGVGmaY,56
+csv_detective-0.8.1.dev1440.dist-info/top_level.txt,sha256=M0Nv646VHo-49zWjPkwo2C48UmtfddV8_9mEZeIxy8Q,20
+csv_detective-0.8.1.dev1440.dist-info/RECORD,,
--- /dev/null
+++ b/csv_detective-0.8.1.dev1440.dist-info/licenses/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 data.gouv.fr
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.