PyPI - csv-detective - Versions diffs - 0.8.1.dev1380__py3-none-any.whl → 0.8.1.dev1440__py3-none-any.whl - Mend

csv-detective 0.8.1.dev1380__py3-none-any.whl → 0.8.1.dev1440__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

csv_detective/detect_fields/other/url/__init__.py CHANGED

@@ -1,13 +1,14 @@
+import re
 PROPORTION = 1
+url_pattern = re.compile(
+    r"^((https?|ftp)://|www\.)(([A-Za-z0-9-]+\.)+[A-Za-z]{2,6})"
+    r"(/[A-Za-z0-9._~:/?#[@!$&'()*+,;=%-]*)?$"
+)
 def _is(val):
-    '''Detects urls'''
+    """Detects urls"""
     if not isinstance(val, str):
         return False
-    a = 'http://' in val
-    b = 'www.' in val
-    c = any([x in val for x in ['.fr', '.com', '.org', '.gouv', '.net']])
-    d = not ('@' in val)
-    return (a or b or c) and d
+    return bool(url_pattern.match(val))
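
To see what this change means in practice, the sketch below puts the removed substring heuristic next to the new anchored pattern. The names `old_is` and `new_is` are illustrative only (the package calls both versions `_is`), and the free-text sample is invented for the comparison; the other samples come from tests/test_fields.py in this diff.

```python
import re

# Pattern copied from the added lines of the hunk above.
url_pattern = re.compile(
    r"^((https?|ftp)://|www\.)(([A-Za-z0-9-]+\.)+[A-Za-z]{2,6})"
    r"(/[A-Za-z0-9._~:/?#[@!$&'()*+,;=%-]*)?$"
)


def old_is(val):
    """Removed heuristic: substring checks, rejecting any value containing '@'."""
    if not isinstance(val, str):
        return False
    has_scheme = "http://" in val
    has_www = "www." in val
    has_tld = any(x in val for x in [".fr", ".com", ".org", ".gouv", ".net"])
    return (has_scheme or has_www or has_tld) and "@" not in val


def new_is(val):
    """New check: the whole value must match the anchored URL pattern."""
    if not isinstance(val, str):
        return False
    return bool(url_pattern.match(val))


samples = [
    "www.data.gouv.fr",
    "https://www.youtube.com/@data-gouv-fr",       # '@' in the path
    "voir le site example.com pour plus d'infos",  # free text (invented sample)
    "tmp@data.gouv.fr",                            # e-mail address
]
for s in samples:
    print(f"{s!r}: old={old_is(s)} new={new_is(s)}")
```

The old check accepted any text that merely contained a known suffix and rejected every value with an `@`, including valid URLs; the new one requires a scheme or a `www.` prefix and a well-formed host.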

csv_detective/utils.py CHANGED

@@ -25,6 +25,7 @@ def display_logs_depending_process_time(prompt: str, duration: float):
 def is_url(file_path: str) -> bool:
     # could be more sophisticated if needed
+    # using the URL detection test was considered but too broad (schema required to use requests)
     return file_path.startswith('http')
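
The added comment is easier to follow with a concrete value: a bare `www.` host passes the column-level URL detector, but `requests` cannot fetch it without an explicit scheme. A minimal sketch, with the pattern copied from the url detector in this diff and `is_url` copied from this hunk:

```python
import re

# Column-level detector pattern, copied from detect_fields/other/url/__init__.py.
url_pattern = re.compile(
    r"^((https?|ftp)://|www\.)(([A-Za-z0-9-]+\.)+[A-Za-z]{2,6})"
    r"(/[A-Za-z0-9._~:/?#[@!$&'()*+,;=%-]*)?$"
)


def is_url(file_path: str) -> bool:
    # Same logic as csv_detective/utils.py: only a prefix check.
    return file_path.startswith("http")


candidate = "www.data.gouv.fr"
looks_like_url = bool(url_pattern.match(candidate))  # detector accepts it
treated_as_remote = is_url(candidate)                # loader does not

print(looks_like_url, treated_as_remote)
```

Keeping the stricter prefix check means a value without a scheme is treated as a local path instead of being handed to `requests`.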

{csv_detective-0.8.1.dev1380.data → csv_detective-0.8.1.dev1440.data}/data/share/csv_detective/CHANGELOG.md RENAMED

@@ -3,6 +3,8 @@
 ## Current (in progress)
 - Refactor label testing [#119](https://github.com/datagouv/csv-detective/pull/119)
+- Refactor repo metadata and requirements [#120](https://github.com/datagouv/csv-detective/pull/120) [#122](https://github.com/datagouv/csv-detective/pull/122)
+- Better URL detection [#121](https://github.com/datagouv/csv-detective/pull/121)
 ## 0.8.0 (2025-05-20)

csv_detective-0.8.1.dev1440.data/data/share/csv_detective/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 data.gouv.fr
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

{csv_detective-0.8.1.dev1380.data → csv_detective-0.8.1.dev1440.data}/data/share/csv_detective/README.md RENAMED

@@ -10,7 +10,7 @@ You can also directly feed the URL of a remote file (from data.gouv.fr for insta
 ### Install the package
-You need to have python >= 3.7 installed. We recommend using a virtual environement.
+You need to have python >= 3.9 installed. We recommend using a virtual environement.
 ```
 pip install csv-detective
@@ -182,7 +182,7 @@ An early version of this analysis of all resources on data.gouv.fr can be found
 ## Release
-The release process uses `bumpr`.
+The release process uses `bumpx`.
 ```shell
 pip install -r requirements-build.txt
@@ -190,16 +190,16 @@ pip install -r requirements-build.txt
 ### Process
-1. `bumpr` will handle bumping the version according to your command (patch, minor, major)
+1. `bumpx` will handle bumping the version according to your command (patch, minor, major)
 2. It will update the CHANGELOG according to the new version being published
 3. It will push a tag with the given version to github
 4. CircleCI will pickup this tag, build the package and publish it to pypi
-5. `bumpr` will have everything ready for the next version (version, changelog...)
+5. `bumpx` will have everything ready for the next version (version, changelog...)
 ### Dry run
 ```shell
-bumpr -d -v
+bumpx -d -v
 ```
 ### Release
@@ -207,38 +207,5 @@ bumpr -d -v
 This will release a patch version:
 ```shell
-bumpr -v
-```
-See bumpr options for minor and major:
-```
-$ bumpr -h
-usage: bumpr [-h] [--version] [-v] [-c CONFIG] [-d] [-st] [-b | -pr] [-M] [-m] [-p]
-             [-s SUFFIX] [-u] [-pM] [-pm] [-pp] [-ps PREPARE_SUFFIX] [-pu]
-             [--vcs {git,hg}] [-nc] [-P] [-nP]
-             [file] [files ...]
-[...]
-optional arguments:
-  -h, --help            show this help message and exit
-  --version             show program's version number and exit
-  -v, --verbose         Verbose output
-  -c CONFIG, --config CONFIG
-                        Specify a configuration file
-  -d, --dryrun          Do not write anything and display a diff
-  -st, --skip-tests     Skip tests
-  -b, --bump            Only perform the bump
-  -pr, --prepare        Only perform the prepare
-bump:
-  -M, --major           Bump major version
-  -m, --minor           Bump minor version
-  -p, --patch           Bump patch version
-  -s SUFFIX, --suffix SUFFIX
-                        Set suffix
-  -u, --unsuffix        Unset suffix
-[...]
+bumpx -v
 ```

csv_detective-0.8.1.dev1440.dist-info/METADATA ADDED

@@ -0,0 +1,267 @@
+Metadata-Version: 2.4
+Name: csv_detective
+Version: 0.8.1.dev1440
+Summary: Detect tabular files column content
+Home-page: https://github.com/datagouv/csv_detective
+Author: Etalab
+Author-email: opendatateam@data.gouv.fr
+License: https://spdx.org/licenses/MIT.html#licenseText
+Project-URL: Source, https://github.com/datagouv/csv_detective
+Keywords: CSV data processing encoding guess parser tabular
+Classifier: Development Status :: 2 - Pre-Alpha
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: Implementation :: CPython
+Classifier: Topic :: Scientific/Engineering :: Information Analysis
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: boto3<2,>=1.34.0
+Requires-Dist: dateparser<2,>=1.2.0
+Requires-Dist: faust-cchardet==2.1.19
+Requires-Dist: pandas<3,>=2.2.0
+Requires-Dist: python-dateutil<3,>=2.8.2
+Requires-Dist: Unidecode<2,>=1.3.6
+Requires-Dist: openpyxl==3.1.5
+Requires-Dist: xlrd==2.0.1
+Requires-Dist: odfpy==1.4.1
+Requires-Dist: requests<3,>=2.32.3
+Requires-Dist: python-magic==0.4.27
+Requires-Dist: frformat==0.4.0
+Requires-Dist: faker>=33.0.0
+Requires-Dist: rstr==3.2.2
+Provides-Extra: dev
+Requires-Dist: pytest==8.3.0; extra == "dev"
+Requires-Dist: responses==0.25.0; extra == "dev"
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: keywords
+Dynamic: license
+Dynamic: license-file
+Dynamic: project-url
+Dynamic: provides-extra
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+# CSV Detective
+This is a package to **automatically detect column content in tabular files**. The script reads either the whole file or the first few rows and performs various checks to see for each column if it matches with various content types. This is currently done through regex and string comparison.
+Currently supported file types: csv, xls, xlsx, ods.
+You can also directly feed the URL of a remote file (from data.gouv.fr for instance).
+## How To ?
+### Install the package
+You need to have python >= 3.9 installed. We recommend using a virtual environement.
+```
+pip install csv-detective
+```
+### Detect some columns
+Say you have a tabular file located at `file_path`. This is how you could use `csv_detective`:
+```
+# Import the csv_detective package
+from csv_detective import routine
+import os # for this example only
+# Replace by your file path
+file_path = os.path.join('.', 'tests', 'code_postaux_v201410.csv')
+# Open your file and run csv_detective
+inspection_results = routine(
+  file_path, # or file URL
+  num_rows=-1, # Value -1 will analyze all lines of your file, you can change with the number of lines you wish to analyze
+  save_results=False, # Default False. If True, it will save result output into the same directory as the analyzed file, using the same name as your file and .json extension
+  output_profile=True, # Default False. If True, returned dict will contain a property "profile" indicating profile (min, max, mean, tops...) of every column of you csv
+  output_schema=True, # Default False. If True, returned dict will contain a property "schema" containing basic [tableschema](https://specs.frictionlessdata.io/table-schema/) of your file. This can be use to validate structure of other csv which should match same structure.
+)
+```
+## So What Do You Get ?
+### Output
+The program creates a `Python` dictionnary with the following information :
+```
+{
+    "encoding": "windows-1252", 			        # Encoding detected
+    "separator": ";",						# Detected CSV separator
+    "header_row_idx": 0					# Index of the header (aka how many lines to skip to get it)
+    "headers": ['code commune INSEE', 'nom de la commune', 'code postal', "libellé d'acheminement"], # Header row
+    "total_lines": 42,					# Number of rows (excluding header)
+    "nb_duplicates": 0,					# Number of exact duplicates in rows
+    "heading_columns": 0,					# Number of heading columns
+    "trailing_columns": 0,					# Number of trailing columns
+    "categorical": ['Code commune']         # Columns that contain less than 25 different values (arbitrary threshold)
+    "columns": { # Property that conciliate detection from labels and content of a column
+        "Code commune": {
+            "python_type": "string",
+            "format": "code_commune_insee",
+            "score": 1.0
+        },
+    },
+    "columns_labels": { # Property that return detection from header columns
+        "Code commune": {
+            "python_type": "string",
+            "format": "code_commune_insee",
+            "score": 0.5
+        },
+    },
+    "columns_fields": { # Property that return detection from content columns
+        "Code commune": {
+            "python_type": "string",
+            "format": "code_commune_insee",
+            "score": 1.25
+        },
+    },
+    "profile": {
+      "column_name" : {
+        "min": 1, # only int and float
+        "max: 12, # only int and float
+        "mean": 5, # only int and float
+        "std": 5, # only int and float
+        "tops": [  # 10 most frequent values in the column
+          "xxx",
+          "yyy",
+          "..."
+        ],
+        "nb_distinct": 67, # number of distinct values
+        "nb_missing_values": 102 # number of empty cells in the column
+      }
+    },
+    "schema": { # TableSchema of the file if `output_schema` was set to `True`
+      "$schema": "https://frictionlessdata.io/schemas/table-schema.json",
+      "name": "",
+      "title": "",
+      "description": "",
+      "countryCode": "FR",
+      "homepage": "",
+      "path": "https://github.com/datagouv/csv-detective",
+      "resources": [],
+      "sources": [
+        {"title": "Spécification Tableschema", "path": "https://specs.frictionlessdata.io/table-schema"},
+        {"title": "schema.data.gouv.fr", "path": "https://schema.data.gouv.fr"}
+      ],
+      "created": "2023-02-10",
+      "lastModified": "2023-02-10",
+      "version": "0.0.1",
+      "contributors": [
+        {"title": "Table schema bot", "email": "schema@data.gouv.fr", "organisation": "data.gouv.fr", "role": "author"}
+      ],
+      "fields": [
+        {
+          "name": "Code commune",
+          "description": "Le code INSEE de la commune",
+          "example": "23150",
+          "type": "string",
+          "formatFR": "code_commune_insee",
+          "constraints": {
+            "required": False,
+            "pattern": "^([013-9]\\d|2[AB1-9])\\d{3}$",
+          }
+        }
+      ]
+    }
+}
+```
+The output slightly differs depending on the file format:
+- csv files have `encoding` and `separator`
+- xls, xls, ods files have `engine` and `sheet_name`
+### What Formats Can Be Detected
+Includes :
+- Communes, Départements, Régions, Pays
+- Codes Communes, Codes Postaux, Codes Departement, ISO Pays
+- Codes CSP, Description CSP, SIREN
+- E-Mails, URLs, Téléphones FR
+- Years, Dates, Jours de la Semaine FR
+- UUIDs, Mongo ObjectIds
+### Format detection and scoring
+For each column, 3 scores are computed for each format, the higher the score, the more likely the format:
+- the field score based on the values contained in the column (0.0 to 1.0).
+- the label score based on the header of the column (0.0 to 1.0).
+- the overall score, computed as `field_score * (1 + label_score/2)` (0.0 to 1.5).
+The overall score computation aims to give more weight to the column contents while
+still leveraging the column header.
+#### `limited_output` - Select the output mode you want for json report
+This option allows you to select the output mode you want to pass. To do so, you have to pass a `limited_output` argument to the `routine` function. This variable has two possible values:
+- `limited_output` defaults to `True` which means report will contain only detected column formats based on a pre-selected threshold proportion in data. Report result is the standard output (an example can be found above in 'Output' section).
+Only the format with highest score is present in the output.
+- `limited_output=False` means report will contain a full list of all column format possibilities for each input data columns with a value associated which match to the proportion of found column type in data. With this report, user can adjust its rules of detection based on a specific threshold and has a better vision of quality detection for each columns. Results could also be easily transformed into a dataframe (columns types in column / column names in rows) for analysis and test.
+## Improvement suggestions
+- Smarter refactors
+- Improve performances
+- Test other ways to load and process data (`pandas` alternatives)
+- Add more and more detection modules...
+Related ideas:
+- store column names to make a learning model based on column names for (possible pre-screen)
+- normalising data based on column prediction
+- entity resolution (good luck...)
+## Why Could This Be of Any Use ?
+Organisations such as [data.gouv.fr](http://data.gouv.fr) aggregate huge amounts of un-normalised data. Performing cross-examination across datasets can be difficult. This tool could help enrich the datasets metadata and facilitate linking them together.
+[`udata-hydra`](https://github.com/etalab/udata-hydra) is a crawler that checks, analyzes (using `csv-detective`) and APIfies all tabular files from [data.gouv.fr](http://data.gouv.fr).
+An early version of this analysis of all resources on data.gouv.fr can be found [here](https://github.com/Leobouloc/data.gouv-exploration).
+## Release
+The release process uses `bumpx`.
+```shell
+pip install -r requirements-build.txt
+```
+### Process
+1. `bumpx` will handle bumping the version according to your command (patch, minor, major)
+2. It will update the CHANGELOG according to the new version being published
+3. It will push a tag with the given version to github
+4. CircleCI will pickup this tag, build the package and publish it to pypi
+5. `bumpx` will have everything ready for the next version (version, changelog...)
+### Dry run
+```shell
+bumpx -d -v
+```
+### Release
+This will release a patch version:
+```shell
+bumpx -v
+```
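
The README text embedded in this METADATA file gives the overall score as `field_score * (1 + label_score/2)`. A worked sketch of that formula follows; `overall_score` is a hypothetical helper name used for illustration, not a function of the package.

```python
def overall_score(field_score: float, label_score: float) -> float:
    """Combine content and header scores as described in the README."""
    return field_score * (1 + label_score / 2)


# Perfect content match, header half-matching the format.
print(overall_score(1.0, 0.5))  # 1.25
# Perfect content match and perfect header match: the documented maximum.
print(overall_score(1.0, 1.0))  # 1.5
# No matching content: the header alone cannot raise the score.
print(overall_score(0.0, 1.0))  # 0.0
```

Because the label score only multiplies the field score, a column whose values never match a format scores 0 for it regardless of its header.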

{csv_detective-0.8.1.dev1380.dist-info → csv_detective-0.8.1.dev1440.dist-info}/RECORD RENAMED

@@ -3,7 +3,7 @@ csv_detective/cli.py,sha256=itooHtpyfC6DUsL_DchPKe1xo7m0MYJIp1L4R8eqoTk,1401
 csv_detective/explore_csv.py,sha256=IT1-9TbS78p6oeDpQ5T6DQ93xQbobcscyBQb6nh86H4,9082
 csv_detective/load_tests.py,sha256=GILvfkd4OVI-72mA4nzbPlZqgcXZ4wznOhGfZ1ucWkM,2385
 csv_detective/s3_utils.py,sha256=1cIVdQUYY2ovErbMwp72Gqtqx2bkB8nfVhn-QaOFTT0,1451
-csv_detective/utils.py,sha256=CfR4XztO9KdBecjAX0MfclcRgtB1siv4tQrbCAXyOls,927
+csv_detective/utils.py,sha256=-tIs9yV7RJPGj65lQ7LjRGch6Iws9UeuIPQsd2uUUJM,1025
 csv_detective/validate.py,sha256=4e7f8bNXPU9GqNx4QXXiaoINyotozbL52JB6psVAjyY,2631
 csv_detective/detect_fields/__init__.py,sha256=7Tz0Niaz0BboA3YVsp_6WPA6ywciwDN4-lOy_Ie_0Y8,976
 csv_detective/detect_fields/FR/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
@@ -25,15 +25,12 @@ csv_detective/detect_fields/FR/geo/pays/__init__.py,sha256=2q5T4SmCK6ZFF1mrv7d-q
 csv_detective/detect_fields/FR/geo/region/__init__.py,sha256=JbFKDd4jAnd9yb7YqP36MoLdO1JFPm1cg60fGXt6ZvI,1074
 csv_detective/detect_fields/FR/other/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 csv_detective/detect_fields/FR/other/code_csp_insee/__init__.py,sha256=SRWJvg3Ikyjmop9iL14igTjxNGpO-QB3fpADI_bLYEY,566
-csv_detective/detect_fields/FR/other/code_csp_insee/code_csp_insee.txt,sha256=rbcjtMP6qTZ7BTU6ZegkiXKCruqY_m9Ep6ZgRabFS_E,2486
 csv_detective/detect_fields/FR/other/code_import/__init__.py,sha256=zJ9YfPa5p--uHNQFeO1gTjxDy2Um_r-MxQd29VBNjFw,243
 csv_detective/detect_fields/FR/other/code_rna/__init__.py,sha256=Z0RjMBt1--ZL7Jd1RsHAQCCbTAQk_BnlnTq8VF1o_VA,146
 csv_detective/detect_fields/FR/other/code_waldec/__init__.py,sha256=41SYNzCzUFh4trQlwG-9UC0-1Wi4fTcv8Byi_dd9Lq4,168
 csv_detective/detect_fields/FR/other/csp_insee/__init__.py,sha256=lvcaVKgOPrCaZb-Y1-wYCbLYB_CQjCJFNAzfWDwtTVE,496
-csv_detective/detect_fields/FR/other/csp_insee/csp_insee.txt,sha256=kgKaKc-5PHu5U4--ugLjpFyMNtTU9CGdZ9ANU3YAsM4,32879
 csv_detective/detect_fields/FR/other/date_fr/__init__.py,sha256=kMV52djlG0y4o0ELEZuvTv_FvooYOgTnV1aWhycFJDc,284
 csv_detective/detect_fields/FR/other/insee_ape700/__init__.py,sha256=g8pOqJPKVpQiMd78zgrjXJWYeWkYhu8r3D4IQX519HQ,519
-csv_detective/detect_fields/FR/other/insee_ape700/insee_ape700.txt,sha256=nKgslakENwgE7sPkVNHqR23iXuxF02p9-v5MC2_ntx8,4398
 csv_detective/detect_fields/FR/other/sexe/__init__.py,sha256=iYkLe3MM51GWyBX_4BTq5PWDX_EeYRbEHWKMr8oE1MQ,269
 csv_detective/detect_fields/FR/other/siren/__init__.py,sha256=ohSwUL2rXqTXPG5WDAh2SP-lp1SzFCYgo4IhJ-PXmdk,442
 csv_detective/detect_fields/FR/other/siret/__init__.py,sha256=ThEeT6rXmS0EvHW8y4A_74bILyErDGxLe9v3elHOFs8,707
@@ -44,11 +41,8 @@ csv_detective/detect_fields/FR/temp/jour_de_la_semaine/__init__.py,sha256=TRJxFS
 csv_detective/detect_fields/FR/temp/mois_de_annee/__init__.py,sha256=GuOnGw39Kz82bXId8mNzmlC4YkOrrf_F7f4g4uW_uvY,581
 csv_detective/detect_fields/geo/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 csv_detective/detect_fields/geo/iso_country_code_alpha2/__init__.py,sha256=gbuzf_9yytZnmYYABk7vK3WinSU_AnrSxRpNQ7xroa8,433
-csv_detective/detect_fields/geo/iso_country_code_alpha2/iso_country_code_alpha2.txt,sha256=YyPlDqCdz65ecf4Wes_r0P4rDSJG35niXtjc4MmctXM,1740
 csv_detective/detect_fields/geo/iso_country_code_alpha3/__init__.py,sha256=u98rn_wuAGlGh2wN-t5syLBlCkqcxCAjpbvbBN8tov8,409
-csv_detective/detect_fields/geo/iso_country_code_alpha3/iso_country_code_alpha3.txt,sha256=aYqKSohgXuBtcIBfF52f8JWYDdxL_HV_Ol1srGnWBp4,1003
 csv_detective/detect_fields/geo/iso_country_code_numeric/__init__.py,sha256=wJAynAkGZN7jKeI3xOeLXQ_irxQBb_J56pRkLDYVClY,436
-csv_detective/detect_fields/geo/iso_country_code_numeric/iso_country_code_numeric.txt,sha256=2GtEhuporsHYV-pU4q9kfXU5iOtfW5C0GYBTTKQtnnA,1004
 csv_detective/detect_fields/geo/json_geojson/__init__.py,sha256=FPHOfTrfXJs62-NgeOcNGOvwPd7I1fEVp8lTdMNfj3w,433
 csv_detective/detect_fields/geo/latitude_wgs/__init__.py,sha256=ArS6PuYEd0atZwSqNDZhXZz1TwzdiwdV8ovRYTOacpg,327
 csv_detective/detect_fields/geo/latlon_wgs/__init__.py,sha256=7_mnO9uC_kI7e2WR8xIer7Kqw8zi-v-JKaAD4zcoGbE,342
@@ -63,7 +57,7 @@ csv_detective/detect_fields/other/money/__init__.py,sha256=g_ZwBZXl9LhldwFYQotC5
 csv_detective/detect_fields/other/mongo_object_id/__init__.py,sha256=7fcrHsOZAqXp2_N0IjPskYJ_qi4xRlo9iyNNDQVLzsU,156
 csv_detective/detect_fields/other/percent/__init__.py,sha256=vgpekNOPBRuunoVBXMi81rwHv4uSOhe78pbVtQ5SBO8,177
 csv_detective/detect_fields/other/twitter/__init__.py,sha256=qbwLKsTBRFQ4PyTNVeEZ5Hkf5Wwi3ZKclLER_V0YO3g,154
-csv_detective/detect_fields/other/url/__init__.py,sha256=9WaTqCglEsw_lJG_xZsBMdxJXg2yuQ92_fkX6CXWNV0,286
+csv_detective/detect_fields/other/url/__init__.py,sha256=L7h9fZldh1w86XwCx0x3Q1TXSJ_nIId1C-l1yFzZYrA,299
 csv_detective/detect_fields/other/uuid/__init__.py,sha256=3-z0fDax29SJc57zPjNGR6DPICJu6gfuNGC5L3jh4d0,223
 csv_detective/detect_fields/temp/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 csv_detective/detect_fields/temp/date/__init__.py,sha256=1a_Ra9fmT4wgGMrcknXP7eN7A2QiaMF0Yjy0-BMihtA,987
@@ -127,39 +121,25 @@ csv_detective/detect_labels/temp/date/__init__.py,sha256=w0eeZIseAmPwL4OvCWzZXbx
 csv_detective/detect_labels/temp/datetime_iso/__init__.py,sha256=d0laZNzHx-kSARs9Re8TZ11GNs99aMz6gXc72CJ6ul4,440
 csv_detective/detect_labels/temp/datetime_rfc822/__init__.py,sha256=53ysj7QgsxXwG1le3zfSJd1oaTTf-Er3jBeYi_A4F9g,458
 csv_detective/detect_labels/temp/year/__init__.py,sha256=7uWaCZY7dOG7nolW46IgBWmcu8K-9jPED-pOlMlErfo,433
-csv_detective/detection/columns.py,sha256=vfE-DKESA6J9Rfsl-a8tjgZfE21VmzArO5TrbzL0KmE,2905
-csv_detective/detection/encoding.py,sha256=tpjJEMNM_2TcLXDzn1lNQPnSRnsWYjs83tQ8jNwTj4E,973
-csv_detective/detection/engine.py,sha256=HiIrU-l9EO5Fbc2Vh8W_Uy5-dpKcQQzlxCqMuWc09LY,1530
-csv_detective/detection/formats.py,sha256=VwFazRAFJN6eaYUK7IauVU88vuUBHccESY4UD8EgGUo,5386
-csv_detective/detection/headers.py,sha256=wrVII2RQpsVmHhrO1DHf3dmiu8kbtOjBlskf41cnQmc,1172
-csv_detective/detection/rows.py,sha256=3qvsbsBcMxiqqfSYYkOgsRpX777rk22tnRHDwUA97kU,742
-csv_detective/detection/separator.py,sha256=XjeDBqhiBxVfkCPJKem9BAgJqs_hOgQltc_pxrH_-Tg,1547
-csv_detective/detection/variables.py,sha256=3qEMtjZ_zyIFXvTnFgK7ZMDx8C12uQXKfFjEj2moyJc,3558
 csv_detective/output/__init__.py,sha256=5KTevPfp_4MRxByJyOntQjToNfeG7dPQn-_13wSq7EU,1910
 csv_detective/output/dataframe.py,sha256=89iQRE59cHQyQQEsujQVIKP2YAUYpPklWkdDOqZE-wE,2183
 csv_detective/output/example.py,sha256=EdPX1iqHhIG4DsiHuYdy-J7JxOkjgUh_o2D5nrfM5fA,8649
 csv_detective/output/profile.py,sha256=B8YU541T_YPDezJGh4dkHckOShiwHSrZd9GS8jbmz7A,2919
 csv_detective/output/schema.py,sha256=ZDBWDOD8IYp7rcB0_n8l9JXGIhOQ6bTZHFWfTmnNNEQ,13480
 csv_detective/output/utils.py,sha256=HbmvCCCmFo7NJxhD_UsJIveuw-rrfhrvYckv1CJn_10,2301
-csv_detective/parsing/columns.py,sha256=zY652tZdFpwnA0vA8nfE1I-1X7kw8NVAeRfblCSYAYE,5631
-csv_detective/parsing/compression.py,sha256=Fnw5tj-PpBNI8NYsWj5gD-DUoWcVLnsVpiKm9MpxmIA,350
-csv_detective/parsing/csv.py,sha256=11mibDnJhIjykXLGZvA5ZEU5U7KgxIrbyO6BNv6jlro,1626
-csv_detective/parsing/excel.py,sha256=AslE2S1e67o8yTIAIhp-lAnJ6-XqeBBRz1-VMFqhZBM,7055
-csv_detective/parsing/load.py,sha256=u6fbGFZsL2GwPQRzhAXgt32JpUur7vbQdErREHxNJ-w,3661
-csv_detective/parsing/text.py,sha256=_TprGi0gHZlRsafizI3dqQhBehZW4BazqxmypMcAZ-o,1824
-csv_detective-0.8.1.dev1380.data/data/share/csv_detective/CHANGELOG.md,sha256=rPCHesCnCZgVSjdXkzEtDCgkkA__aKmvJWko_SvD4gs,8361
-csv_detective-0.8.1.dev1380.data/data/share/csv_detective/LICENSE.AGPL.txt,sha256=2N5ReRelkdqkR9a-KP-y-shmcD5P62XoYiG-miLTAzo,34519
-csv_detective-0.8.1.dev1380.data/data/share/csv_detective/README.md,sha256=Qr8xRXc-dxQ-tdXCpCTCKp1Uliqq84r0UOlPRNuGCpI,9506
-csv_detective-0.8.1.dev1380.dist-info/licenses/LICENSE.AGPL.txt,sha256=2N5ReRelkdqkR9a-KP-y-shmcD5P62XoYiG-miLTAzo,34519
+csv_detective-0.8.1.dev1440.data/data/share/csv_detective/CHANGELOG.md,sha256=b-F0tSnDQUauOqqPJCg57dvlaLt_xsb6J6O88RiiKwY,8603
+csv_detective-0.8.1.dev1440.data/data/share/csv_detective/LICENSE,sha256=A1dQrzxyxRHRih02KwibWj1khQyF7GeA6SqdOU87Gk4,1088
+csv_detective-0.8.1.dev1440.data/data/share/csv_detective/README.md,sha256=gKLFmC8kuCCywS9eAhMak_JNriUWWNOsBKleAu5TIEY,8501
+csv_detective-0.8.1.dev1440.dist-info/licenses/LICENSE,sha256=A1dQrzxyxRHRih02KwibWj1khQyF7GeA6SqdOU87Gk4,1088
 tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 tests/test_example.py,sha256=JeHxSK0IVDcSrOhSZlNGSQv4JAc_r6mzvJM8PfmLTMw,2018
-tests/test_fields.py,sha256=E6kEsp6_W56WW6FXWUl7hggsJv-vsKuOaJ9JLoFmrUw,9964
+tests/test_fields.py,sha256=d2tNvjtal6ZbO646x1GDbp_CGgp-EIcdg2SgMG72J6E,10270
 tests/test_file.py,sha256=9APE1d43lQ8Dk8lwJFNUK_YekYYsQ0ae2_fgpcPE9mk,8116
 tests/test_labels.py,sha256=Nkr645bUewrj8hjNDKr67FQ6Sy_TID6f3E5Kfkl231M,464
 tests/test_structure.py,sha256=bv-tjgXohvQAxwmxzH0BynFpK2TyPjcxvtIAmIRlZmA,1393
 tests/test_validation.py,sha256=CTGonR6htxcWF9WH8MxumDD8cF45Y-G4hm94SM4lFjU,3246
-csv_detective-0.8.1.dev1380.dist-info/METADATA,sha256=_892qUzBNdUnGSDIZbXDWVSi-3s4OvgGhxsBkizXWYQ,1386
-csv_detective-0.8.1.dev1380.dist-info/WHEEL,sha256=zaaOINJESkSfm_4HQVc5ssNzHCPXhJm0kEUakpsEHaU,91
-csv_detective-0.8.1.dev1380.dist-info/entry_points.txt,sha256=JjweTReFqKJmuvkegzlew2j3D5pZzfxvbEGOtGVGmaY,56
-csv_detective-0.8.1.dev1380.dist-info/top_level.txt,sha256=M0Nv646VHo-49zWjPkwo2C48UmtfddV8_9mEZeIxy8Q,20
-csv_detective-0.8.1.dev1380.dist-info/RECORD,,
+csv_detective-0.8.1.dev1440.dist-info/METADATA,sha256=4ECGBhA77ruP1PeRV0QamjdD1lfKOgoJ_RLJ8iiQ3nA,10443
+csv_detective-0.8.1.dev1440.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+csv_detective-0.8.1.dev1440.dist-info/entry_points.txt,sha256=JjweTReFqKJmuvkegzlew2j3D5pZzfxvbEGOtGVGmaY,56
+csv_detective-0.8.1.dev1440.dist-info/top_level.txt,sha256=M0Nv646VHo-49zWjPkwo2C48UmtfddV8_9mEZeIxy8Q,20
+csv_detective-0.8.1.dev1440.dist-info/RECORD,,
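
Each RECORD row has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. A minimal sketch for checking a row against file contents; `record_hash` is a hypothetical helper name, not part of any packaging tool.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Return the digest string used after 'sha256=' in a wheel RECORD row."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# The empty __init__.py files listed above (size 0) share this digest.
print(record_hash(b""))
```

To check a real row, read the file in binary mode, compare `record_hash(contents)` with the digest, and compare `len(contents)` with the size column.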

{csv_detective-0.8.1.dev1380.dist-info → csv_detective-0.8.1.dev1440.dist-info}/WHEEL RENAMED

@@ -1,5 +1,5 @@
 Wheel-Version: 1.0
-Generator: setuptools (80.8.0)
+Generator: setuptools (80.9.0)
 Root-Is-Purelib: true
 Tag: py3-none-any

csv_detective-0.8.1.dev1440.dist-info/licenses/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 data.gouv.fr
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

tests/test_fields.py CHANGED

@@ -293,8 +293,17 @@ fields = {
         False: ["adresse@mail"],
     },
     url: {
-        True: ["www.etalab.data.gouv.fr"],
-        False: ["une phrase avec un @ dedans"],
+        True: [
+            "www.data.gouv.fr",
+            "http://data.gouv.fr",
+            "https://www.youtube.com/@data-gouv-fr",
+            (
+                "https://tabular-api.data.gouv.fr/api/resources/"
+                "aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/"
+                "?score__greater=0.9&decompte__exact=13"
+            ),
+        ],
+        False: ["tmp@data.gouv.fr"],
     },
     uuid: {
         True: ["884762be-51f3-44c3-b811-1e14c5d89262"],
