csv-detective 0.9.3.dev2348__py3-none-any.whl → 0.9.3.dev2361__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,12 +1,12 @@
  Metadata-Version: 2.4
  Name: csv-detective
- Version: 0.9.3.dev2348
+ Version: 0.9.3.dev2361
  Summary: Detect tabular files column content
- Author-email: Etalab <opendatateam@data.gouv.fr>
+ Author-email: "data.gouv.fr" <opendatateam@data.gouv.fr>
  License: MIT
  Project-URL: Source, https://github.com/datagouv/csv_detective
  Keywords: CSV,data processing,encoding,guess,parser,tabular
- Requires-Python: <3.14,>=3.10
+ Requires-Python: <3.15,>=3.10
  Description-Content-Type: text/markdown
  License-File: LICENSE
  Requires-Dist: dateparser<2,>=1.2.0
@@ -37,13 +37,13 @@ Currently supported file types: csv(.gz), xls, xlsx, ods.

  You can also directly feed the URL of a remote file (from data.gouv.fr for instance).

- ## How To ?
+ ## How To?

  ### Install the package

- You need to have python >= 3.9 installed. We recommend using a virtual environement.
+ You need to have Python >= 3.10 installed. We recommend using a virtual environment.

- ```
+ ```bash
  pip install csv-detective
  ```

@@ -64,8 +64,8 @@ inspection_results = routine(
  file_path, # or file URL
  num_rows=-1, # Value -1 will analyze all lines of your file, you can change with the number of lines you wish to analyze
  save_results=False, # Default False. If True, it will save result output into the same directory as the analyzed file, using the same name as your file and .json extension
- output_profile=True, # Default False. If True, returned dict will contain a property "profile" indicating profile (min, max, mean, tops...) of every column of you csv
- output_schema=True, # Default False. If True, returned dict will contain a property "schema" containing basic [tableschema](https://specs.frictionlessdata.io/table-schema/) of your file. This can be use to validate structure of other csv which should match same structure.
+ output_profile=True, # Default False. If True, returned dict will contain a property "profile" indicating profile (min, max, mean, tops...) of every column of your csv
+ output_schema=True, # Default False. If True, returned dict will contain a property "schema" containing basic [tableschema](https://specs.frictionlessdata.io/table-schema/) of your file. This can be used to validate structure of other csv which should match same structure.
  tags=["fr"], # Default None. If set as a list of strings, only performs checks related to the specified tags (you can see the available tags with FormatsManager().available_tags())
  )
  ```
@@ -74,7 +74,7 @@ inspection_results = routine(

  ### Output

- The program creates a `python` dictionnary with the following information :
+ The program creates a `python` dictionary with the following information :

  ```
  {
@@ -111,7 +111,7 @@ The program creates a `python` dictionnary with the following information :
  "profile": {
  "column_name" : {
  "min": 1, # only int and float
- "max: 12, # only int and float
+ "max": 12, # only int and float
  "mean": 5, # only int and float
  "std": 5, # only int and float
  "tops": [ # 10 most frequent values in the column
@@ -161,11 +161,11 @@ The program creates a `python` dictionnary with the following information :

  The output slightly differs depending on the file format:
  - csv files have `encoding` and `separator` (and `compression` if relevant)
- - xls, xls, ods files have `engine` and `sheet_name`
+ - xls, xlsx, ods files have `engine` and `sheet_name`

  You may also set `output_df` to `True`, in which case the output is a tuple of two elements:
  - the analysis (as described above)
- - an iteror of `pd.DataFrame`s which contain the columns cast with the detected types (which can be used with `pd.concat` or in a loop):
+ - an iterator of `pd.DataFrame`s which contain the columns cast with the detected types (which can be used with `pd.concat` or in a loop):
  ```python
  inspection, df_chunks = routine(
  file_path=file_path,
@@ -188,7 +188,7 @@ Includes :
  - UUIDs, Mongo ObjectIds

  ### Validation
- If you have a pre-made analysis of a file, you can check whether an other file conforms to the same analysis:
+ If you have a pre-made analysis of a file, you can check whether another file conforms to the same analysis:
  ```python
  from csv_detective import validate
  is_valid, *_ = validate(
@@ -226,7 +226,7 @@ Related ideas:
  - store column names to make a learning model based on column names for (possible pre-screen)
  - entity resolution (good luck...)

- ## Why Could This Be of Any Use ?
+ ## Why Could This Be of Any Use?

  Organisations such as [data.gouv.fr](http://data.gouv.fr) aggregate huge amounts of un-normalised data. Performing cross-examination across datasets can be difficult. This tool could help enrich the datasets metadata and facilitate linking them together.

@@ -247,6 +247,8 @@ ruff format .

  The release process uses the [`tag_version.sh`](tag_version.sh) script to create git tags and update [CHANGELOG.md](CHANGELOG.md) and [pyproject.toml](pyproject.toml) automatically.

+ **Prerequisites**: [GitHub CLI](https://cli.github.com/) (`gh`) must be installed and authenticated, and you must be on the main branch with a clean working directory.
+
  ```bash
  # Create a new release
  ./tag_version.sh <version>
@@ -258,11 +260,9 @@ The release process uses the [`tag_version.sh`](tag_version.sh) script to create
  ./tag_version.sh 2.5.0 --dry-run
  ```

- **Prerequisites**: GitHub CLI (`gh`) must be installed and authenticated, and you must be on the main branch with a clean working directory.
-
  The script automatically:
- - Updates the version in pyproject.toml
- - Extracts commits since the last tag and formats them for CHANGELOG.md
+ - Updates the version in `pyproject.toml`
+ - Extracts commits since the last tag and formats them for `CHANGELOG.md`
  - Identifies breaking changes (commits with `!:` in the subject)
  - Creates a git tag and pushes it to the remote repository
  - Creates a GitHub release with the changelog content
@@ -85,7 +85,7 @@ csv_detective/parsing/csv.py,sha256=0T0gpaXzwJo-sq41IoLQD704GiMUYeDVVASVbat-zWg,
  csv_detective/parsing/excel.py,sha256=oAVTuoDccJc4-kVjHXiIPLQx3lq3aZRRZQxkG1c06JQ,6992
  csv_detective/parsing/load.py,sha256=f-8aKiNpy_47qg4Lq-UZUR4NNrbJ_-KEGvcUQZ8cmb0,4317
  csv_detective/parsing/text.py,sha256=uz8wfmNTQnOd_4fjrIZ_5rxmFmgrg343hJh2szB73Hc,1770
- csv_detective-0.9.3.dev2348.dist-info/licenses/LICENSE,sha256=A1dQrzxyxRHRih02KwibWj1khQyF7GeA6SqdOU87Gk4,1088
+ csv_detective-0.9.3.dev2361.dist-info/licenses/LICENSE,sha256=A1dQrzxyxRHRih02KwibWj1khQyF7GeA6SqdOU87Gk4,1088
  tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
  tests/test_example.py,sha256=uTWswvUzBWEADGXZmMAdZvKhKvIjvT5zWOVVABgCDN4,1987
  tests/test_fields.py,sha256=EWHIKwRSdIh74bBSoozYmZBETf7V03JMWpglyxA0ci0,5616
@@ -95,8 +95,8 @@ tests/test_structure.py,sha256=XDbviuuvk-0Mu9Y9PI6He2e5hry2dXVJ6yBVwEqF_2o,1043
  tests/test_validation.py,sha256=9djBT-PDhu_563OFgWyE20o-wPEWEIQGXp6Pjh0_MQM,3463
  venv/bin/activate_this.py,sha256=wS7qPipy8R-dS_0ICD8PqqUQ8F-PrtcpiJw2DUPngYM,1287
  venv/bin/runxlrd.py,sha256=YlZMuycM_V_hzNt2yt3FyXPuwouMCmMhvj1oZaBeeuw,16092
- csv_detective-0.9.3.dev2348.dist-info/METADATA,sha256=0sy4vWAscpleL8quByGyJX5tw0OGkJfX_2lHsOetvy4,11038
- csv_detective-0.9.3.dev2348.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- csv_detective-0.9.3.dev2348.dist-info/entry_points.txt,sha256=JjweTReFqKJmuvkegzlew2j3D5pZzfxvbEGOtGVGmaY,56
- csv_detective-0.9.3.dev2348.dist-info/top_level.txt,sha256=cYKb4Ok3XgYA7rMDOYtxysjSJp_iUA9lJjynhVzue8g,30
- csv_detective-0.9.3.dev2348.dist-info/RECORD,,
+ csv_detective-0.9.3.dev2361.dist-info/METADATA,sha256=3dKd3ohteT_4ZHVkJVz672B8K-rVgpCaA_OY09VnsS4,11084
+ csv_detective-0.9.3.dev2361.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ csv_detective-0.9.3.dev2361.dist-info/entry_points.txt,sha256=JjweTReFqKJmuvkegzlew2j3D5pZzfxvbEGOtGVGmaY,56
+ csv_detective-0.9.3.dev2361.dist-info/top_level.txt,sha256=cYKb4Ok3XgYA7rMDOYtxysjSJp_iUA9lJjynhVzue8g,30
+ csv_detective-0.9.3.dev2361.dist-info/RECORD,,