
deepcsv-0.2.0.tar.gz → deepcsv-0.4.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

deepcsv-0.4.0/PKG-INFO +122 -0
deepcsv-0.4.0/README.md +97 -0
deepcsv-0.4.0/deepcsv/deepcsv.py +78 -0
deepcsv-0.4.0/deepcsv.egg-info/PKG-INFO +122 -0
deepcsv-0.4.0/deepcsv.egg-info/requires.txt +2 -0
deepcsv-0.4.0/setup.py +19 -0
deepcsv-0.2.0/PKG-INFO +0 -57
deepcsv-0.2.0/README.md +0 -40
deepcsv-0.2.0/deepcsv/deepcsv.py +0 -49
deepcsv-0.2.0/deepcsv.egg-info/PKG-INFO +0 -57
deepcsv-0.2.0/deepcsv.egg-info/requires.txt +0 -1
deepcsv-0.2.0/setup.py +0 -13
{deepcsv-0.2.0 → deepcsv-0.4.0}/LICENSE +0 -0
{deepcsv-0.2.0 → deepcsv-0.4.0}/deepcsv/__init__.py +0 -0
{deepcsv-0.2.0 → deepcsv-0.4.0}/deepcsv.egg-info/SOURCES.txt +0 -0
{deepcsv-0.2.0 → deepcsv-0.4.0}/deepcsv.egg-info/dependency_links.txt +0 -0
{deepcsv-0.2.0 → deepcsv-0.4.0}/deepcsv.egg-info/top_level.txt +0 -0
{deepcsv-0.2.0 → deepcsv-0.4.0}/setup.cfg +0 -0

deepcsv-0.4.0/PKG-INFO ADDED

@@ -0,0 +1,122 @@
+Metadata-Version: 2.4
+Name: deepcsv
+Version: 0.4.0
+Summary: Automatically walks through folders and subfolders, finds all CSV and XLSX files, detects and fixes data issues, and saves the results as Parquet files while keeping the exact same folder structure.
+Home-page: https://github.com/abdubakr77/deepcsv
+Author: Abdullah Bakr
+Author-email: abdubakora1232@gmail.com
+Project-URL: Source, https://github.com/abdubakr77/deepcsv
+Project-URL: Tracker, https://github.com/abdubakr77/deepcsv/issues
+Requires-Python: >=3.7
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: pandas
+Requires-Dist: pyarrow
+Dynamic: author
+Dynamic: author-email
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: license-file
+Dynamic: project-url
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+# deepcsv
+Stop losing your data types when working with CSV files.
+deepcsv automatically cleans messy CSV/XLSX data and converts it into ML-ready Parquet format.
+## Installation
+```bash
+pip install deepcsv
+```
+## Example
+### Before
+```python
+# CSV column value
+"['a', 'b', 'c']"
+```
+### After
+```python
+# Automatically converted
+['a', 'b', 'c']
+```
+### Usage
+```python
+import deepcsv
+df = deepcsv.ConvertListStrToList("path/to/file.csv")
+```
+---
+## What it does
+- Walks through all folders and subfolders automatically
+- Finds every CSV and XLSX file
+- Detects columns that contain list strings like `"['item1', 'item2']"` and converts them into real Python arrays for faster performance
+- Detects columns with mixed data types and tries to fix them automatically
+- Warns you when a column has mixed types so you know what was changed
+- Saves the results as Parquet files to preserve the converted data types
+---
+## Why Parquet?
+CSV files cannot store arrays or preserve data types.
+Parquet solves this by keeping the exact types after conversion and is much faster for data processing workflows.
+---
+## Why arrays instead of Python lists?
+Arrays are significantly faster for numerical operations and machine learning workflows, especially when working with large datasets.
+---
+## Functions
+### `ConvertListStrToList(file_path)`
+Reads a CSV file, converts list strings to arrays, fixes mixed-type columns, and returns a clean DataFrame.
+```python
+import deepcsv
+df = deepcsv.ConvertListStrToList("path/to/file.csv")
+```
+---
+### `ReadAllCSVData(path)`
+Walks through all folders and subfolders, applies `ConvertListStrToList` on every CSV and XLSX file, and saves the results as Parquet files in a new folder called `All CSV Data is Converted Here`.
+```python
+import deepcsv
+deepcsv.ReadAllCSVData("path/to/folder")
+```
+---
+## Notes
+- Only files that contain list string columns are saved as Parquet
+- Mixed-type columns are converted to float automatically when possible
+- Skips NaN values without breaking
+- Requires `pyarrow` for Parquet support
+---
+## Requirements
+- Python >= 3.7
+- pandas
+- pyarrow

deepcsv-0.4.0/README.md ADDED

@@ -0,0 +1,97 @@
+# deepcsv
+Stop losing your data types when working with CSV files.
+deepcsv automatically cleans messy CSV/XLSX data and converts it into ML-ready Parquet format.
+## Installation
+```bash
+pip install deepcsv
+```
+## Example
+### Before
+```python
+# CSV column value
+"['a', 'b', 'c']"
+```
+### After
+```python
+# Automatically converted
+['a', 'b', 'c']
+```
+### Usage
+```python
+import deepcsv
+df = deepcsv.ConvertListStrToList("path/to/file.csv")
+```
+---
+## What it does
+- Walks through all folders and subfolders automatically
+- Finds every CSV and XLSX file
+- Detects columns that contain list strings like `"['item1', 'item2']"` and converts them into real Python arrays for faster performance
+- Detects columns with mixed data types and tries to fix them automatically
+- Warns you when a column has mixed types so you know what was changed
+- Saves the results as Parquet files to preserve the converted data types
+---
+## Why Parquet?
+CSV files cannot store arrays or preserve data types.
+Parquet solves this by keeping the exact types after conversion and is much faster for data processing workflows.
+---
+## Why arrays instead of Python lists?
+Arrays are significantly faster for numerical operations and machine learning workflows, especially when working with large datasets.
+---
+## Functions
+### `ConvertListStrToList(file_path)`
+Reads a CSV file, converts list strings to arrays, fixes mixed-type columns, and returns a clean DataFrame.
+```python
+import deepcsv
+df = deepcsv.ConvertListStrToList("path/to/file.csv")
+```
+---
+### `ReadAllCSVData(path)`
+Walks through all folders and subfolders, applies `ConvertListStrToList` on every CSV and XLSX file, and saves the results as Parquet files in a new folder called `All CSV Data is Converted Here`.
+```python
+import deepcsv
+deepcsv.ReadAllCSVData("path/to/folder")
+```
+---
+## Notes
+- Only files that contain list string columns are saved as Parquet
+- Mixed-type columns are converted to float automatically when possible
+- Skips NaN values without breaking
+- Requires `pyarrow` for Parquet support
+---
+## Requirements
+- Python >= 3.7
+- pandas
+- pyarrow
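The README above says results are written under `All CSV Data is Converted Here` while keeping the folder layout of the input. A minimal sketch of how such a mirrored Parquet path could be derived is shown here; `mirrored_parquet_path` and its parameters are hypothetical names chosen for illustration and are not part of deepcsv.

```python
from pathlib import Path


def mirrored_parquet_path(source_file, work_dir,
                          output_dir_name="All CSV Data is Converted Here"):
    """Return the output path for source_file, mirrored under the output folder.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    source_file = Path(source_file)
    work_dir = Path(work_dir)
    # Path of the file relative to the directory being walked.
    relative = source_file.relative_to(work_dir)
    # Rebuild the same relative path under the output folder and swap
    # whatever extension the file had (.csv or .xlsx) for .parquet.
    return (work_dir / output_dir_name / relative).with_suffix(".parquet")


csv_target = mirrored_parquet_path("data/2024/sales.csv", "data")
xlsx_target = mirrored_parquet_path("data/report.xlsx", "data")
```

Using `with_suffix` replaces the final extension regardless of whether the source was `.csv` or `.xlsx`, which a plain string replacement of `.csv` would not do.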

deepcsv-0.4.0/deepcsv/deepcsv.py ADDED

@@ -0,0 +1,78 @@
+from os import listdir,mkdir,makedirs
+from os.path import join,relpath,dirname,isfile,isdir
+import pandas as pd
+from ast import literal_eval
+from numpy import nan
+import pyarrow
+from warnings import filterwarnings
+filterwarnings("ignore")
+def ConvertListStrToList(File_Path):
+    print(File_Path)
+    print("-"*50)
+    data = pd.read_csv(File_Path)
+    for ColName in data.columns:
+        First_Value = data[ColName].iloc[0]
+        if len(data[ColName].apply(type).unique()) >= 2:
+            sample = (data[data[ColName].apply(type) == str][ColName].head(2)).values
+            if len(sample) > 0 and isinstance(sample[0],str) and sample[0][0].strip().isnumeric():
+                print(f"WARNING:\nThis Dataset Name ({File_Path.split("\\")[-1]}) Found {len(data[ColName].apply(type).unique())} Mixed DataType  in a column called ({ColName})\nPath : {File_Path}")
+                print(f"System : This column have These types: {data[ColName].apply(type).unique()}")
+                print(f"System : Trying to fix the column as a Float to be have only one datatype...")
+                data[ColName] = pd.to_numeric(data[ColName], errors='coerce')
+                print("System : Done!")
+        elif isinstance(First_Value , str) and First_Value.strip().startswith("["):
+            data[f"{ColName.capitalize()}List"] = data[ColName].apply(lambda x : literal_eval(x) if pd.notna(x) else nan)
+            data.drop(ColName,inplace=True,axis=1)
+    return data
+def ReadAllCSVData(WorkDirectoryPath):
+    base_output = join(WorkDirectoryPath, "All CSV Data is Converted Here")
+    all_folders = [WorkDirectoryPath]
+    makedirs(base_output,exist_ok=True)
+    while True:
+        if all_folders:
+            Curr_Path = all_folders.pop(0)
+            for item_name in listdir(Curr_Path):
+                Sub_Item_Path = join(Curr_Path,item_name)
+                if isfile(Sub_Item_Path) and (Sub_Item_Path.endswith(".csv") or Sub_Item_Path.endswith(".xlsx")):
+                    df_converted = ConvertListStrToList(Sub_Item_Path)
+                    df_converted.reset_index(drop=True,inplace=True)
+                    rel_path = relpath(Sub_Item_Path, WorkDirectoryPath)
+                    output = join(base_output,rel_path)
+                    if "List" in df_converted.columns[-1]:
+                        print(Sub_Item_Path)
+                        makedirs(dirname(output),exist_ok=True)
+                        df_converted.to_parquet(output.replace(".csv", ".parquet"))
+                    print("-"*50)
+                elif isdir(Sub_Item_Path):
+                    all_folders.append(Sub_Item_Path)
+        else:
+            break
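The added module does two per-cell things: it parses list-like strings with `literal_eval` and coerces mixed-type values to numbers. The sketch below illustrates both steps with the standard library only, so it is not the package's pandas-based code; `parse_list_cell` and `coerce_to_float` are hypothetical helpers, and the fallback on malformed input is an assumption of this sketch.

```python
import math
from ast import literal_eval


def parse_list_cell(value):
    """Turn a list-like string into a real list; pass other values through."""
    # Missing values stay missing (NaN), as in the diff above.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return math.nan
    if isinstance(value, str) and value.strip().startswith("["):
        try:
            # literal_eval only evaluates Python literals, never arbitrary code.
            return literal_eval(value.strip())
        except (ValueError, SyntaxError):
            # Assumption of this sketch: keep malformed cells unchanged.
            return value
    return value


def coerce_to_float(value):
    """Convert a value to float, yielding NaN when it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


parsed = [parse_list_cell(c) for c in ["['a', 'b', 'c']", float("nan"), "[1, 2]"]]
numbers = [coerce_to_float(v) for v in ["1.5", 2, "n/a"]]
```

Returning NaN for non-numeric values corresponds to what `pd.to_numeric(..., errors='coerce')` does on a whole column.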

deepcsv-0.4.0/deepcsv.egg-info/PKG-INFO ADDED

@@ -0,0 +1,122 @@
+Metadata-Version: 2.4
+Name: deepcsv
+Version: 0.4.0
+Summary: Automatically walks through folders and subfolders, finds all CSV and XLSX files, detects and fixes data issues, and saves the results as Parquet files while keeping the exact same folder structure.
+Home-page: https://github.com/abdubakr77/deepcsv
+Author: Abdullah Bakr
+Author-email: abdubakora1232@gmail.com
+Project-URL: Source, https://github.com/abdubakr77/deepcsv
+Project-URL: Tracker, https://github.com/abdubakr77/deepcsv/issues
+Requires-Python: >=3.7
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: pandas
+Requires-Dist: pyarrow
+Dynamic: author
+Dynamic: author-email
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: license-file
+Dynamic: project-url
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+# deepcsv
+Stop losing your data types when working with CSV files.
+deepcsv automatically cleans messy CSV/XLSX data and converts it into ML-ready Parquet format.
+## Installation
+```bash
+pip install deepcsv
+```
+## Example
+### Before
+```python
+# CSV column value
+"['a', 'b', 'c']"
+```
+### After
+```python
+# Automatically converted
+['a', 'b', 'c']
+```
+### Usage
+```python
+import deepcsv
+df = deepcsv.ConvertListStrToList("path/to/file.csv")
+```
+---
+## What it does
+- Walks through all folders and subfolders automatically
+- Finds every CSV and XLSX file
+- Detects columns that contain list strings like `"['item1', 'item2']"` and converts them into real Python arrays for faster performance
+- Detects columns with mixed data types and tries to fix them automatically
+- Warns you when a column has mixed types so you know what was changed
+- Saves the results as Parquet files to preserve the converted data types
+---
+## Why Parquet?
+CSV files cannot store arrays or preserve data types.
+Parquet solves this by keeping the exact types after conversion and is much faster for data processing workflows.
+---
+## Why arrays instead of Python lists?
+Arrays are significantly faster for numerical operations and machine learning workflows, especially when working with large datasets.
+---
+## Functions
+### `ConvertListStrToList(file_path)`
+Reads a CSV file, converts list strings to arrays, fixes mixed-type columns, and returns a clean DataFrame.
+```python
+import deepcsv
+df = deepcsv.ConvertListStrToList("path/to/file.csv")
+```
+---
+### `ReadAllCSVData(path)`
+Walks through all folders and subfolders, applies `ConvertListStrToList` on every CSV and XLSX file, and saves the results as Parquet files in a new folder called `All CSV Data is Converted Here`.
+```python
+import deepcsv
+deepcsv.ReadAllCSVData("path/to/folder")
+```
+---
+## Notes
+- Only files that contain list string columns are saved as Parquet
+- Mixed-type columns are converted to float automatically when possible
+- Skips NaN values without breaking
+- Requires `pyarrow` for Parquet support
+---
+## Requirements
+- Python >= 3.7
+- pandas
+- pyarrow

deepcsv-0.4.0/deepcsv.egg-info/requires.txt ADDED

@@ -0,0 +1,2 @@
+pandas
+pyarrow

deepcsv-0.4.0/setup.py ADDED

@@ -0,0 +1,19 @@
+from setuptools import setup, find_packages
+setup(
+    name="deepcsv",
+    version="0.4.0",
+    author="Abdullah Bakr",
+    author_email="abdubakora1232@gmail.com",
+    description="Automatically walks through folders and subfolders, finds all CSV and XLSX files, detects and fixes data issues, and saves the results as Parquet files while keeping the exact same folder structure.",
+    long_description=open("README.md", encoding="utf-8").read(),
+    long_description_content_type="text/markdown",
+    packages=find_packages(),
+    install_requires=["pandas", "pyarrow"],
+    python_requires=">=3.7",
+    url="https://github.com/abdubakr77/deepcsv",
+    project_urls={
+        "Source": "https://github.com/abdubakr77/deepcsv",
+        "Tracker": "https://github.com/abdubakr77/deepcsv/issues",
+    },
+)

deepcsv-0.2.0/PKG-INFO DELETED

@@ -1,57 +0,0 @@
-Metadata-Version: 2.4
-Name: deepcsv
-Version: 0.2.0
-Summary: Automatically walks folders and converts list strings to lists in CSV/XLSX files
-Author: Abdullah Bakr
-Requires-Python: >=3.7
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: pandas
-Dynamic: author
-Dynamic: description
-Dynamic: description-content-type
-Dynamic: license-file
-Dynamic: requires-dist
-Dynamic: requires-python
-Dynamic: summary
-# deepcsv
-A Python library that automatically walks through folders and subfolders, finds all CSV and XLSX files, converts list strings into real Python lists, and saves the results in a new folder while keeping the exact same folder structure.
-## Installation
-```bash
-pip install deepcsv
-```
-## Functions
-### `ReadAllCSVData(path)`
-Walks through all folders and subfolders, finds every CSV and XLSX file, converts list strings to real lists, and saves everything in a new folder called `All CSV Data is Converted Here` with the same structure.
-```python
-import deepcsv
-deepcsv.ReadAllCSVData("C:/Users/Data")
-```
-### `ConvertListStrToList(df)`
-Takes a single DataFrame and converts any column that contains list strings into real Python lists. Skips `NaN` values automatically.
-```python
-import deepcsv
-import pandas as pd
-df = pd.read_csv("file.csv")
-df_converted = deepcsv.ConvertListStrToList(df)
-```
-## Notes
-- Supports `.csv` and `.xlsx` files
-- Skips `NaN` values without breaking
-- Keeps the exact folder structure in the output folder
-- Works on any level of nested folders
-## Requirements
-- Python >= 3.7
-- pandas

deepcsv-0.2.0/README.md DELETED

@@ -1,40 +0,0 @@
-# deepcsv
-A Python library that automatically walks through folders and subfolders, finds all CSV and XLSX files, converts list strings into real Python lists, and saves the results in a new folder while keeping the exact same folder structure.
-## Installation
-```bash
-pip install deepcsv
-```
-## Functions
-### `ReadAllCSVData(path)`
-Walks through all folders and subfolders, finds every CSV and XLSX file, converts list strings to real lists, and saves everything in a new folder called `All CSV Data is Converted Here` with the same structure.
-```python
-import deepcsv
-deepcsv.ReadAllCSVData("C:/Users/Data")
-```
-### `ConvertListStrToList(df)`
-Takes a single DataFrame and converts any column that contains list strings into real Python lists. Skips `NaN` values automatically.
-```python
-import deepcsv
-import pandas as pd
-df = pd.read_csv("file.csv")
-df_converted = deepcsv.ConvertListStrToList(df)
-```
-## Notes
-- Supports `.csv` and `.xlsx` files
-- Skips `NaN` values without breaking
-- Keeps the exact folder structure in the output folder
-- Works on any level of nested folders
-## Requirements
-- Python >= 3.7
-- pandas

deepcsv-0.2.0/deepcsv/deepcsv.py DELETED

@@ -1,49 +0,0 @@
-import os
-import pandas as pd
-from ast import literal_eval
-def ConvertListStrToList(data):
-    Data_Col = data.columns
-    for ColName in Data_Col:
-        if type(data[ColName][0]) == str and data[ColName][0].startswith("["):
-            data[f"{ColName.capitalize()}List"] = data[ColName].apply(lambda x : literal_eval(x) if pd.notna(x) else x )
-            data.drop(ColName,inplace=True,axis=1)
-    return data
-def ReadAllCSVData(WorkDirectoryPath):
-    base_output = os.path.join(WorkDirectoryPath, "All CSV Data is Converted Here")
-    all_folders = [WorkDirectoryPath]
-    os.makedirs(base_output,exist_ok=True)
-    while True:
-        if all_folders:
-            Curr_Path = all_folders.pop(0)
-            for item_name in os.listdir(Curr_Path):
-                Sub_Item_Path = os.path.join(Curr_Path,item_name)
-                if os.path.isfile(Sub_Item_Path) and (Sub_Item_Path.endswith(".csv") or Sub_Item_Path.endswith(".xlsx")):
-                    print(Sub_Item_Path)
-                    df = pd.read_csv(Sub_Item_Path)
-                    df_converted = ConvertListStrToList(df)
-                    rel_path = os.path.relpath(Sub_Item_Path, WorkDirectoryPath)
-                    output = os.path.join(base_output,rel_path)
-                    os.makedirs(os.path.dirname(output),exist_ok=True)
-                    df_converted.to_csv(output)
-                elif os.path.isdir(Sub_Item_Path):
-                    all_folders.append(Sub_Item_Path)
-        else:
-            break
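Both versions of the module walk the directory tree with a manual queue of folders. The same traversal can be sketched with `os.walk`, shown below under stated assumptions: `find_tabular_files` is a hypothetical helper, and pruning the output folder is a choice made for this sketch, since the output folder is created inside the work directory and a plain traversal would appear to visit it as well.

```python
import os
import tempfile


def find_tabular_files(root, extensions=(".csv", ".xlsx"),
                       skip_dir="All CSV Data is Converted Here"):
    """Collect CSV/XLSX paths under root, skipping the output folder."""
    matches = []
    for current_dir, dir_names, file_names in os.walk(root):
        # Editing dir_names in place tells os.walk not to descend into
        # the pruned folders.
        dir_names[:] = [d for d in dir_names if d != skip_dir]
        for name in file_names:
            if name.lower().endswith(extensions):
                matches.append(os.path.join(current_dir, name))
    return sorted(matches)


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "nested", "deeper"))
    os.makedirs(os.path.join(root, "All CSV Data is Converted Here"))
    for rel in ("a.csv",
                os.path.join("nested", "b.xlsx"),
                os.path.join("nested", "deeper", "notes.txt"),
                os.path.join("All CSV Data is Converted Here", "old.csv")):
        open(os.path.join(root, rel), "w").close()
    found = [os.path.relpath(p, root) for p in find_tabular_files(root)]
```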

deepcsv-0.2.0/deepcsv.egg-info/PKG-INFO DELETED

@@ -1,57 +0,0 @@
-Metadata-Version: 2.4
-Name: deepcsv
-Version: 0.2.0
-Summary: Automatically walks folders and converts list strings to lists in CSV/XLSX files
-Author: Abdullah Bakr
-Requires-Python: >=3.7
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: pandas
-Dynamic: author
-Dynamic: description
-Dynamic: description-content-type
-Dynamic: license-file
-Dynamic: requires-dist
-Dynamic: requires-python
-Dynamic: summary
-# deepcsv
-A Python library that automatically walks through folders and subfolders, finds all CSV and XLSX files, converts list strings into real Python lists, and saves the results in a new folder while keeping the exact same folder structure.
-## Installation
-```bash
-pip install deepcsv
-```
-## Functions
-### `ReadAllCSVData(path)`
-Walks through all folders and subfolders, finds every CSV and XLSX file, converts list strings to real lists, and saves everything in a new folder called `All CSV Data is Converted Here` with the same structure.
-```python
-import deepcsv
-deepcsv.ReadAllCSVData("C:/Users/Data")
-```
-### `ConvertListStrToList(df)`
-Takes a single DataFrame and converts any column that contains list strings into real Python lists. Skips `NaN` values automatically.
-```python
-import deepcsv
-import pandas as pd
-df = pd.read_csv("file.csv")
-df_converted = deepcsv.ConvertListStrToList(df)
-```
-## Notes
-- Supports `.csv` and `.xlsx` files
-- Skips `NaN` values without breaking
-- Keeps the exact folder structure in the output folder
-- Works on any level of nested folders
-## Requirements
-- Python >= 3.7
-- pandas

deepcsv-0.2.0/deepcsv.egg-info/requires.txt DELETED

@@ -1 +0,0 @@
-pandas

deepcsv-0.2.0/setup.py DELETED

@@ -1,13 +0,0 @@
-from setuptools import setup, find_packages
-setup(
-    name="deepcsv",
-    version="0.2.0",
-    author="Abdullah Bakr",
-    description="Automatically walks folders and converts list strings to lists in CSV/XLSX files",
-    long_description=open("README.md").read(),
-    long_description_content_type="text/markdown",
-    packages=find_packages(),
-    install_requires=["pandas"],
-    python_requires=">=3.7",
-)

{deepcsv-0.2.0 → deepcsv-0.4.0}/LICENSE RENAMED

File without changes

{deepcsv-0.2.0 → deepcsv-0.4.0}/deepcsv/__init__.py RENAMED

File without changes

{deepcsv-0.2.0 → deepcsv-0.4.0}/deepcsv.egg-info/SOURCES.txt RENAMED

File without changes

{deepcsv-0.2.0 → deepcsv-0.4.0}/deepcsv.egg-info/dependency_links.txt RENAMED

File without changes

{deepcsv-0.2.0 → deepcsv-0.4.0}/deepcsv.egg-info/top_level.txt RENAMED

File without changes

{deepcsv-0.2.0 → deepcsv-0.4.0}/setup.cfg RENAMED

File without changes
