deepcsv 0.6.2b2__tar.gz → 0.6.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (33)
  1. deepcsv-0.6.3/CHANGELOG.md +21 -0
  2. {deepcsv-0.6.2b2/deepcsv.egg-info → deepcsv-0.6.3}/PKG-INFO +37 -18
  3. {deepcsv-0.6.2b2 → deepcsv-0.6.3}/README.md +24 -8
  4. {deepcsv-0.6.2b2 → deepcsv-0.6.3}/deepcsv/deepcsv.py +23 -10
  5. {deepcsv-0.6.2b2 → deepcsv-0.6.3}/deepcsv/utils.py +84 -3
  6. {deepcsv-0.6.2b2 → deepcsv-0.6.3/deepcsv.egg-info}/PKG-INFO +37 -18
  7. deepcsv-0.6.3/deepcsv.egg-info/SOURCES.txt +29 -0
  8. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.5.0-py3-none-any.whl +0 -0
  9. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.5.0.tar.gz +0 -0
  10. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.5.0b1-py3-none-any.whl +0 -0
  11. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.5.0b1.tar.gz +0 -0
  12. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.0-py3-none-any.whl +0 -0
  13. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.0.tar.gz +0 -0
  14. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.1-py3-none-any.whl +0 -0
  15. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.1.tar.gz +0 -0
  16. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.2-py3-none-any.whl +0 -0
  17. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.2.tar.gz +0 -0
  18. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.2b1-py3-none-any.whl +0 -0
  19. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.2b1.tar.gz +0 -0
  20. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.2b2-py3-none-any.whl +0 -0
  21. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.2b2.tar.gz +0 -0
  22. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.3b1-py3-none-any.whl +0 -0
  23. deepcsv-0.6.3/deepcsv.egg-info/dist/deepcsv-0.6.3b1.tar.gz +0 -0
  24. {deepcsv-0.6.2b2 → deepcsv-0.6.3}/setup.py +1 -1
  25. deepcsv-0.6.2b2/CHANGELOG.md +0 -18
  26. deepcsv-0.6.2b2/deepcsv.egg-info/SOURCES.txt +0 -13
  27. {deepcsv-0.6.2b2 → deepcsv-0.6.3}/LICENSE +0 -0
  28. {deepcsv-0.6.2b2 → deepcsv-0.6.3}/MANIFEST.in +0 -0
  29. {deepcsv-0.6.2b2 → deepcsv-0.6.3}/deepcsv/__init__.py +0 -0
  30. {deepcsv-0.6.2b2 → deepcsv-0.6.3}/deepcsv.egg-info/dependency_links.txt +0 -0
  31. {deepcsv-0.6.2b2 → deepcsv-0.6.3}/deepcsv.egg-info/requires.txt +0 -0
  32. {deepcsv-0.6.2b2 → deepcsv-0.6.3}/deepcsv.egg-info/top_level.txt +0 -0
  33. {deepcsv-0.6.2b2 → deepcsv-0.6.3}/setup.cfg +0 -0
@@ -0,0 +1,21 @@
+ # Changelog
+
+ ---
+
+ ### Added
+
+ - `process_file` — Added docs & examples for the new parameters
+ - `process_all_files` — Added docs & examples for the new parameters
+ - `process_file` & `process_all_files` — Added new parameter `to_list` to return real Python lists if you don't need arrays
+ - Added `auto_fix` function for automatic data type correction of mixed-dtype columns in DataFrames
+ - Added logging to `auto_fix` to track changes made to columns
+ - Added documentation for the `auto_fix` function
+
+ ---
+
+ ### Changed
+
+ - `process_file()` — Renamed the `save_file_extension` parameter to `file_format`
+ - `process_all_files()` — Renamed the `file_extension` parameter to `file_format`
+
+ ---
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: deepcsv
- Version: 0.6.2b2
+ Version: 0.6.3
  Summary: Automatically processes data files in directories, converts array-like strings to NumPy arrays, detects and fixes data type issues, and saves results as optimized Parquet files and MORE!
  Home-page: https://github.com/abdubakr77/deepcsv
  Author: Abdullah Bakr
@@ -50,6 +50,7 @@ Your column has numbers — it secretly has 3 different data types.
  You have 200 CSV files across 40 folders — and you process them one by one.
  You load a file and spend 20 minutes just picking the right reader.
  You have nulls scattered everywhere with no clean way to handle them.
+ You have a lot of mixed-dtype columns with no way to handle them one by one.
 
  This is the silent killer of every data pipeline.
 
@@ -65,7 +66,7 @@ This is the silent killer of every data pipeline.
  - Catches mixed-type columns and fixes them automatically
  - Saves everything in any format you choose — not just Parquet
  - Reads any file format with one function — no more picking the right reader
- - Cleans nulls with full control over columns, rows, indexes, values, and types
+ - Cleans nulls with full control over columns, rows, indexes, and values, and fixes mixed dtypes
 
  ---
 
@@ -79,7 +80,7 @@ pip install deepcsv
 
  ## Functions
 
- ### `process_file(data_input, save_file_extension=str)`
+ ### `process_file(data_input, file_format=str, to_list=False)`
 
  Reads a file or DataFrame, converts array-like strings to NumPy arrays, fixes mixed-type columns, and optionally saves the result in any format you choose.
 
@@ -90,17 +91,20 @@ import deepcsv
  df = deepcsv.process_file('path/to/file.csv')
 
  # Process and save as parquet
- df = deepcsv.process_file('path/to/file.csv', save_file_extension='parquet')
+ df = deepcsv.process_file('path/to/file.csv', file_format='parquet')
 
  # Process and save as Excel
- df = deepcsv.process_file('path/to/file.csv', save_file_extension='xlsx')
+ df = deepcsv.process_file('path/to/file.csv', file_format='xlsx')
+
+ # Process and convert array columns to real Python lists
+ df = deepcsv.process_file('path/to/file.csv', to_list=True)
  ```
 
- **Supported save formats:** `.csv` `.tsv` `.txt` `.xlsx` `.json` `.parquet` `.pkl` `.feather` `.html` `.xml`
+ **Supported file formats:** `.csv` `.tsv` `.txt` `.xlsx` `.json` `.parquet` `.pkl` `.feather` `.html` `.xml`
 
  ---
 
- ### `process_all_files(directory_path, output_dir="All CSV Files is Converted Here", file_extension="parquet")`
+ ### `process_all_files(directory_path, output_dir="All CSV Files is Converted Here", file_format="parquet", to_list=False)`
 
  Walks through all folders and subfolders, applies `process_file` on every supported file, and saves results in the format you choose.
 
@@ -121,7 +125,7 @@ deepcsv.process_all_files('path/to/folder', file_extension='csv')
 
  ---
 
- ### `read_any(file_path)`
+ ### `read_any(file_path)`
 
  Reads any supported file format and returns a pandas DataFrame — one function for everything.
 
@@ -138,7 +142,7 @@ df = read_any('local.db')
 
  ---
 
- ### `clean_values(data_input, ...)`
+ ### `clean_values(data_input, ...)`
 
  Cleans a DataFrame by removing nulls, specific values, specific types, or rows by index — with full control over which columns to target and optional conditions.
 
@@ -182,6 +186,17 @@ df = clean_values('data.csv', all_cols_except=['id', 'name'])
 
  ---
 
+ ### `auto_fix(data_input)`
+
+ Automatically corrects mixed-dtype columns in a DataFrame, with logging to track the changes made to each column.
+
+ ```python
+ from deepcsv import auto_fix
+
+ df = auto_fix('My_Data')
+ ```
+ ---
+
  ## Function Signatures
 
  ```python
@@ -189,6 +204,7 @@ process_file(data_input: Union[str, pd.DataFrame], save_file_extension: str = No
  process_all_files(directory_path: str, output_dir: str = "All CSV Files is Converted Here", file_extension: str = "parquet") -> None
  read_any(file_path: str) -> pd.DataFrame
  clean_values(data_input, cols=None, ax_0=False, index=None, condition=None, all_cols_except=None, finding_value=None, finding_type=None) -> pd.DataFrame
+ auto_fix(data_input: Union[str, pd.DataFrame])
  ```
 
  ---
@@ -228,16 +244,19 @@ clean_values(data_input, cols=None, ax_0=False, index=None, condition=None, all_
  ---
 
  ### Added
- - `process_all_files` — Added option for user to customize the output folder name
- - `read_any()` — Reads any supported file format and returns a pandas DataFrame automatically. Supports: `.csv`, `.txt`, `.tsv`, `.xls`, `.xlsx`, `.json`, `.parquet`, `.pkl`, `.feather`, `.db`, `.sqlite`
- - `clean_values()` — Cleans a DataFrame by removing nulls, specific values, specific types, or rows by index. Supports optional condition filtering with 6 operators
- - `_validate_cols()` — Internal helper: validates cols is a non-empty list and all columns exist in the DataFrame
- - `_validate_index()` — Internal helper: validates index is a non-empty list and all indexes exist in the DataFrame. Supports optional `reset_index` before validation
- - `_validate_condition()` — Internal helper: validates condition list and returns `(operator_func, value)`
- - `_parse_operator()` — Internal helper: converts operator string like `'>='` into its Python operator function
+
+ - `process_file` — Added docs & examples for the new parameters
+ - `process_all_files` — Added docs & examples for the new parameters
+ - `process_file` & `process_all_files` — Added new parameter `to_list` to return real Python lists if you don't need arrays
+ - Added `auto_fix` function for automatic data type correction of mixed-dtype columns in DataFrames
+ - Added logging to `auto_fix` to track changes made to columns
+ - Added documentation for the `auto_fix` function
+
+ ---
 
  ### Changed
- - `process_file()` — Added `save_file_extension` parameter. Now supports saving the processed DataFrame in any format after conversion, not just returning it
- - `process_all_files()` — Added `file_extension` parameter. Now supports saving converted files in any format instead of always saving as Parquet. Also expanded supported input formats beyond `.csv` and `.xlsx` to cover all formats supported by `read_any()`
+
+ - `process_file()` — Renamed the `save_file_extension` parameter to `file_format`
+ - `process_all_files()` — Renamed the `file_extension` parameter to `file_format`
 
  ---
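The removed changelog entries above mention `_parse_operator()`, an internal helper that converts an operator string like `'>='` into its Python operator function. A minimal sketch of that pattern, using a hypothetical `parse_operator` name rather than the package's private helper:

```python
import operator

# The six comparison operators, as strings, mapped to operator-module functions.
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def parse_operator(op: str):
    """Return the comparison function for an operator string like '>='."""
    try:
        return _OPS[op]
    except KeyError:
        raise ValueError(f"Unsupported operator: {op!r}") from None


ge = parse_operator(">=")
```

A table lookup like this keeps condition filtering data-driven: `clean_values`-style code can apply `(operator_func, value)` pairs to a column without a chain of `if` statements.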
@@ -14,6 +14,7 @@ Your column has numbers — it secretly has 3 different data types.
  You have 200 CSV files across 40 folders — and you process them one by one.
  You load a file and spend 20 minutes just picking the right reader.
  You have nulls scattered everywhere with no clean way to handle them.
+ You have a lot of mixed-dtype columns with no way to handle them one by one.
 
  This is the silent killer of every data pipeline.
 
@@ -29,7 +30,7 @@ This is the silent killer of every data pipeline.
  - Catches mixed-type columns and fixes them automatically
  - Saves everything in any format you choose — not just Parquet
  - Reads any file format with one function — no more picking the right reader
- - Cleans nulls with full control over columns, rows, indexes, values, and types
+ - Cleans nulls with full control over columns, rows, indexes, and values, and fixes mixed dtypes
 
  ---
 
@@ -43,7 +44,7 @@ pip install deepcsv
 
  ## Functions
 
- ### `process_file(data_input, save_file_extension=str)`
+ ### `process_file(data_input, file_format=str, to_list=False)`
 
  Reads a file or DataFrame, converts array-like strings to NumPy arrays, fixes mixed-type columns, and optionally saves the result in any format you choose.
 
@@ -54,17 +55,20 @@ import deepcsv
  df = deepcsv.process_file('path/to/file.csv')
 
  # Process and save as parquet
- df = deepcsv.process_file('path/to/file.csv', save_file_extension='parquet')
+ df = deepcsv.process_file('path/to/file.csv', file_format='parquet')
 
  # Process and save as Excel
- df = deepcsv.process_file('path/to/file.csv', save_file_extension='xlsx')
+ df = deepcsv.process_file('path/to/file.csv', file_format='xlsx')
+
+ # Process and convert array columns to real Python lists
+ df = deepcsv.process_file('path/to/file.csv', to_list=True)
  ```
 
- **Supported save formats:** `.csv` `.tsv` `.txt` `.xlsx` `.json` `.parquet` `.pkl` `.feather` `.html` `.xml`
+ **Supported file formats:** `.csv` `.tsv` `.txt` `.xlsx` `.json` `.parquet` `.pkl` `.feather` `.html` `.xml`
 
  ---
 
- ### `process_all_files(directory_path, output_dir="All CSV Files is Converted Here", file_extension="parquet")`
+ ### `process_all_files(directory_path, output_dir="All CSV Files is Converted Here", file_format="parquet", to_list=False)`
 
  Walks through all folders and subfolders, applies `process_file` on every supported file, and saves results in the format you choose.
 
@@ -85,7 +89,7 @@ deepcsv.process_all_files('path/to/folder', file_extension='csv')
 
  ---
 
- ### `read_any(file_path)`
+ ### `read_any(file_path)`
 
  Reads any supported file format and returns a pandas DataFrame — one function for everything.
 
@@ -102,7 +106,7 @@ df = read_any('local.db')
 
  ---
 
- ### `clean_values(data_input, ...)`
+ ### `clean_values(data_input, ...)`
 
  Cleans a DataFrame by removing nulls, specific values, specific types, or rows by index — with full control over which columns to target and optional conditions.
 
@@ -146,6 +150,17 @@ df = clean_values('data.csv', all_cols_except=['id', 'name'])
 
  ---
 
+ ### `auto_fix(data_input)`
+
+ Automatically corrects mixed-dtype columns in a DataFrame, with logging to track the changes made to each column.
+
+ ```python
+ from deepcsv import auto_fix
+
+ df = auto_fix('My_Data')
+ ```
+ ---
+
  ## Function Signatures
 
  ```python
@@ -153,6 +168,7 @@ process_file(data_input: Union[str, pd.DataFrame], save_file_extension: str = No
  process_all_files(directory_path: str, output_dir: str = "All CSV Files is Converted Here", file_extension: str = "parquet") -> None
  read_any(file_path: str) -> pd.DataFrame
  clean_values(data_input, cols=None, ax_0=False, index=None, condition=None, all_cols_except=None, finding_value=None, finding_type=None) -> pd.DataFrame
+ auto_fix(data_input: Union[str, pd.DataFrame])
  ```
 
  ---
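The README's `read_any` section above describes one function that picks the right reader for any supported format. A hedged sketch of how such extension-based dispatch can work with pandas (illustrative only; `read_any_sketch` and its `_READERS` table are assumptions, not the package's actual implementation):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Suffix -> pandas reader (subset shown; the real package covers more formats).
_READERS = {
    ".csv": pd.read_csv,
    ".tsv": lambda p: pd.read_csv(p, sep="\t"),
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
    ".xlsx": pd.read_excel,
    ".pkl": pd.read_pickle,
}


def read_any_sketch(file_path: str) -> pd.DataFrame:
    """Pick the reader from the file extension and return a DataFrame."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in _READERS:
        raise ValueError(f"Unsupported file format: {suffix}")
    return _READERS[suffix](file_path)


# Demo: round-trip a tiny CSV through the dispatcher.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("a,b\n1,2\n")
    tmp_path = f.name
df = read_any_sketch(tmp_path)
```

Keeping the format table separate from the dispatch logic makes adding a new format a one-line change, which is presumably why a single `read_any` entry point can cover so many extensions.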
@@ -9,14 +9,18 @@ from os.path import join,relpath,dirname,isfile,isdir
  from warnings import filterwarnings
  filterwarnings("ignore")
 
- def process_file(data_input: Union[str, pd.DataFrame], save_file_extension=str) -> pd.DataFrame:
+ def process_file(data_input: Union[str, pd.DataFrame], file_format=str, to_list=False) -> pd.DataFrame:
      """
      Parses string representations of lists in DataFrame columns to actual NumPy arrays.
 
      Parameters
      ----------
-     data_input : str or pd.DataFrame
-         Path to the CSV/XLSX file or an existing DataFrame.
+     data_input : str or pd.DataFrame
+         Path to the CSV/XLSX file or an existing DataFrame.
+     file_format : str
+         Saves the DataFrame to a file with the specified format.
+     to_list : bool, default False
+         If True, converts parsed values to Python lists instead of NumPy arrays.
 
      Returns
      -------
@@ -28,6 +32,8 @@ def process_file(data_input: Union[str, pd.DataFrame], save_file_extension=str)
      --------
      >>> df = process_file('path/to/file.csv')
      >>> df = process_file(my_dataframe)
+     >>> df = process_file(my_dataframe, file_format="parquet")
+     >>> df = process_file(my_dataframe, file_format="parquet", to_list=True)
      """
 
      try:
@@ -54,16 +60,20 @@ def process_file(data_input: Union[str, pd.DataFrame], save_file_extension=str)
          print("System : Done!")
 
      elif isinstance(First_Value, str) and First_Value.strip().startswith("["):
-
-         data[f"{ColName.capitalize()}List"] = data[ColName].apply(lambda x: array(literal_eval(x)) if pd.notna(x) else nan)
+         if to_list:
+             data[f"{ColName.capitalize()}List"] = data[ColName].apply(lambda x: list(literal_eval(x)) if pd.notna(x) else nan)
+         else:
+             data[f"{ColName.capitalize()}List"] = data[ColName].apply(lambda x: array(literal_eval(x)) if pd.notna(x) else nan)
          data.drop(ColName, inplace=True, axis=1)
 
-     if save_file_extension.strip().lower() in ['csv','txt','tsv','xls','xlsx','json','parquet','pkl','feather','db','sqlite']:
-         _save_as(data=data, ext=save_file_extension)
+     if file_format.strip().lower() in ['csv','txt','tsv','xls','xlsx','json','parquet','pkl','feather','db','sqlite']:
+         _save_as(data=data, ext=file_format)
+
      return data
 
 
- def process_all_files(directory_path: str, output_dir="All CSV Files is Converted Here", file_extension="parquet") -> None:
+ def process_all_files(directory_path: str, output_dir="All CSV Files is Converted Here", file_format="parquet", to_list=False) -> None:
      """
      Recursively processes all CSV and XLSX files in a directory,
      converts array strings to NumPy arrays, and saves as Parquet files.
@@ -74,6 +84,8 @@ def process_all_files(directory_path: str, output_dir="All CSV Files is Converte
          Root directory path to search for CSV/XLSX files.
      output_dir : str, default 'All CSV Files is Converted Here'
          Folder name where converted files will be saved.
+     file_format : str
+         Format used when saving each converted file.
 
      Returns
      -------
@@ -83,6 +95,7 @@ def process_all_files(directory_path: str, output_dir="All CSV Files is Converte
      --------
      >>> process_all_files('/path/to/directory')
      >>> process_all_files('/path/to/directory', output_dir="Converted Files")
+     >>> process_all_files('/path/to/directory', output_dir="Converted Files", file_format="tsv")
      """
 
      base_output = join(directory_path, output_dir)
@@ -113,8 +126,8 @@ def process_all_files(directory_path: str, output_dir="All CSV Files is Converte
              print(Sub_Item_Path)
              makedirs(dirname(output), exist_ok=True)
              _save_as(data=df_converted,
-                      current_dir=output.replace(f".{Sub_Item_Path.split(".")[-1].strip().lower()}", f".{file_extension}"),
-                      ext=file_extension)
+                      current_dir=output.replace(f".{Sub_Item_Path.split(".")[-1].strip().lower()}", f".{file_format}"),
+                      ext=file_format, to_list=to_list)
 
          elif isdir(Sub_Item_Path):
 
@@ -110,8 +110,8 @@ def _save_as(data: pd.DataFrame, current_dir = str(Path.cwd()), ext= str) -> Non
 
      Examples
      --------
-     >>> save_as(df, "data/myfile", ".parquet")
-     >>> save_as(df, "data/myfile", ".csv")
+     >>> _save_as(df, "data/myfile", ".parquet")
+     >>> _save_as(df, "data/myfile", ".csv")
      """
      ext = ext.strip().lower()
      if not ext.startswith("."):
@@ -143,6 +143,14 @@ def _save_as(data: pd.DataFrame, current_dir = str(Path.cwd()), ext= str) -> Non
      print("-"*50)
 
 
+ def _val_dtype(x, dtype):
+     if dtype == str:
+         return str(x)
+     elif dtype == float:
+         return float(x)
+     else:
+         return bool(x)
+
  # ──────────────────────────────────────────────
  # PUBLIC FUNCTIONS
  # ──────────────────────────────────────────────
@@ -299,4 +307,77 @@ def clean_values(data_input: Union[str, pd.DataFrame],
      else:
          data.dropna(axis=1, inplace=True)
 
-     return data
+     return data
+
+
+ def auto_fix(data_input: Union[str, pd.DataFrame]):
+     """
+     Automatically detects and fixes columns with mixed data types in a DataFrame.
+
+     This function scans each column for mixed data types (columns containing exactly
+     2 different Python types) and attempts to convert all values to the most common
+     type. If the primary conversion fails, it falls back to the secondary type.
+
+     Parameters
+     ----------
+     data_input : str or pd.DataFrame
+         File path to read from or an existing DataFrame to process.
+
+     Returns
+     -------
+     pd.DataFrame
+         DataFrame with mixed-type columns automatically converted to consistent types.
+
+     Notes
+     -----
+     - Only processes columns with exactly 2 different data types
+     - Attempts conversion to the most frequent type first
+     - Falls back to the less frequent type if primary conversion fails
+     - Prints progress messages for each column being processed
+     - Supported target types: str, float, bool
+
+     Examples
+     --------
+     >>> df = auto_fix('data/mixed_types.csv')
+     Found a column (price) Have mixed DTypes!
+     This Col Have These DTypes: [<class 'str'>, <class 'float'>]
+     NOW TRYING TO FIX!
+     Done!
+     -----------------------------------
+
+     >>> df = auto_fix(my_dataframe)
+     """
+
+     try:
+         df = read_any(data_input)
+     except Exception:
+         df = data_input
+
+     for ColName in df.columns:
+         if len(df[ColName].apply(type).unique()) == 2:
+             print(f"Found a column ({ColName}) Have mixed DTypes!")
+             print(f"This Col Have These DTypes: {df[ColName].apply(type).unique()}\nNOW TRYING TO FIX!")
+
+             dtype_dict = dict(df[ColName].apply(type).value_counts().to_dict())
+             dtype_values_list = [dtype_value for dtype_value in dtype_dict.values()]
+             dtype = None
+             try:
+                 for dtype_name in dtype_dict.keys():
+                     if dtype_dict[dtype_name] == max(dtype_values_list):
+                         dtype = dtype_name
+
+                 df[ColName] = df[ColName].apply(lambda x: _val_dtype(x, dtype))
+
+             except:
+                 for dtype_name in dtype_dict.keys():
+                     if dtype_dict[dtype_name] == min(dtype_values_list):
+                         dtype = dtype_name
+
+                 df[ColName] = df[ColName].apply(lambda x: _val_dtype(x, dtype))
+             print("Done!")
+             print("—"*35)
+     return df
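The `auto_fix` implementation above casts every value in a two-type column to the column's majority Python type, falling back to the minority type when that fails. The same idea can be exercised on a small Series; this is a hedged sketch with a hypothetical `fix_mixed_column` helper, not the package's API:

```python
import pandas as pd


def fix_mixed_column(series: pd.Series) -> pd.Series:
    """Cast a two-type column to its majority Python type, with a fallback."""
    type_counts = series.apply(type).value_counts()
    if len(type_counts) != 2:  # mirror auto_fix: only exactly-two-type columns
        return series
    try:
        # Primary attempt: cast everything to the most frequent type.
        return series.apply(type_counts.idxmax())
    except (TypeError, ValueError):
        # Fallback: cast everything to the less frequent type instead.
        return series.apply(type_counts.idxmin())


floats_win = fix_mixed_column(pd.Series([1.0, "2.5", 3.0, 4.0]))  # all float
fallback = fix_mixed_column(pd.Series([1.0, 2.0, "abc"]))         # all str
```

In the first call floats outnumber strings, so every value is coerced with `float()`; in the second the majority cast fails on `"abc"`, so the whole column falls back to `str`, which matches the try/except shape of `auto_fix` above.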
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: deepcsv
- Version: 0.6.2b2
+ Version: 0.6.3
  Summary: Automatically processes data files in directories, converts array-like strings to NumPy arrays, detects and fixes data type issues, and saves results as optimized Parquet files and MORE!
  Home-page: https://github.com/abdubakr77/deepcsv
  Author: Abdullah Bakr
@@ -50,6 +50,7 @@ Your column has numbers — it secretly has 3 different data types.
  You have 200 CSV files across 40 folders — and you process them one by one.
  You load a file and spend 20 minutes just picking the right reader.
  You have nulls scattered everywhere with no clean way to handle them.
+ You have a lot of mixed-dtype columns with no way to handle them one by one.
 
  This is the silent killer of every data pipeline.
 
@@ -65,7 +66,7 @@ This is the silent killer of every data pipeline.
  - Catches mixed-type columns and fixes them automatically
  - Saves everything in any format you choose — not just Parquet
  - Reads any file format with one function — no more picking the right reader
- - Cleans nulls with full control over columns, rows, indexes, values, and types
+ - Cleans nulls with full control over columns, rows, indexes, and values, and fixes mixed dtypes
 
  ---
 
@@ -79,7 +80,7 @@ pip install deepcsv
 
  ## Functions
 
- ### `process_file(data_input, save_file_extension=str)`
+ ### `process_file(data_input, file_format=str, to_list=False)`
 
  Reads a file or DataFrame, converts array-like strings to NumPy arrays, fixes mixed-type columns, and optionally saves the result in any format you choose.
 
@@ -90,17 +91,20 @@ import deepcsv
  df = deepcsv.process_file('path/to/file.csv')
 
  # Process and save as parquet
- df = deepcsv.process_file('path/to/file.csv', save_file_extension='parquet')
+ df = deepcsv.process_file('path/to/file.csv', file_format='parquet')
 
  # Process and save as Excel
- df = deepcsv.process_file('path/to/file.csv', save_file_extension='xlsx')
+ df = deepcsv.process_file('path/to/file.csv', file_format='xlsx')
+
+ # Process and convert array columns to real Python lists
+ df = deepcsv.process_file('path/to/file.csv', to_list=True)
  ```
 
- **Supported save formats:** `.csv` `.tsv` `.txt` `.xlsx` `.json` `.parquet` `.pkl` `.feather` `.html` `.xml`
+ **Supported file formats:** `.csv` `.tsv` `.txt` `.xlsx` `.json` `.parquet` `.pkl` `.feather` `.html` `.xml`
 
  ---
 
- ### `process_all_files(directory_path, output_dir="All CSV Files is Converted Here", file_extension="parquet")`
+ ### `process_all_files(directory_path, output_dir="All CSV Files is Converted Here", file_format="parquet", to_list=False)`
 
  Walks through all folders and subfolders, applies `process_file` on every supported file, and saves results in the format you choose.
 
@@ -121,7 +125,7 @@ deepcsv.process_all_files('path/to/folder', file_extension='csv')
 
  ---
 
- ### `read_any(file_path)`
+ ### `read_any(file_path)`
 
  Reads any supported file format and returns a pandas DataFrame — one function for everything.
 
@@ -138,7 +142,7 @@ df = read_any('local.db')
 
  ---
 
- ### `clean_values(data_input, ...)`
+ ### `clean_values(data_input, ...)`
 
  Cleans a DataFrame by removing nulls, specific values, specific types, or rows by index — with full control over which columns to target and optional conditions.
 
@@ -182,6 +186,17 @@ df = clean_values('data.csv', all_cols_except=['id', 'name'])
 
  ---
 
+ ### `auto_fix(data_input)`
+
+ Automatically corrects mixed-dtype columns in a DataFrame, with logging to track the changes made to each column.
+
+ ```python
+ from deepcsv import auto_fix
+
+ df = auto_fix('My_Data')
+ ```
+ ---
+
  ## Function Signatures
 
  ```python
@@ -189,6 +204,7 @@ process_file(data_input: Union[str, pd.DataFrame], save_file_extension: str = No
  process_all_files(directory_path: str, output_dir: str = "All CSV Files is Converted Here", file_extension: str = "parquet") -> None
  read_any(file_path: str) -> pd.DataFrame
  clean_values(data_input, cols=None, ax_0=False, index=None, condition=None, all_cols_except=None, finding_value=None, finding_type=None) -> pd.DataFrame
+ auto_fix(data_input: Union[str, pd.DataFrame])
  ```
 
  ---
@@ -228,16 +244,19 @@ clean_values(data_input, cols=None, ax_0=False, index=None, condition=None, all_
  ---
 
  ### Added
- - `process_all_files` — Added option for user to customize the output folder name
- - `read_any()` — Reads any supported file format and returns a pandas DataFrame automatically. Supports: `.csv`, `.txt`, `.tsv`, `.xls`, `.xlsx`, `.json`, `.parquet`, `.pkl`, `.feather`, `.db`, `.sqlite`
- - `clean_values()` — Cleans a DataFrame by removing nulls, specific values, specific types, or rows by index. Supports optional condition filtering with 6 operators
- - `_validate_cols()` — Internal helper: validates cols is a non-empty list and all columns exist in the DataFrame
- - `_validate_index()` — Internal helper: validates index is a non-empty list and all indexes exist in the DataFrame. Supports optional `reset_index` before validation
- - `_validate_condition()` — Internal helper: validates condition list and returns `(operator_func, value)`
- - `_parse_operator()` — Internal helper: converts operator string like `'>='` into its Python operator function
+
+ - `process_file` — Added docs & examples for the new parameters
+ - `process_all_files` — Added docs & examples for the new parameters
+ - `process_file` & `process_all_files` — Added new parameter `to_list` to return real Python lists if you don't need arrays
+ - Added `auto_fix` function for automatic data type correction of mixed-dtype columns in DataFrames
+ - Added logging to `auto_fix` to track changes made to columns
+ - Added documentation for the `auto_fix` function
+
+ ---
 
  ### Changed
- - `process_file()` — Added `save_file_extension` parameter. Now supports saving the processed DataFrame in any format after conversion, not just returning it
- - `process_all_files()` — Added `file_extension` parameter. Now supports saving converted files in any format instead of always saving as Parquet. Also expanded supported input formats beyond `.csv` and `.xlsx` to cover all formats supported by `read_any()`
+
+ - `process_file()` — Renamed the `save_file_extension` parameter to `file_format`
+ - `process_all_files()` — Renamed the `file_extension` parameter to `file_format`
 
  ---
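The README and docstring examples above hinge on one core transformation: parsing string-encoded lists like `"[1, 2, 3]"` into NumPy arrays, or plain Python lists when `to_list=True`. A standalone sketch of that conversion, using a hypothetical `parse_array_column` helper rather than the package's API:

```python
from ast import literal_eval  # safe parsing of list literals, unlike eval()

import numpy as np
import pandas as pd


def parse_array_column(series: pd.Series, to_list: bool = False) -> pd.Series:
    """Parse string-encoded lists into NumPy arrays (default) or Python lists."""
    caster = list if to_list else np.array
    return series.apply(
        lambda x: caster(literal_eval(x)) if pd.notna(x) else np.nan
    )


df = pd.DataFrame({"vals": ["[1, 2, 3]", "[4, 5]", None]})
as_arrays = parse_array_column(df["vals"])                # NumPy arrays
as_lists = parse_array_column(df["vals"], to_list=True)   # real Python lists
```

`ast.literal_eval` only evaluates Python literals, so a malicious string in the data cannot execute code; nulls pass through as `NaN`, matching the `pd.notna` guard in the package's own lambda.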
@@ -0,0 +1,29 @@
+ CHANGELOG.md
+ LICENSE
+ MANIFEST.in
+ README.md
+ setup.py
+ deepcsv/__init__.py
+ deepcsv/deepcsv.py
+ deepcsv/utils.py
+ deepcsv.egg-info/PKG-INFO
+ deepcsv.egg-info/SOURCES.txt
+ deepcsv.egg-info/dependency_links.txt
+ deepcsv.egg-info/requires.txt
+ deepcsv.egg-info/top_level.txt
+ deepcsv.egg-info/dist/deepcsv-0.5.0-py3-none-any.whl
+ deepcsv.egg-info/dist/deepcsv-0.5.0.tar.gz
+ deepcsv.egg-info/dist/deepcsv-0.5.0b1-py3-none-any.whl
+ deepcsv.egg-info/dist/deepcsv-0.5.0b1.tar.gz
+ deepcsv.egg-info/dist/deepcsv-0.6.0-py3-none-any.whl
+ deepcsv.egg-info/dist/deepcsv-0.6.0.tar.gz
+ deepcsv.egg-info/dist/deepcsv-0.6.1-py3-none-any.whl
+ deepcsv.egg-info/dist/deepcsv-0.6.1.tar.gz
+ deepcsv.egg-info/dist/deepcsv-0.6.2-py3-none-any.whl
+ deepcsv.egg-info/dist/deepcsv-0.6.2.tar.gz
+ deepcsv.egg-info/dist/deepcsv-0.6.2b1-py3-none-any.whl
+ deepcsv.egg-info/dist/deepcsv-0.6.2b1.tar.gz
+ deepcsv.egg-info/dist/deepcsv-0.6.2b2-py3-none-any.whl
+ deepcsv.egg-info/dist/deepcsv-0.6.2b2.tar.gz
+ deepcsv.egg-info/dist/deepcsv-0.6.3b1-py3-none-any.whl
+ deepcsv.egg-info/dist/deepcsv-0.6.3b1.tar.gz
@@ -8,7 +8,7 @@ changelog = (this_directory / "CHANGELOG.md").read_text(encoding="utf-8")
 
  setup(
      name="deepcsv",
-     version="0.6.2b2",
+     version="0.6.3",
      author="Abdullah Bakr",
      author_email="abdubakora1232@gmail.com",
      description="Automatically processes data files in directories, converts array-like strings to NumPy arrays, detects and fixes data type issues, and saves results as optimized Parquet files and MORE!",
@@ -1,18 +0,0 @@
- # Changelog
-
- ---
-
- ### Added
- - `process_all_files` — Added option for user to customize the output folder name
- - `read_any()` — Reads any supported file format and returns a pandas DataFrame automatically. Supports: `.csv`, `.txt`, `.tsv`, `.xls`, `.xlsx`, `.json`, `.parquet`, `.pkl`, `.feather`, `.db`, `.sqlite`
- - `clean_values()` — Cleans a DataFrame by removing nulls, specific values, specific types, or rows by index. Supports optional condition filtering with 6 operators
- - `_validate_cols()` — Internal helper: validates cols is a non-empty list and all columns exist in the DataFrame
- - `_validate_index()` — Internal helper: validates index is a non-empty list and all indexes exist in the DataFrame. Supports optional `reset_index` before validation
- - `_validate_condition()` — Internal helper: validates condition list and returns `(operator_func, value)`
- - `_parse_operator()` — Internal helper: converts operator string like `'>='` into its Python operator function
-
- ### Changed
- - `process_file()` — Added `save_file_extension` parameter. Now supports saving the processed DataFrame in any format after conversion, not just returning it
- - `process_all_files()` — Added `file_extension` parameter. Now supports saving converted files in any format instead of always saving as Parquet. Also expanded supported input formats beyond `.csv` and `.xlsx` to cover all formats supported by `read_any()`
-
- ---
@@ -1,13 +0,0 @@
- CHANGELOG.md
- LICENSE
- MANIFEST.in
- README.md
- setup.py
- deepcsv/__init__.py
- deepcsv/deepcsv.py
- deepcsv/utils.py
- deepcsv.egg-info/PKG-INFO
- deepcsv.egg-info/SOURCES.txt
- deepcsv.egg-info/dependency_links.txt
- deepcsv.egg-info/requires.txt
- deepcsv.egg-info/top_level.txt