clustering-imputation 1.0.0__py3-none-any.whl → 1.0.1__py3-none-any.whl
- clustering_imputation-1.0.1.dist-info/METADATA +113 -0
- {clustering_imputation-1.0.0.dist-info → clustering_imputation-1.0.1.dist-info}/RECORD +5 -5
- clustering_imputation-1.0.0.dist-info/METADATA +0 -23
- {clustering_imputation-1.0.0.dist-info → clustering_imputation-1.0.1.dist-info}/LICENSE +0 -0
- {clustering_imputation-1.0.0.dist-info → clustering_imputation-1.0.1.dist-info}/WHEEL +0 -0
- {clustering_imputation-1.0.0.dist-info → clustering_imputation-1.0.1.dist-info}/top_level.txt +0 -0
clustering_imputation-1.0.1.dist-info/METADATA

```diff
@@ -0,0 +1,113 @@
+Metadata-Version: 2.1
+Name: clustering-imputation
+Version: 1.0.1
+Summary: Adding correlation to handle MNAR
+Author: MRINAL KANGSA BANIK
+Author-email: <manukbanik30@gmail.com>
+Keywords: python,imputation,MNAR
+Classifier: Development Status :: 1 - Planning
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: Unix
+Classifier: Operating System :: MacOS :: MacOS X
+Classifier: Operating System :: Microsoft :: Windows
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy
+Requires-Dist: pandas
+Requires-Dist: sklearn
+Requires-Dist: fancyimpute
+
+
+# Clustering Imputation
+
+## Installation
+
+To install the package, run:
+
+```bash
+
+pip install clustering-imputation==1.0.0
+
+```
+
+## Usage
+
+
+
+```python
+
+from clustering_imputation import clusterImputer
+
+df = ... # Load your dataset
+
+x = clusterImputer(df, "mice", "mean", 0.4, 10)
+
+x.impute()
+
+```
+
+# About the Package
+
+
+
+## Problem Statement
+
+
+
+* Traditional imputation techniques face several challenges:
+
+
+
+* High-Dimensional and Sparse Data: Existing methods struggle with large, sparse datasets; efficient techniques for such cases are needed.
+
+
+
+* Temporal Dependencies: Current methods often overlook temporal correlations in data.
+
+## Need to develop a new algo
+
+* Non-Random Missingness: Few methods address non-random missing patterns; improvements here could boost real-world application accuracy. We aim to develop an imputation method that considers "Missing Not at Random" (MNAR).
+
+
+
+* Computational Complexity: MICE and EM methods are computationally expensive for high-dimensional data. Our approach aims to reduce time complexity.
+
+
+
+## Philosophy of Our Solution: Clustered MICE/EM
+
+
+
+We propose a clustering-based approach:
+
+
+
+* Identify correlations between features.
+
+
+
+* Apply MICE/EM within clusters rather than on the entire dataset.
+
+
+
+* Combine results to reconstruct the dataset.
+
+
+
+* This method effectively handles MNAR data by leveraging feature correlations.
+
+For further details refer this [ppt](https://docs.google.com/presentation/d/1UZ2uDkleSgB2ZttjG1D6nmQhqk7uz5FQRW5UmSkB0Sg/edit?usp=sharing)
+
+## Contributing
+
+
+
+Pull requests are welcome. For major changes, please open an issue first
+
+to discuss what you would like to change.
+
+
+
+Please make sure to update tests as appropriate.
+
```
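The README's "Philosophy of Our Solution" section describes three steps: find correlated features, run MICE/EM inside each group, and recombine the results. The package's own source is not part of this diff, so the block below is only a minimal pure-Python sketch of that idea under stated assumptions. It is not the `clusterImputer` implementation. It assumes features are grouped greedily by absolute Pearson correlation against each group's first column, with an arbitrary 0.4 threshold. It also replaces MICE/EM with a much simpler chained loop of one-predictor least-squares regressions inside each group, starting from a column-mean fill. The names `pearson`, `cluster_features` and `impute_within_clusters` are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical sketch of "cluster correlated features, impute inside each cluster,
# recombine". Not the clustering-imputation package's API; a simple chained
# one-predictor regression stands in for MICE/EM.
Table = Dict[str, List[Optional[float]]]  # column name -> values, None = missing


def pearson(xs: List[Optional[float]], ys: List[Optional[float]]) -> float:
    """Pearson correlation over rows where both values are observed (0.0 if undefined)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def cluster_features(table: Table, threshold: float = 0.4) -> List[List[str]]:
    """Step 1: a column joins the first cluster whose seed column it correlates with."""
    clusters: List[List[str]] = []
    for col in table:
        for cluster in clusters:
            if abs(pearson(table[col], table[cluster[0]])) >= threshold:
                cluster.append(col)
                break
        else:  # no cluster matched: start a new one seeded by this column
            clusters.append([col])
    return clusters


def impute_within_clusters(table: Table, threshold: float = 0.4,
                           n_iter: int = 10) -> Dict[str, List[float]]:
    """Steps 2-3: impute each cluster independently, then return all columns together."""
    missing = {c: {i for i, v in enumerate(vals) if v is None} for c, vals in table.items()}
    filled: Dict[str, List[float]] = {}
    for c, vals in table.items():  # initial guess: column mean of observed values
        observed = [v for v in vals if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled[c] = [mean if v is None else v for v in vals]

    for cluster in cluster_features(table, threshold):
        for _ in range(n_iter):  # chained passes over the cluster's columns
            for target in cluster:
                others = [c for c in cluster if c != target]
                rows = [i for i in range(len(filled[target])) if i not in missing[target]]
                if not missing[target] or not others or not rows:
                    continue  # nothing to fill, no partner column, or no training rows
                # Predictor: the cluster member most correlated with the target.
                pred = max(others, key=lambda c: abs(pearson(table[c], table[target])))
                xs = [filled[pred][i] for i in rows]
                ys = [filled[target][i] for i in rows]
                mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
                sxx = sum((x - mx) ** 2 for x in xs)
                slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
                for i in missing[target]:  # overwrite only the originally missing cells
                    filled[target][i] = my + slope * (filled[pred][i] - mx)
    return filled


if __name__ == "__main__":
    data = {
        "a": [1.0, None, 3.0, 4.0, 5.0],
        "b": [2.0, 4.0, 6.0, 8.0, 10.0],
        "c": [1.0, 9.0, 2.0, None, 1.0],
    }
    print(cluster_features(data))  # [['a', 'b'], ['c']]
    print(impute_within_clusters(data))
    # {'a': [1.0, 2.0, 3.0, 4.0, 5.0], 'b': [2.0, 4.0, 6.0, 8.0, 10.0], 'c': [1.0, 9.0, 2.0, 3.25, 1.0]}
```

In the example, `a` and `b` are perfectly correlated, so the gap in `a` is filled from `b`. Column `c` forms its own cluster and keeps its column-mean fill. A fuller version would fit a multivariate model per cluster (for example scikit-learn's `IterativeImputer`, which is MICE-inspired). The one-predictor regression only keeps the sketch dependency-free, and it makes no attempt to model MNAR missingness.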
{clustering_imputation-1.0.0.dist-info → clustering_imputation-1.0.1.dist-info}/RECORD

```diff
@@ -10,8 +10,8 @@ clustering_imputation/clusterBase/clustering.py,sha256=vs_btfkRL0wdeVbJqIMJduZvp
 clustering_imputation/clusterBase/ohe.py,sha256=3KRnDpTNerWYY51m915gMx-UG1GrB1Q2Zu65VR4pOaY,725
 clustering_imputation/dummyData/__init__.py,sha256=d3J6YLwGNjEQXSKP33oLxFQVS8kYW4wWpvuZMt5_Pm0,30
 clustering_imputation/dummyData/dataCreation.py,sha256=N2DBcmtqPirZ1i32har4wb9aUJ_OI6GWTKtFYQfXOlw,653
-clustering_imputation-1.0.
-clustering_imputation-1.0.
-clustering_imputation-1.0.
-clustering_imputation-1.0.
-clustering_imputation-1.0.
+clustering_imputation-1.0.1.dist-info/LICENSE,sha256=-QiwYzJ5Lmyq5xOcOdvNjJM_r9GB-d2RLyULii-1iJw,1097
+clustering_imputation-1.0.1.dist-info/METADATA,sha256=fas3Uq0-Y1o3V3PyhS4un3G0Xd1bLUoCmS0TEeHkRCk,2474
+clustering_imputation-1.0.1.dist-info/WHEEL,sha256=yQN5g4mg4AybRjkgi-9yy4iQEFibGQmlz78Pik5Or-A,92
+clustering_imputation-1.0.1.dist-info/top_level.txt,sha256=i9XDry3xiyewwukLIRRSWbuRRmG0GpkeIXy3-zaYjAY,22
+clustering_imputation-1.0.1.dist-info/RECORD,,
```
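In the RECORD hunk above, the three package-file entries are unchanged context. All five dist-info entries change because the directory was renamed from 1.0.0 to 1.0.1. Each entry has the form `path,hash,size`. Per the wheel binary distribution format, the hash is `sha256=` followed by the URL-safe base64 encoding of the raw digest with trailing `=` padding stripped. RECORD lists itself with empty hash and size fields. The block below is a minimal standard-library sketch for producing and checking such entries. The function names are illustrative and are not part of any packaging tool.

```python
import base64
import csv
import hashlib
from typing import Optional


def _b64_digest(algo: str, data: bytes) -> str:
    """URL-safe base64 of the raw digest, trailing '=' padding removed (RECORD style)."""
    raw = hashlib.new(algo, data).digest()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def record_entry(path: str, data: bytes) -> str:
    """Build one RECORD row: path,sha256=<digest>,<size in bytes>.

    Illustrative only: assumes the path needs no CSV quoting (no commas or quotes).
    """
    return f"{path},sha256={_b64_digest('sha256', data)},{len(data)}"


def verify_entry(line: str, data: bytes) -> Optional[bool]:
    """Check a RECORD row against file bytes; None when the row has no hash (e.g. RECORD itself)."""
    _path, hash_field, size_field = next(csv.reader([line]))  # RECORD is a CSV file
    if not hash_field:
        return None
    algo, _, expected = hash_field.partition("=")
    return _b64_digest(algo, data) == expected and int(size_field) == len(data)


if __name__ == "__main__":
    row = record_entry("example.txt", b"")
    print(row)  # example.txt,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
    print(verify_entry(row, b""))  # True
    print(verify_entry("clustering_imputation-1.0.1.dist-info/RECORD,,", b""))  # None
```

Because the METADATA digest covers the file's exact bytes, any edit to the long description (the README) changes that entry's hash and size. This is why the METADATA row differs between the two versions.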
clustering_imputation-1.0.0.dist-info/METADATA

```diff
@@ -1,23 +0,0 @@
-Metadata-Version: 2.1
-Name: clustering-imputation
-Version: 1.0.0
-Summary: Adding correlation to handle MNAR
-Author: MRINAL KANGSA BANIK
-Author-email: <manukbanik30@gmail.com>
-Keywords: python,imputation,MNAR
-Classifier: Development Status :: 1 - Planning
-Classifier: Intended Audience :: Developers
-Classifier: Programming Language :: Python :: 3
-Classifier: Operating System :: Unix
-Classifier: Operating System :: MacOS :: MacOS X
-Classifier: Operating System :: Microsoft :: Windows
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: numpy
-Requires-Dist: pandas
-Requires-Dist: sklearn
-Requires-Dist: fancyimpute
-
-
-Hey we will write it later
-
```
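Both METADATA hunks use the core-metadata layout: email-style `Key: value` headers, then a blank line, then the long description as the message body. Version 1.0.0 shipped the placeholder "Hey we will write it later", and 1.0.1 ships the README. Because of that layout, Python's standard `email` parser can read the file. The block below is a minimal sketch, and `read_core_metadata` is a hypothetical helper name. Note also that the `sklearn` requirement names a deprecated placeholder distribution on PyPI; the maintained package is published as `scikit-learn`.

```python
from email.parser import Parser
from typing import Dict, List, Optional, Union


# Hypothetical helper: parse the text of a wheel's *.dist-info/METADATA file.
def read_core_metadata(text: str) -> Dict[str, Union[str, List[str], None]]:
    """Split core metadata into header fields and the long description (message body)."""
    msg = Parser().parsestr(text)
    name: Optional[str] = msg["Name"]
    return {
        "name": name,
        "version": msg["Version"],
        # Multiple-use fields repeat the header once per value.
        "requires_dist": msg.get_all("Requires-Dist", []),
        "classifiers": msg.get_all("Classifier", []),
        # Metadata-Version 2.1+ allows the description as the body after the blank line.
        "description": msg.get_payload(),
    }


if __name__ == "__main__":
    sample = (  # excerpt of the 1.0.1 METADATA shown above
        "Metadata-Version: 2.1\n"
        "Name: clustering-imputation\n"
        "Version: 1.0.1\n"
        "Requires-Dist: numpy\n"
        "Requires-Dist: pandas\n"
        "Requires-Dist: sklearn\n"
        "Requires-Dist: fancyimpute\n"
        "\n"
        "# Clustering Imputation\n"
    )
    meta = read_core_metadata(sample)
    print(meta["version"], meta["requires_dist"])
    # 1.0.1 ['numpy', 'pandas', 'sklearn', 'fancyimpute']
```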
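{clustering_imputation-1.0.0.dist-info → clustering_imputation-1.0.1.dist-info}/LICENSE
RENAMED
File without changes

{clustering_imputation-1.0.0.dist-info → clustering_imputation-1.0.1.dist-info}/WHEEL
RENAMED
File without changes

{clustering_imputation-1.0.0.dist-info → clustering_imputation-1.0.1.dist-info}/top_level.txt
RENAMED
File without changes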