clustering-imputation 1.0.0__py3-none-any.whl → 1.0.1__py3-none-any.whl

@@ -0,0 +1,113 @@
+ Metadata-Version: 2.1
+ Name: clustering-imputation
+ Version: 1.0.1
+ Summary: Adding correlation to handle MNAR
+ Author: MRINAL KANGSA BANIK
+ Author-email: <manukbanik30@gmail.com>
+ Keywords: python,imputation,MNAR
+ Classifier: Development Status :: 1 - Planning
+ Classifier: Intended Audience :: Developers
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Operating System :: Unix
+ Classifier: Operating System :: MacOS :: MacOS X
+ Classifier: Operating System :: Microsoft :: Windows
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: numpy
+ Requires-Dist: pandas
+ Requires-Dist: sklearn
+ Requires-Dist: fancyimpute
+
+
+ # Clustering Imputation
+
+ ## Installation
+
+ To install the package, run:
+
+ ```bash
+
+ pip install clustering-imputation==1.0.1
+
+ ```
+
+ ## Usage
+
+
+
+ ```python
+
+ from clustering_imputation import clusterImputer
+
+ df = ... # Load your dataset
+
+ x = clusterImputer(df, "mice", "mean", 0.4, 10)
+
+ x.impute()
+
+ ```
+
+ # About the Package
+
+
+
+ ## Problem Statement
+
+
+
+ * Traditional imputation techniques face several challenges:
+
+
+
+ * High-Dimensional and Sparse Data: Existing methods struggle with large, sparse datasets; efficient techniques for such cases are needed.
+
+
+
+ * Temporal Dependencies: Current methods often overlook temporal correlations in data.
+
+ ## Need for a New Algorithm
+
+ * Non-Random Missingness: Few methods address non-random missingness patterns; improving on this could raise imputation accuracy in real-world applications. We aim to develop an imputation method that accounts for data that is "Missing Not at Random" (MNAR).
+
+
+
+ * Computational Complexity: MICE and EM methods are computationally expensive for high-dimensional data. Our approach aims to reduce time complexity.
+
+
+
+ ## Philosophy of Our Solution: Clustered MICE/EM
+
+
+
+ We propose a clustering-based approach:
+
+
+
+ * Identify correlations between features.
+
+
+
+ * Apply MICE/EM within clusters rather than on the entire dataset.
+
+
+
+ * Combine results to reconstruct the dataset.
+
+
+
+ * This method effectively handles MNAR data by leveraging feature correlations.
+
+ For further details, refer to this [ppt](https://docs.google.com/presentation/d/1UZ2uDkleSgB2ZttjG1D6nmQhqk7uz5FQRW5UmSkB0Sg/edit?usp=sharing).
+
+ ## Contributing
+
+
+
+ Pull requests are welcome. For major changes, please open an issue first
+
+ to discuss what you would like to change.
+
+
+
+ Please make sure to update tests as appropriate.
+
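
The clustered approach described under "Philosophy of Our Solution" in the README above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the names `cluster_features` and `clustered_impute` are hypothetical, scikit-learn's `IterativeImputer` stands in for MICE, and features are grouped by hierarchical clustering on `1 - |correlation|`. The defaults `0.4` and `10` only echo the numbers in the README's usage line; the source does not confirm what those arguments mean in `clusterImputer`.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer


def cluster_features(df: pd.DataFrame, threshold: float = 0.4) -> dict[int, list[str]]:
    """Group columns so that strongly correlated features share a cluster.

    Hypothetical helper: assumes a numeric DataFrame with at least two columns.
    """
    # Pairwise correlations are computed on the rows where both columns are observed.
    corr = df.corr().abs().fillna(0.0).to_numpy()
    dist = np.clip(1.0 - corr, 0.0, None)  # high correlation -> small distance
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=1.0 - threshold, criterion="distance")
    groups: dict[int, list[str]] = {}
    for column, label in zip(df.columns, labels):
        groups.setdefault(int(label), []).append(column)
    return groups


def clustered_impute(df: pd.DataFrame, threshold: float = 0.4,
                     max_iter: int = 10, seed: int = 0) -> pd.DataFrame:
    """Impute each feature cluster separately, then reassemble the columns."""
    filled = df.copy()
    for columns in cluster_features(df, threshold).values():
        if len(columns) == 1:
            # A lone feature has no correlated partners; fall back to its mean.
            filled[columns] = df[columns].fillna(df[columns].mean())
        else:
            imputer = IterativeImputer(max_iter=max_iter, random_state=seed)
            filled[columns] = imputer.fit_transform(df[columns])
    return filled


# Synthetic example: two independent pairs of correlated features.
rng = np.random.default_rng(0)
n = 200
base_a, base_b = rng.normal(size=n), rng.normal(size=n)
df = pd.DataFrame({
    "a1": base_a,
    "a2": base_a + 0.1 * rng.normal(size=n),
    "b1": base_b,
    "b2": base_b + 0.1 * rng.normal(size=n),
})
# MNAR-style missingness: a2 goes missing exactly when its own value is large.
df.loc[df["a2"] > 1.0, "a2"] = np.nan
# Some additional missingness in b1 that is unrelated to its value.
df.loc[rng.random(n) < 0.1, "b1"] = np.nan

groups = cluster_features(df)
filled = clustered_impute(df)
print(sorted(groups.values()))
print(int(filled.isna().sum().sum()), "missing values remain")
```

Each imputer here only sees the columns of its own cluster, so the regression problems it solves are smaller than those of a single imputer run over every column. That is the intuition behind the README's complexity argument; the sketch makes no claim about how accurately MNAR values are recovered.
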
@@ -10,8 +10,8 @@ clustering_imputation/clusterBase/clustering.py,sha256=vs_btfkRL0wdeVbJqIMJduZvp
  clustering_imputation/clusterBase/ohe.py,sha256=3KRnDpTNerWYY51m915gMx-UG1GrB1Q2Zu65VR4pOaY,725
  clustering_imputation/dummyData/__init__.py,sha256=d3J6YLwGNjEQXSKP33oLxFQVS8kYW4wWpvuZMt5_Pm0,30
  clustering_imputation/dummyData/dataCreation.py,sha256=N2DBcmtqPirZ1i32har4wb9aUJ_OI6GWTKtFYQfXOlw,653
- clustering_imputation-1.0.0.dist-info/LICENSE,sha256=-QiwYzJ5Lmyq5xOcOdvNjJM_r9GB-d2RLyULii-1iJw,1097
- clustering_imputation-1.0.0.dist-info/METADATA,sha256=g8KMlkF5bXeEn4re5EcgfyT-OhMxWeOTxBFAI_Lsr6w,694
- clustering_imputation-1.0.0.dist-info/WHEEL,sha256=yQN5g4mg4AybRjkgi-9yy4iQEFibGQmlz78Pik5Or-A,92
- clustering_imputation-1.0.0.dist-info/top_level.txt,sha256=i9XDry3xiyewwukLIRRSWbuRRmG0GpkeIXy3-zaYjAY,22
- clustering_imputation-1.0.0.dist-info/RECORD,,
+ clustering_imputation-1.0.1.dist-info/LICENSE,sha256=-QiwYzJ5Lmyq5xOcOdvNjJM_r9GB-d2RLyULii-1iJw,1097
+ clustering_imputation-1.0.1.dist-info/METADATA,sha256=fas3Uq0-Y1o3V3PyhS4un3G0Xd1bLUoCmS0TEeHkRCk,2474
+ clustering_imputation-1.0.1.dist-info/WHEEL,sha256=yQN5g4mg4AybRjkgi-9yy4iQEFibGQmlz78Pik5Or-A,92
+ clustering_imputation-1.0.1.dist-info/top_level.txt,sha256=i9XDry3xiyewwukLIRRSWbuRRmG0GpkeIXy3-zaYjAY,22
+ clustering_imputation-1.0.1.dist-info/RECORD,,
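
Each RECORD line in the hunk above has the form `path,sha256=<digest>,<size>`. As a rough sketch of how such an entry could be checked, assuming the usual wheel convention of a URL-safe base64 SHA-256 digest with the trailing `=` padding stripped: the helper names `record_digest` and `matches_record` and the sample payload are hypothetical, and the RECORD file's own entry (which has empty hash and size fields) is not handled.

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """Return the `sha256=...` field a wheel RECORD would list for `data`."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64, with the trailing "=" padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sha256={encoded}"


def matches_record(data: bytes, record_line: str) -> bool:
    """Check file bytes against one `path,hash,size` RECORD line."""
    # Split from the right so the hash and size fields are always the last two.
    _path, expected_hash, size = record_line.rsplit(",", 2)
    return expected_hash == record_digest(data) and int(size) == len(data)


# Hypothetical payload and path, used only to exercise the helpers.
payload = b"example file contents\n"
line = f"pkg/example.txt,{record_digest(payload)},{len(payload)}"
print(matches_record(payload, line))
```

Read this way, the METADATA entry's change in both digest and size (694 to 2474 bytes) is consistent with the README text added in 1.0.1, while LICENSE, WHEEL, and top_level.txt keep identical digests and sizes.
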
@@ -1,23 +0,0 @@
- Metadata-Version: 2.1
- Name: clustering-imputation
- Version: 1.0.0
- Summary: Adding correlation to handle MNAR
- Author: MRINAL KANGSA BANIK
- Author-email: <manukbanik30@gmail.com>
- Keywords: python,imputation,MNAR
- Classifier: Development Status :: 1 - Planning
- Classifier: Intended Audience :: Developers
- Classifier: Programming Language :: Python :: 3
- Classifier: Operating System :: Unix
- Classifier: Operating System :: MacOS :: MacOS X
- Classifier: Operating System :: Microsoft :: Windows
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: numpy
- Requires-Dist: pandas
- Requires-Dist: sklearn
- Requires-Dist: fancyimpute
-
-
- Hey we will write it later
-