clustering-imputation 1.0.0__tar.gz → 1.0.1__tar.gz
- clustering_imputation-1.0.1/PKG-INFO +63 -0
- clustering_imputation-1.0.1/README.md +46 -0
- clustering_imputation-1.0.1/clustering_imputation.egg-info/PKG-INFO +63 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/setup.py +1 -1
- clustering_imputation-1.0.0/PKG-INFO +0 -18
- clustering_imputation-1.0.0/README.md +0 -1
- clustering_imputation-1.0.0/clustering_imputation.egg-info/PKG-INFO +0 -18
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/LICENSE +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/__init__.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/basicImputer/__init__.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/basicImputer/em.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/basicImputer/mice.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/basicImputer/sice.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/clusterBase/__init__.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/clusterBase/clustering.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/clusterBase/ohe.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/dummyData/__init__.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/dummyData/dataCreation.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/getClusters.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation/main.py +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation.egg-info/SOURCES.txt +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation.egg-info/dependency_links.txt +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation.egg-info/requires.txt +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/clustering_imputation.egg-info/top_level.txt +0 -0
- {clustering_imputation-1.0.0 → clustering_imputation-1.0.1}/setup.cfg +0 -0
@@ -0,0 +1,63 @@
+Metadata-Version: 2.1
+Name: clustering_imputation
+Version: 1.0.1
+Summary: Adding correlation to handle MNAR
+Author: MRINAL KANGSA BANIK
+Author-email: <manukbanik30@gmail.com>
+Keywords: python,imputation,MNAR
+Classifier: Development Status :: 1 - Planning
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: Unix
+Classifier: Operating System :: MacOS :: MacOS X
+Classifier: Operating System :: Microsoft :: Windows
+Description-Content-Type: text/markdown
+License-File: LICENSE
+
+
+# Clustering Imputation
+## Installation
+To install the package, run:
+```bash
+pip install clustering-imputation==1.0.0
+```
+## Usage
+
+```python
+from clustering_imputation import clusterImputer
+df = ... # Load your dataset
+x = clusterImputer(df, "mice", "mean", 0.4, 10)
+x.impute()
+```
+# About the Package
+
+## Problem Statement
+
+* Traditional imputation techniques face several challenges:
+
+* High-Dimensional and Sparse Data: Existing methods struggle with large, sparse datasets; efficient techniques for such cases are needed.
+
+* Temporal Dependencies: Current methods often overlook temporal correlations in data.
+## Need to develop a new algo
+* Non-Random Missingness: Few methods address non-random missing patterns; improvements here could boost real-world application accuracy. We aim to develop an imputation method that considers "Missing Not at Random" (MNAR).
+
+* Computational Complexity: MICE and EM methods are computationally expensive for high-dimensional data. Our approach aims to reduce time complexity.
+
+## Philosophy of Our Solution: Clustered MICE/EM
+
+We propose a clustering-based approach:
+
+* Identify correlations between features.
+
+* Apply MICE/EM within clusters rather than on the entire dataset.
+
+* Combine results to reconstruct the dataset.
+
+* This method effectively handles MNAR data by leveraging feature correlations.
+For further details refer this [ppt](https://docs.google.com/presentation/d/1UZ2uDkleSgB2ZttjG1D6nmQhqk7uz5FQRW5UmSkB0Sg/edit?usp=sharing)
+## Contributing
+
+Pull requests are welcome. For major changes, please open an issue first
+to discuss what you would like to change.
+
+Please make sure to update tests as appropriate.
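The "Philosophy of Our Solution" steps in the README text added by this release (identify correlated features, impute within each cluster, combine the results) can be sketched roughly as below. This is an illustration only, not the package's code: all function names are hypothetical, missing values are represented as `None` in plain Python lists, a single-variable linear fit stands in for MICE/EM, and treating `0.4` as a correlation threshold is an assumption that merely echoes the number in the usage snippet.

```python
from math import sqrt


def _pairs(xs, ys):
    """Rows where both columns are observed (None marks a missing value)."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def pearson(xs, ys):
    """Pearson correlation on pairwise-complete rows; 0.0 if undefined."""
    pairs = _pairs(xs, ys)
    if len(pairs) < 3:
        return 0.0
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def cluster_columns(data, threshold):
    """Step 1: group columns whose |correlation| reaches the threshold
    (connected components, found with a small union-find)."""
    parent = {name: name for name in data}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    names = list(data)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(data[a], data[b])) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys on xs, or None if degenerate."""
    pairs = _pairs(xs, ys)
    if len(pairs) < 2:
        return None
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return slope, my - slope * mx


def impute_cluster(data, cluster):
    """Step 2: fill each column using only columns of the same cluster.
    Stand-in for MICE/EM: a linear fit on the best-correlated cluster mate
    that is observed in that row, with the column mean as fallback."""
    filled = {}
    for target in cluster:
        column = data[target]
        observed = [v for v in column if v is not None]
        mean = sum(observed) / len(observed) if observed else None
        mates = sorted((c for c in cluster if c != target),
                       key=lambda c: abs(pearson(data[c], column)), reverse=True)
        fits = {c: linear_fit(data[c], column) for c in mates}
        result = list(column)
        for row, value in enumerate(column):
            if value is not None:
                continue
            result[row] = mean
            for mate in mates:
                if data[mate][row] is not None and fits[mate] is not None:
                    slope, intercept = fits[mate]
                    result[row] = slope * data[mate][row] + intercept
                    break
        filled[target] = result
    return filled


def clustered_impute(data, threshold=0.4):
    """Step 3: impute cluster by cluster and merge back into one table."""
    out = {}
    for cluster in cluster_columns(data, threshold):
        out.update(impute_cluster(data, cluster))
    return {name: out[name] for name in data}


# Toy table: "a" and "b" are strongly correlated, "c" is not.
example = {
    "a": [1.0, 2.0, 3.0, 4.0, None],
    "b": [2.0, 4.0, 6.0, None, 10.0],
    "c": [3.0, 1.0, 1.0, 3.0, 2.0],
}
completed = clustered_impute(example)
```

In the toy table, `a` and `b` end up in one cluster and fill each other's gaps, while `c` forms its own cluster and is left untouched. Each per-cluster model only sees that cluster's columns, which is where a clustered approach would hope to save time on wide datasets.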
@@ -0,0 +1,46 @@
+# Clustering Imputation
+## Installation
+To install the package, run:
+```bash
+pip install clustering-imputation==1.0.0
+```
+## Usage
+
+```python
+from clustering_imputation import clusterImputer
+df = ... # Load your dataset
+x = clusterImputer(df, "mice", "mean", 0.4, 10)
+x.impute()
+```
+# About the Package
+
+## Problem Statement
+
+* Traditional imputation techniques face several challenges:
+
+* High-Dimensional and Sparse Data: Existing methods struggle with large, sparse datasets; efficient techniques for such cases are needed.
+
+* Temporal Dependencies: Current methods often overlook temporal correlations in data.
+## Need to develop a new algo
+* Non-Random Missingness: Few methods address non-random missing patterns; improvements here could boost real-world application accuracy. We aim to develop an imputation method that considers "Missing Not at Random" (MNAR).
+
+* Computational Complexity: MICE and EM methods are computationally expensive for high-dimensional data. Our approach aims to reduce time complexity.
+
+## Philosophy of Our Solution: Clustered MICE/EM
+
+We propose a clustering-based approach:
+
+* Identify correlations between features.
+
+* Apply MICE/EM within clusters rather than on the entire dataset.
+
+* Combine results to reconstruct the dataset.
+
+* This method effectively handles MNAR data by leveraging feature correlations.
+For further details refer this [ppt](https://docs.google.com/presentation/d/1UZ2uDkleSgB2ZttjG1D6nmQhqk7uz5FQRW5UmSkB0Sg/edit?usp=sharing)
+## Contributing
+
+Pull requests are welcome. For major changes, please open an issue first
+to discuss what you would like to change.
+
+Please make sure to update tests as appropriate.
@@ -0,0 +1,63 @@
+Metadata-Version: 2.1
+Name: clustering-imputation
+Version: 1.0.1
+Summary: Adding correlation to handle MNAR
+Author: MRINAL KANGSA BANIK
+Author-email: <manukbanik30@gmail.com>
+Keywords: python,imputation,MNAR
+Classifier: Development Status :: 1 - Planning
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: Unix
+Classifier: Operating System :: MacOS :: MacOS X
+Classifier: Operating System :: Microsoft :: Windows
+Description-Content-Type: text/markdown
+License-File: LICENSE
+
+
+# Clustering Imputation
+## Installation
+To install the package, run:
+```bash
+pip install clustering-imputation==1.0.0
+```
+## Usage
+
+```python
+from clustering_imputation import clusterImputer
+df = ... # Load your dataset
+x = clusterImputer(df, "mice", "mean", 0.4, 10)
+x.impute()
+```
+# About the Package
+
+## Problem Statement
+
+* Traditional imputation techniques face several challenges:
+
+* High-Dimensional and Sparse Data: Existing methods struggle with large, sparse datasets; efficient techniques for such cases are needed.
+
+* Temporal Dependencies: Current methods often overlook temporal correlations in data.
+## Need to develop a new algo
+* Non-Random Missingness: Few methods address non-random missing patterns; improvements here could boost real-world application accuracy. We aim to develop an imputation method that considers "Missing Not at Random" (MNAR).
+
+* Computational Complexity: MICE and EM methods are computationally expensive for high-dimensional data. Our approach aims to reduce time complexity.
+
+## Philosophy of Our Solution: Clustered MICE/EM
+
+We propose a clustering-based approach:
+
+* Identify correlations between features.
+
+* Apply MICE/EM within clusters rather than on the entire dataset.
+
+* Combine results to reconstruct the dataset.
+
+* This method effectively handles MNAR data by leveraging feature correlations.
+For further details refer this [ppt](https://docs.google.com/presentation/d/1UZ2uDkleSgB2ZttjG1D6nmQhqk7uz5FQRW5UmSkB0Sg/edit?usp=sharing)
+## Contributing
+
+Pull requests are welcome. For major changes, please open an issue first
+to discuss what you would like to change.
+
+Please make sure to update tests as appropriate.
@@ -7,7 +7,7 @@ here = os.path.abspath(os.path.dirname(__file__))
 with codecs.open(os.path.join(here, "README.md"), encoding="utf-8") as fh:
     long_description = "\n" + fh.read()
 
-VERSION = '1.0.
+VERSION = '1.0.1'
 DESCRIPTION = 'Adding correlation to handle MNAR'
 LONG_DESCRIPTION = 'A package that allows us to impute for all types of missingness(MAR , MCAR , MNAR)'
 
@@ -1,18 +0,0 @@
-Metadata-Version: 2.1
-Name: clustering_imputation
-Version: 1.0.0
-Summary: Adding correlation to handle MNAR
-Author: MRINAL KANGSA BANIK
-Author-email: <manukbanik30@gmail.com>
-Keywords: python,imputation,MNAR
-Classifier: Development Status :: 1 - Planning
-Classifier: Intended Audience :: Developers
-Classifier: Programming Language :: Python :: 3
-Classifier: Operating System :: Unix
-Classifier: Operating System :: MacOS :: MacOS X
-Classifier: Operating System :: Microsoft :: Windows
-Description-Content-Type: text/markdown
-License-File: LICENSE
-
-
-Hey we will write it later
@@ -1 +0,0 @@
-Hey we will write it later
@@ -1,18 +0,0 @@
-Metadata-Version: 2.1
-Name: clustering-imputation
-Version: 1.0.0
-Summary: Adding correlation to handle MNAR
-Author: MRINAL KANGSA BANIK
-Author-email: <manukbanik30@gmail.com>
-Keywords: python,imputation,MNAR
-Classifier: Development Status :: 1 - Planning
-Classifier: Intended Audience :: Developers
-Classifier: Programming Language :: Python :: 3
-Classifier: Operating System :: Unix
-Classifier: Operating System :: MacOS :: MacOS X
-Classifier: Operating System :: Microsoft :: Windows
-Description-Content-Type: text/markdown
-License-File: LICENSE
-
-
-Hey we will write it later
The remaining 18 files are renamed from clustering_imputation-1.0.0 to clustering_imputation-1.0.1 without content changes (the +0 -0 entries in the file list above).