psne-poisson-neighbor-python 0.1.0__py3-none-any.whl
|
@@ -0,0 +1,241 @@
|
|
|
1
|
+
Metadata-Version: 2.1
|
|
2
|
+
Name: psne-poisson-neighbor-python
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: p-SNE: Poisson Stochastic Neighbor Embedding. Nonlinear dimensionality reduction for sparse count data (neural spike counts, scRNA-seq, text corpora).
|
|
5
|
+
Author: Noga Mudrik, Adam S. Charles
|
|
6
|
+
License: MIT
|
|
7
|
+
Project-URL: Homepage, https://github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding
|
|
8
|
+
Project-URL: Paper, https://arxiv.org/abs/2604.16932
|
|
9
|
+
Project-URL: Issues, https://github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding/issues
|
|
10
|
+
Keywords: dimensionality-reduction,t-SNE,Poisson,count-data,sparse-data,visualization,embedding,neuroscience,scRNA-seq,spike-counts
|
|
11
|
+
Classifier: Development Status :: 4 - Beta
|
|
12
|
+
Classifier: Intended Audience :: Science/Research
|
|
13
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
14
|
+
Classifier: Topic :: Scientific/Engineering :: Visualization
|
|
15
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
16
|
+
Classifier: Programming Language :: Python :: 3
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.8
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
20
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
21
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
22
|
+
Requires-Python: >=3.7
|
|
23
|
+
Description-Content-Type: text/markdown
|
|
24
|
+
Requires-Dist: numpy >=1.21
|
|
25
|
+
Requires-Dist: scipy >=1.7
|
|
26
|
+
Requires-Dist: scikit-learn >=1.0
|
|
27
|
+
Requires-Dist: matplotlib >=3.5
|
|
28
|
+
Requires-Dist: seaborn >=0.11
|
|
29
|
+
|
|
30
|
+
# p-SNE: Poisson Stochastic Neighbor Embedding
|
|
31
|
+
|
|
32
|
+
[arXiv:2604.16932](https://arxiv.org/abs/2604.16932)
|
|
33
|
+
[License: MIT](LICENSE)
|
|
34
|
+
[Python](https://www.python.org/downloads/)
|
|
35
|
+
|
|
36
|
+
**A nonlinear dimensionality reduction method for sparse count data.**
|
|
37
|
+
|
|
38
|
+
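p-SNE embeds high-dimensional count matrices (neural spike counts, text corpora) into 2D or 3D, using a Poisson KL divergence to measure pairwise dissimilarity between samples and a Hellinger cost to fit the embedding. The interface mirrors scikit-learn's t-SNE (`fit_transform`, fitted attributes with a trailing underscore), with one difference: the input is laid out as features × samples rather than samples × features (see Data format).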
|
|
39
|
+
|
|
40
|
+
📄 **Paper:** [Neighbor Embedding for High-Dimensional Sparse Poisson Data](https://arxiv.org/abs/2604.16932) (arXiv 2604.16932)
|
|
41
|
+
|
|
42
|
+
💻 **Code:** [github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding](https://github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding)
|
|
43
|
+
|
|
44
|
+
📝 **Blog post:** [Life Is Too Short for Wrong Metrics](https://pub.aimind.so/life-is-too-short-for-wrong-metrics-visualizing-sparse-count-data-with-p-sne-0c6ae0a191c9)
|
|
45
|
+
|
|
46
|
+
---
|
|
47
|
+
|
|
48
|
+
## Why p-SNE?
|
|
49
|
+
|
|
50
|
+
Standard dimensionality reduction methods (t-SNE, UMAP, PCA) implicitly assume continuous, Gaussian-distributed features. Applied to sparse count data, they treat zeros as informative when computing distances and ignore the mean-variance coupling inherent in Poisson observations (the variance equals the mean). The resulting embeddings are distorted: structure is lost or fabricated.
|
|
51
|
+
|
|
52
|
+
p-SNE replaces the Euclidean distance in t-SNE with a Poisson KL divergence that respects the discrete, non-negative nature of count data. On sparse neural recordings, text word counts, and single-cell RNA-seq data, p-SNE recovers cluster structure that t-SNE, UMAP, and PCA miss.
|
|
53
|
+
|
|
54
|
+
---
|
|
55
|
+
|
|
56
|
+
## Installation
|
|
57
|
+
|
|
58
|
+
```bash
|
|
59
|
+
pip install psne-poisson-neighbor-python
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
Or from source:
|
|
63
|
+
|
|
64
|
+
```bash
|
|
65
|
+
git clone https://github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding.git
|
|
66
|
+
cd PSNE-Poisson-Stochastic-Neighbor-Embedding
|
|
67
|
+
pip install -r requirements.txt
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
Core dependencies: `numpy`, `scipy`, `scikit-learn`, `matplotlib`, `seaborn`.
|
|
71
|
+
|
|
72
|
+
---
|
|
73
|
+
|
|
74
|
+
## Quick start
|
|
75
|
+
|
|
76
|
+
```python
|
|
77
|
+
import numpy as np
|
|
78
|
+
from psne.psne_core import PSNE
|
|
79
|
+
|
|
80
|
+
# Synthetic counts: 50 features (rows) x 30 samples (columns)
X = np.random.poisson(5, size=(50, 30)).astype(float)
|
|
81
|
+
model = PSNE(n_components=2, max_iter=500, eta=100.0, verbose=True)
|
|
82
|
+
embedding = model.fit_transform(X)
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
With your own data:
|
|
86
|
+
|
|
87
|
+
```python
|
|
88
|
+
import numpy as np
|
|
89
|
+
from psne.psne_core import PSNE
|
|
90
|
+
|
|
91
|
+
# Expected layout: (N features, T samples); transpose first if samples are rows
X = np.load('my_data.npy').astype(float)
|
|
92
|
+
assert np.all(X >= 0), 'p-SNE requires non-negative input'
|
|
93
|
+
|
|
94
|
+
model = PSNE(
|
|
95
|
+
n_components=3,
|
|
96
|
+
s_mode='weight_exp',
|
|
97
|
+
weight_exp=1.0,
|
|
98
|
+
eta=200.0,
|
|
99
|
+
max_iter=1000,
|
|
100
|
+
gamma=0.0,
|
|
101
|
+
use_momentum=True,
|
|
102
|
+
use_early_exaggeration=True,
|
|
103
|
+
verbose=True,
|
|
104
|
+
)
|
|
105
|
+
embedding = model.fit_transform(X)
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
Plotting:
|
|
109
|
+
|
|
110
|
+
```python
|
|
111
|
+
import matplotlib.pyplot as plt
import numpy as np
|
|
112
|
+
|
|
113
|
+
labels = np.load('my_labels.npy')
|
|
114
|
+
|
|
115
|
+
fig, ax = plt.subplots()
|
|
116
|
+
ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap='tab10', s=30)
|
|
117
|
+
ax.set_xlabel('$y_1$')
|
|
118
|
+
ax.set_ylabel('$y_2$')
|
|
119
|
+
plt.show()
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
For 3D:
|
|
123
|
+
|
|
124
|
+
```python
|
|
125
|
+
fig = plt.figure()
|
|
126
|
+
ax = fig.add_subplot(111, projection='3d')
|
|
127
|
+
ax.scatter(embedding[:, 0], embedding[:, 1], embedding[:, 2], c=labels, cmap='tab10', s=30)
|
|
128
|
+
plt.show()
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
---
|
|
132
|
+
|
|
133
|
+
## Method
|
|
134
|
+
|
|
135
|
+
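1. **Poisson KL distance matrix.** Asymmetric divergence between all sample pairs $(i, j)$, averaged over the $N$ features, where $x_{n,i}$ is the count of feature $n$ in sample $i$: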
|
|
136
|
+
|
|
137
|
+
$$D_{ij} = \frac{1}{N}\sum_n \left[ x_{n,i} \log\frac{x_{n,i}+\epsilon}{x_{n,j}+\epsilon} + x_{n,j} - x_{n,i} \right]$$
|
|
138
|
+
|
|
139
|
+
2. **High-dimensional joint probabilities** $S$: convert $D$ into a symmetric probability matrix via a global weight exponent or adaptive per-point perplexity.
|
|
140
|
+
3. **Low-dimensional joint probabilities** $Q$: Cauchy kernel over the embedding coordinates, as in t-SNE.
|
|
141
|
+
4. **Hellinger cost:** minimize $H(S, Q)$ instead of the KL divergence used in t-SNE.
|
|
142
|
+
5. **Optional group-lasso penalty:** $\gamma \sum_n \|y_n\|_2$ promotes sparsity across embedding dimensions.
|
|
143
|
+
6. **Optimizer:** gradient descent with momentum and early exaggeration.
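
Steps 1, 3, and 4 can be sketched in plain NumPy. This is an illustrative sketch under stated assumptions, not the package's implementation: the function names `poisson_kl_matrix`, `cauchy_joint_probabilities`, and `hellinger_distance` are hypothetical, the Cauchy kernel uses the usual t-SNE normalization over all pairs, and the Hellinger distance uses the textbook $1/\sqrt{2}$ scaling, which may differ from what `psne_core.py` does. Step 2 (building $S$ from $D$) is omitted because its exact form is not given here.

```python
import numpy as np


def poisson_kl_matrix(X, epsilon=1e-2):
    """Pairwise Poisson KL matrix D (T, T) for X of shape (N features, T samples).

    Implements D_ij = (1/N) * sum_n [x_ni * log((x_ni + eps) / (x_nj + eps)) + x_nj - x_ni].
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[0]
    log_x = np.log(X + epsilon)                 # (N, T)
    self_term = np.sum(X * log_x, axis=0)       # [i] = sum_n x_ni * log(x_ni + eps)
    cross_term = X.T @ log_x                    # [i, j] = sum_n x_ni * log(x_nj + eps)
    col_sums = X.sum(axis=0)                    # [i] = sum_n x_ni
    D = self_term[:, None] - cross_term + col_sums[None, :] - col_sums[:, None]
    return D / n_features


def cauchy_joint_probabilities(Y):
    """Joint probabilities Q (T, T) from embedding Y (T, d), Cauchy kernel as in t-SNE."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    kernel = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(kernel, 0.0)               # no self-affinity
    return kernel / kernel.sum()


def hellinger_distance(S, Q):
    """Textbook Hellinger distance between two probability matrices (assumed scaling)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(S) - np.sqrt(Q)) ** 2))


rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(6, 5)).astype(float)   # 6 features, 5 samples
D = poisson_kl_matrix(X)
Q = cauchy_joint_probabilities(rng.normal(size=(5, 2)))
```

Writing the cross term as a matrix product avoids an explicit double loop over sample pairs; `D` is generally not symmetric, which matches the "asymmetric divergence" description in step 1.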
|
|
144
|
+
|
|
145
|
+
---
|
|
146
|
+
|
|
147
|
+
## Data format
|
|
148
|
+
|
|
149
|
+
- Shape: $(N, T)$, where $N$ is the number of features (neurons, genes, words) and $T$ is the number of samples (conditions, cells, documents).
|
|
150
|
+
- Type: `float` or `int` numpy array.
|
|
151
|
+
- Values: non-negative.
|
|
152
|
+
|
|
153
|
+
Samples are columns, features are rows. The output embedding has shape `(T, n_components)` with samples as rows. Remove all-zero samples before fitting.
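
Data stored in the scikit-learn layout (samples as rows) needs a transpose before fitting. The helper below is a minimal sketch of that preparation; `prepare_counts` is a hypothetical name and is not part of the package.

```python
import numpy as np


def prepare_counts(X_samples_by_features):
    """Convert a (T samples, N features) array into the (N, T) layout described above.

    Returns the prepared array and a boolean mask of the samples that were kept,
    so labels can be filtered the same way.
    """
    X = np.asarray(X_samples_by_features, dtype=float).T   # now (N features, T samples)
    if np.any(X < 0):
        raise ValueError("count data must be non-negative")
    keep = X.sum(axis=0) > 0                                # drop all-zero samples (columns)
    return X[:, keep], keep


raw = np.array([[1, 0],
                [0, 0],    # all-zero sample, will be removed
                [2, 3]])
X_ready, kept = prepare_counts(raw)
```

Returning the mask alongside the data keeps any per-sample metadata (for example, the `labels` array used for plotting) aligned with the rows of the embedding.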
|
|
154
|
+
|
|
155
|
+
---
|
|
156
|
+
|
|
157
|
+
## Parameters
|
|
158
|
+
|
|
159
|
+
**Model:**
|
|
160
|
+
|
|
161
|
+
| Parameter | Default | Description |
|
|
162
|
+
|---|---|---|
|
|
163
|
+
| `n_components` | 3 | Embedding dimensionality. |
|
|
164
|
+
| `s_mode` | `'weight_exp'` | How to build $S$: `'weight_exp'` (global) or `'perplexity'` (adaptive). |
|
|
165
|
+
| `weight_exp` | 1.0 | Weight exponent for `s_mode='weight_exp'`. Higher values sharpen neighborhoods. |
|
|
166
|
+
| `perplexity` | 30.0 | Target perplexity for `s_mode='perplexity'`. Must be less than the number of samples $T$. |
|
|
167
|
+
| `epsilon` | 1e-2 | Smoothing constant for Poisson KL. |
|
|
168
|
+
| `gamma` | 0.0 | Group-lasso regularization weight ($\gamma > 0$ enforces sparsity). |
|
|
169
|
+
| `random_state` | 42 | Random seed for initialization. |
|
|
170
|
+
|
|
171
|
+
**Optimizer:**
|
|
172
|
+
|
|
173
|
+
| Parameter | Default | Description |
|
|
174
|
+
|---|---|---|
|
|
175
|
+
| `eta` | 200.0 | Learning rate. |
|
|
176
|
+
| `max_iter` | 1000 | Maximum iterations. |
|
|
177
|
+
| `tol` | 1e-8 | Convergence tolerance on cost change. |
|
|
178
|
+
| `use_momentum` | True | Enable momentum. |
|
|
179
|
+
| `momentum_alpha` | 0.5 | Initial momentum coefficient. |
|
|
180
|
+
| `momentum_alpha_final` | 0.8 | Final momentum coefficient. |
|
|
181
|
+
| `momentum_switch_iter` | 250 | Iteration at which momentum switches from `momentum_alpha` to `momentum_alpha_final`. |
|
|
182
|
+
| `use_early_exaggeration` | True | Multiply $S$ by `exaggeration_factor` for the first `exaggeration_iters` iterations. |
|
|
183
|
+
| `exaggeration_factor` | 12.0 | Exaggeration multiplier. |
|
|
184
|
+
| `exaggeration_iters` | 250 | Number of exaggeration iterations. |
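
One plausible reading of the momentum and exaggeration parameters is a pair of step schedules, sketched below with the table's defaults. The function names `momentum_at` and `exaggeration_at` are hypothetical, and the hard switch, the 0-based iteration count, and the strict `<` comparison are assumptions; the package may handle the boundary iteration differently.

```python
def momentum_at(iteration, momentum_alpha=0.5, momentum_alpha_final=0.8,
                momentum_switch_iter=250):
    """Momentum coefficient for a 0-based iteration (assumed hard switch)."""
    if iteration < momentum_switch_iter:
        return momentum_alpha
    return momentum_alpha_final


def exaggeration_at(iteration, exaggeration_factor=12.0, exaggeration_iters=250,
                    use_early_exaggeration=True):
    """Multiplier applied to S at a 0-based iteration; 1.0 means no exaggeration."""
    if use_early_exaggeration and iteration < exaggeration_iters:
        return exaggeration_factor
    return 1.0


# Schedule values at a few iterations, using the defaults from the table
schedule = [(it, momentum_at(it), exaggeration_at(it)) for it in (0, 249, 250, 999)]
```

With the defaults, both schedules change at iteration 250, so the exaggeration phase ends at the same point the momentum increases.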
|
|
185
|
+
|
|
186
|
+
---
|
|
187
|
+
|
|
188
|
+
## Attributes (after fitting)
|
|
189
|
+
|
|
190
|
+
| Attribute | Shape | Description |
|
|
191
|
+
|---|---|---|
|
|
192
|
+
| `embedding_` | `(n_components, T)` | Learned embedding. `fit_transform` returns the transpose. |
|
|
193
|
+
| `cost_history_` | list | Total cost at each iteration. |
|
|
194
|
+
| `hellinger_history_` | list | Hellinger distance at each iteration. |
|
|
195
|
+
| `D_` | $(T, T)$ | Poisson KL distance matrix. |
|
|
196
|
+
| `S_` | $(T, T)$ | High-dimensional joint probabilities. |
|
|
197
|
+
| `Q_` | $(T, T)$ | Final low-dimensional joint probabilities. |
|
|
198
|
+
| `n_iter_` | int | Number of iterations run. |
|
|
199
|
+
|
|
200
|
+
---
|
|
201
|
+
|
|
202
|
+
## Demo
|
|
203
|
+
|
|
204
|
+
```bash
|
|
205
|
+
python psne_demo_nonlinear.py
|
|
206
|
+
```
|
|
207
|
+
|
|
208
|
+
Runs two synthetic datasets (3-group and 4-group XOR), compares p-SNE against baselines (t-SNE, UMAP, PCA, ZIFA, scVI, GLM-PCA, Poisson GPFA), and saves embedding plots, cost curves, and `.npy` files.
|
|
209
|
+
|
|
210
|
+
---
|
|
211
|
+
|
|
212
|
+
## File structure
|
|
213
|
+
|
|
214
|
+
```
|
|
215
|
+
PSNE-Poisson-Stochastic-Neighbor-Embedding/
|
|
216
|
+
├── psne/
|
|
217
|
+
│ ├── __init__.py
|
|
218
|
+
│ ├── psne_core.py
|
|
219
|
+
│ ├── psne_config.py
|
|
220
|
+
│ └── psne_utils.py
|
|
221
|
+
├── psne_demo_nonlinear.py
|
|
222
|
+
├── pyproject.toml
|
|
223
|
+
├── requirements.txt
|
|
224
|
+
├── LICENSE
|
|
225
|
+
└── README.md
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
---
|
|
229
|
+
|
|
230
|
+
## Citation
|
|
231
|
+
|
|
232
|
+
If you use p-SNE, please cite:
|
|
233
|
+
|
|
234
|
+
```bibtex
|
|
235
|
+
@article{mudrik2026neighbor,
|
|
236
|
+
title={Neighbor Embedding for High-Dimensional Sparse Poisson Data},
|
|
237
|
+
author={Mudrik, Noga and Charles, Adam S},
|
|
238
|
+
journal={arXiv preprint arXiv:2604.16932},
|
|
239
|
+
year={2026}
|
|
240
|
+
}
|
|
241
|
+
```
|
|
@@ -0,0 +1,4 @@
|
|
|
1
|
+
psne_poisson_neighbor_python-0.1.0.dist-info/METADATA,sha256=eIHSaq90_SMXP4r2vTjCY8S76yZ-x6S62uy4_s6EE6o,8726
|
|
2
|
+
psne_poisson_neighbor_python-0.1.0.dist-info/WHEEL,sha256=oiQVh_5PnQM0E3gPdiz09WCNmwiHDMaGer_elqB3coM,92
|
|
3
|
+
psne_poisson_neighbor_python-0.1.0.dist-info/top_level.txt,sha256=AbpHGcgLb-kRsJGnwFEktk7uzpZOCcBY74-YBdrKVGs,1
|
|
4
|
+
psne_poisson_neighbor_python-0.1.0.dist-info/RECORD,,
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
|