psne-poisson-neighbor-python 0.1.0__py3-none-any.whl

@@ -0,0 +1,241 @@
+ Metadata-Version: 2.1
+ Name: psne-poisson-neighbor-python
+ Version: 0.1.0
+ Summary: p-SNE: Poisson Stochastic Neighbor Embedding. Nonlinear dimensionality reduction for sparse count data (neural spike counts, scRNA-seq, text corpora).
+ Author: Noga Mudrik, Adam S. Charles
+ License: MIT
+ Project-URL: Homepage, https://github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding
+ Project-URL: Paper, https://arxiv.org/abs/2604.16932
+ Project-URL: Issues, https://github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding/issues
+ Keywords: dimensionality-reduction,t-SNE,Poisson,count-data,sparse-data,visualization,embedding,neuroscience,scRNA-seq,spike-counts
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Topic :: Scientific/Engineering :: Visualization
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Requires-Python: >=3.7
+ Description-Content-Type: text/markdown
+ Requires-Dist: numpy >=1.21
+ Requires-Dist: scipy >=1.7
+ Requires-Dist: scikit-learn >=1.0
+ Requires-Dist: matplotlib >=3.5
+ Requires-Dist: seaborn >=0.11
+
+ # p-SNE: Poisson Stochastic Neighbor Embedding
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2604.16932-b31b1b.svg)](https://arxiv.org/abs/2604.16932)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+ [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+
+ **A nonlinear dimensionality reduction method for sparse count data.**
+
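+ p-SNE embeds high-dimensional count matrices (neural spike counts, scRNA-seq, text corpora) into 2D or 3D. It measures pairwise dissimilarity between samples with a Poisson KL divergence and fits the embedding by minimizing the Hellinger distance between the high- and low-dimensional joint probabilities. The API follows the conventions of scikit-learn's t-SNE, except that the input matrix is features × samples (see Data format).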
+
+ 📄 **Paper:** [Neighbor Embedding for High-Dimensional Sparse Poisson Data](https://arxiv.org/abs/2604.16932) (arXiv 2604.16932)
+
+ 💻 **Code:** [github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding](https://github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding)
+
+ 📝 **Blog post:** [Life Is Too Short for Wrong Metrics](https://pub.aimind.so/life-is-too-short-for-wrong-metrics-visualizing-sparse-count-data-with-p-sne-0c6ae0a191c9)
+
+ ---
+
+ ## Why p-SNE?
+
+ Standard dimensionality reduction methods (t-SNE, UMAP, PCA) are usually applied with Euclidean distances, which implicitly suit continuous, roughly Gaussian features. On sparse count data, Euclidean distance treats shared zeros as evidence of similarity and ignores the mean-variance coupling of Poisson observations (the variance equals the mean). The resulting embeddings can be distorted, with real structure lost or spurious structure introduced.
+
+ p-SNE replaces the Euclidean distance in t-SNE with a Poisson KL divergence that respects the discrete, non-negative nature of count data. On sparse neural recordings, text word counts, and single-cell RNA-seq data, p-SNE recovers cluster structure that t-SNE, UMAP, and PCA miss.
+
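The mean-variance coupling mentioned above can be checked numerically. The following is a small illustrative sketch using only NumPy (it does not use the p-SNE package, and the rates and sample size are arbitrary choices): for Poisson draws the sample variance tracks the sample mean, whereas a Gaussian model treats the two as independent parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
rates = [0.5, 5.0, 50.0]  # arbitrary example rates, chosen for illustration
ratios = []
for rate in rates:
    counts = rng.poisson(rate, size=100_000)
    # For a Poisson variable the variance equals the mean,
    # so this ratio should be close to 1 at every rate.
    ratio = counts.var() / counts.mean()
    ratios.append(ratio)
    print(f"rate={rate:5.1f}  mean={counts.mean():.3f}  "
          f"var={counts.var():.3f}  var/mean={ratio:.3f}")
```

A distance that assumes constant noise across features will therefore overweight high-count features and underweight low-count ones.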
+ ---
+
+ ## Installation
+
+ ```bash
+ pip install psne-poisson-neighbor-python
+ ```
+
+ Or from source:
+
+ ```bash
+ git clone https://github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding.git
+ cd PSNE-Poisson-Stochastic-Neighbor-Embedding
+ pip install -r requirements.txt
+ ```
+
+ Core dependencies: `numpy`, `scipy`, `scikit-learn`, `matplotlib`, `seaborn`.
+
+ ---
+
+ ## Quick start
+
+ ```python
+ import numpy as np
+ from psne.psne_core import PSNE
+
+ X = np.random.poisson(5, size=(50, 30)).astype(float)  # 50 features (rows) x 30 samples (columns)
+ model = PSNE(n_components=2, max_iter=500, eta=100.0, verbose=True)
+ embedding = model.fit_transform(X)  # shape (30, 2): one row per sample
+ ```
+
+ With your own data:
+
+ ```python
+ import numpy as np
+ from psne.psne_core import PSNE
+
+ X = np.load('my_data.npy').astype(float)  # shape (N, T): features x samples
+ assert np.all(X >= 0), 'p-SNE requires non-negative input'
+
+ model = PSNE(
+     n_components=3,
+     s_mode='weight_exp',
+     weight_exp=1.0,
+     eta=200.0,
+     max_iter=1000,
+     gamma=0.0,
+     use_momentum=True,
+     use_early_exaggeration=True,
+     verbose=True,
+ )
+ embedding = model.fit_transform(X)
+ ```
+
+ Plotting:
+
+ ```python
+ import matplotlib.pyplot as plt
+ import numpy as np
+
+ labels = np.load('my_labels.npy')  # one label per sample (length T)
+
+ fig, ax = plt.subplots()
+ ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap='tab10', s=30)
+ ax.set_xlabel('$y_1$')
+ ax.set_ylabel('$y_2$')
+ plt.show()
+ ```
+
+ For 3D:
+
+ ```python
+ fig = plt.figure()
+ ax = fig.add_subplot(111, projection='3d')
+ ax.scatter(embedding[:, 0], embedding[:, 1], embedding[:, 2], c=labels, cmap='tab10', s=30)
+ plt.show()
+ ```
+
+ ---
+
+ ## Method
+
+ 1. **Poisson KL distance matrix.** An asymmetric divergence between every pair of samples $i$ and $j$, averaged over the $N$ features and smoothed by $\epsilon$:
+
+ $$D_{ij} = \frac{1}{N}\sum_n \left[ x_{n,i} \log\frac{x_{n,i}+\epsilon}{x_{n,j}+\epsilon} + x_{n,j} - x_{n,i} \right]$$
+
+ 2. **High-dimensional joint probabilities** $S$: convert $D$ into a symmetric probability matrix via a global weight exponent or adaptive per-point perplexity.
+ 3. **Low-dimensional joint probabilities** $Q$: Cauchy kernel over the embedding coordinates, as in t-SNE.
+ 4. **Hellinger cost:** minimize the Hellinger distance $H(S, Q)$ instead of the KL divergence used by t-SNE.
+ 5. **Optional group-lasso penalty:** $\gamma \sum_n \|y_n\|_2$ promotes sparsity across embedding dimensions.
+ 6. **Optimizer:** gradient descent with momentum and early exaggeration.
+
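Steps 1, 3, and 4 above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the package's implementation: the function names are hypothetical, and the Hellinger normalization shown is the standard one for discrete distributions, which may differ from the constant used in `psne_core.py`. Step 2 (building $S$ from $D$) is omitted because its exact form is not spelled out here.

```python
import numpy as np

def poisson_kl_matrix(X, epsilon=1e-2):
    """Step 1: D[i, j] from the formula above. X has shape (N, T)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[0]
    log_x = np.log(X + epsilon)
    self_term = np.sum(X * log_x, axis=0)   # sum_n x[n,i] * log(x[n,i] + eps)
    cross_term = X.T @ log_x                # [i, j] = sum_n x[n,i] * log(x[n,j] + eps)
    totals = X.sum(axis=0)                  # sum_n x[n,i]
    D = self_term[:, None] - cross_term + totals[None, :] - totals[:, None]
    return D / n_features

def cauchy_joint_probabilities(Y):
    """Step 3: t-SNE-style Cauchy kernel. Y has shape (T, n_components)."""
    sq_norms = np.sum(Y ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    sq_dists = np.maximum(sq_dists, 0.0)    # guard against tiny negative round-off
    kernel = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(kernel, 0.0)           # no self-affinity
    return kernel / kernel.sum()

def hellinger_distance(S, Q):
    """Step 4: standard Hellinger distance between two joint distributions
    (assumed normalization; the package may use a different constant)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(S) - np.sqrt(Q)) ** 2))

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(6, 4)).astype(float)  # 6 features x 4 samples
D = poisson_kl_matrix(X)
Y = rng.normal(size=(4, 2))                      # a random 2D embedding
Q = cauchy_joint_probabilities(Y)
print(D.shape, Q.sum())
```

Because the divergence is asymmetric, `D[i, j]` and `D[j, i]` generally differ, which is why step 2 has to symmetrize when it builds $S$.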
+ ---
+
+ ## Data format
+
+ - Shape: $(N, T)$, where $N$ is the number of features (neurons, genes, words) and $T$ is the number of samples (conditions, cells, documents).
+ - Type: `float` or `int` numpy array.
+ - Values: non-negative.
+
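+ Samples are columns and features are rows, which is the transpose of the `(n_samples, n_features)` layout scikit-learn uses. The output embedding has shape `(T, n_components)`, with samples as rows. Remove all-zero samples (columns that sum to zero) before fitting.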
+
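Many pipelines store data as samples × features. A minimal sketch of converting such a matrix to the layout described above, using NumPy only; `prepare_counts` is a hypothetical helper, not part of the package, and the small matrix is made up for illustration:

```python
import numpy as np

def prepare_counts(counts_samples_by_features):
    """Hypothetical helper: transpose to (N, T) and drop all-zero samples."""
    X = np.asarray(counts_samples_by_features, dtype=float).T  # (N features, T samples)
    if np.any(X < 0):
        raise ValueError("p-SNE requires non-negative input")
    keep = X.sum(axis=0) > 0  # one flag per sample (column)
    return X[:, keep], keep

# Toy input: 4 samples (rows) x 3 features (columns); the second sample is all zeros.
cells_by_genes = np.array([
    [3, 0, 1],
    [0, 0, 0],
    [2, 5, 0],
    [0, 1, 4],
])
X, keep = prepare_counts(cells_by_genes)
print(X.shape)  # (3, 3): 3 features x 3 remaining samples
```

The returned mask `keep` can be applied to per-sample labels so that they stay aligned with the rows of the embedding.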
+ ---
+
+ ## Parameters
+
+ **Model:**
+
+ | Parameter | Default | Description |
+ |---|---|---|
+ | `n_components` | 3 | Embedding dimensionality. |
+ | `s_mode` | `'weight_exp'` | How to build $S$: `'weight_exp'` (global) or `'perplexity'` (adaptive). |
+ | `weight_exp` | 1.0 | Weight exponent for `s_mode='weight_exp'`. Higher sharpens neighborhoods. |
+ | `perplexity` | 30.0 | Target perplexity for `s_mode='perplexity'`. Must be < number of samples. |
+ | `epsilon` | 1e-2 | Smoothing constant for Poisson KL. |
+ | `gamma` | 0.0 | Group-lasso regularization weight ($\gamma > 0$ enforces sparsity). |
+ | `random_state` | 42 | Random seed for initialization. |
+
+ **Optimizer:**
+
+ | Parameter | Default | Description |
+ |---|---|---|
+ | `eta` | 200.0 | Learning rate. |
+ | `max_iter` | 1000 | Maximum iterations. |
+ | `tol` | 1e-8 | Convergence tolerance on cost change. |
+ | `use_momentum` | True | Enable momentum. |
+ | `momentum_alpha` | 0.5 | Initial momentum coefficient. |
+ | `momentum_alpha_final` | 0.8 | Final momentum coefficient. |
+ | `momentum_switch_iter` | 250 | Iteration at which momentum switches from `momentum_alpha` to `momentum_alpha_final`. |
+ | `use_early_exaggeration` | True | Multiply $S$ by `exaggeration_factor` for the first `exaggeration_iters` iterations. |
+ | `exaggeration_factor` | 12.0 | Exaggeration multiplier. |
+ | `exaggeration_iters` | 250 | Number of exaggeration iterations. |
+
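The momentum and exaggeration parameters describe piecewise-constant schedules. The sketch below shows one plausible reading of the table, in the style of standard t-SNE optimizers. The helper names are hypothetical, and whether the package switches at `iteration < switch` or `<=` is an assumption. The toy objective is a simple quadratic, not the p-SNE cost, and its learning rate is chosen for that toy problem only.

```python
import numpy as np

def momentum_at(iteration, momentum_alpha=0.5, momentum_alpha_final=0.8,
                momentum_switch_iter=250):
    """Momentum coefficient at a given iteration (assumed step schedule)."""
    if iteration < momentum_switch_iter:
        return momentum_alpha
    return momentum_alpha_final

def exaggeration_at(iteration, use_early_exaggeration=True,
                    exaggeration_factor=12.0, exaggeration_iters=250):
    """Multiplier applied to S at a given iteration (assumed step schedule)."""
    if use_early_exaggeration and iteration < exaggeration_iters:
        return exaggeration_factor
    return 1.0

# Toy demonstration: gradient descent with momentum on f(Y) = 0.5 * ||Y||^2,
# whose gradient is simply Y.
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2))
velocity = np.zeros_like(Y)
eta = 0.1  # toy learning rate, unrelated to the package default
for iteration in range(500):
    grad = Y
    velocity = momentum_at(iteration) * velocity - eta * grad
    Y = Y + velocity

print(np.linalg.norm(Y))  # close to zero once the iterates have converged
```

With the defaults, the momentum switch and the end of early exaggeration both fall at iteration 250, mirroring the common t-SNE setup.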
+ ---
+
+ ## Attributes (after fitting)
+
+ | Attribute | Shape | Description |
+ |---|---|---|
+ | `embedding_` | `(n_components, T)` | Learned embedding. `fit_transform` returns the transpose. |
+ | `cost_history_` | list | Total cost at each iteration. |
+ | `hellinger_history_` | list | Hellinger distance at each iteration. |
+ | `D_` | $(T, T)$ | Poisson KL distance matrix. |
+ | `S_` | $(T, T)$ | High-dimensional joint probabilities. |
+ | `Q_` | $(T, T)$ | Final low-dimensional joint probabilities. |
+ | `n_iter_` | int | Number of iterations run. |
+
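A short sketch of reading the fitted attributes listed above. `summarize_fit` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in with made-up values so the snippet runs without the package installed; with a real fitted `PSNE` model, pass the model in directly.

```python
from types import SimpleNamespace

import numpy as np

def summarize_fit(model):
    """Collect basic diagnostics from the documented fitted attributes."""
    n_components, n_samples = model.embedding_.shape
    return {
        "n_samples": n_samples,
        "n_components": n_components,
        "n_iter": model.n_iter_,
        "final_cost": model.cost_history_[-1],
        "final_hellinger": model.hellinger_history_[-1],
        # Same layout as the array returned by fit_transform: (T, n_components).
        "embedding_rows": model.embedding_.T,
    }

# Stand-in object shaped like the table above (T = 4 samples, 2 components).
fake_model = SimpleNamespace(
    embedding_=np.zeros((2, 4)),
    cost_history_=[1.0, 0.5, 0.25],
    hellinger_history_=[0.9, 0.6, 0.4],
    n_iter_=3,
)
summary = summarize_fit(fake_model)
print(summary["embedding_rows"].shape)  # (4, 2)
```

Plotting `cost_history_` against the iteration index is a quick way to check whether `max_iter` was large enough for the run to flatten out.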
+ ---
+
+ ## Demo
+
+ ```bash
+ python psne_demo_nonlinear.py
+ ```
+
+ Runs two synthetic datasets (3-group and 4-group XOR), compares p-SNE against baselines (t-SNE, UMAP, PCA, ZIFA, scVI, GLM-PCA, Poisson GPFA), and saves embedding plots, cost curves, and `.npy` files.
+
+ ---
+
+ ## File structure
+
+ ```
+ PSNE-Poisson-Stochastic-Neighbor-Embedding/
+ ├── psne/
+ │ ├── __init__.py
+ │ ├── psne_core.py
+ │ ├── psne_config.py
+ │ └── psne_utils.py
+ ├── psne_demo_nonlinear.py
+ ├── pyproject.toml
+ ├── requirements.txt
+ ├── LICENSE
+ └── README.md
+ ```
+
+ ---
+
+ ## Citation
+
+ If you use p-SNE, please cite:
+
+ ```bibtex
+ @article{mudrik2026neighbor,
+ title={Neighbor Embedding for High-Dimensional Sparse Poisson Data},
+ author={Mudrik, Noga and Charles, Adam S},
+ journal={arXiv preprint arXiv:2604.16932},
+ year={2026}
+ }
+ ```
@@ -0,0 +1,4 @@
+ psne_poisson_neighbor_python-0.1.0.dist-info/METADATA,sha256=eIHSaq90_SMXP4r2vTjCY8S76yZ-x6S62uy4_s6EE6o,8726
+ psne_poisson_neighbor_python-0.1.0.dist-info/WHEEL,sha256=oiQVh_5PnQM0E3gPdiz09WCNmwiHDMaGer_elqB3coM,92
+ psne_poisson_neighbor_python-0.1.0.dist-info/top_level.txt,sha256=AbpHGcgLb-kRsJGnwFEktk7uzpZOCcBY74-YBdrKVGs,1
+ psne_poisson_neighbor_python-0.1.0.dist-info/RECORD,,
@@ -0,0 +1,5 @@
+ Wheel-Version: 1.0
+ Generator: bdist_wheel (0.42.0)
+ Root-Is-Purelib: true
+ Tag: py3-none-any
+