mcDETECT 2.0.15__tar.gz

@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Chenyang Yuan
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,40 @@
+ Metadata-Version: 2.4
+ Name: mcDETECT
+ Version: 2.0.15
+ Summary: Uncovering the dark transcriptome in polarized neuronal compartments with mcDETECT
+ Home-page: https://github.com/chen-yang-yuan/mcDETECT
+ Author: Chenyang Yuan
+ Author-email: chenyang.yuan@emory.edu
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.6
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: anndata
+ Requires-Dist: miniball
+ Requires-Dist: numpy
+ Requires-Dist: pandas
+ Requires-Dist: rtree
+ Requires-Dist: scanpy
+ Requires-Dist: scikit-learn
+ Requires-Dist: scipy
+ Requires-Dist: shapely
+ Dynamic: author
+ Dynamic: author-email
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: home-page
+ Dynamic: license-file
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # mcDETECT
+
+ ## Uncovering the dark transcriptome in polarized neuronal compartments with mcDETECT
+
+ #### Chenyang Yuan, Krupa Patel, Hongshun Shi, Hsiao-Lin V. Wang, Feng Wang, Ronghua Li, Yangping Li, Victor G. Corces, Hailing Shi, Sulagna Das, Jindan Yu, Peng Jin, Bing Yao* and Jian Hu*
+
+ mcDETECT is a computational framework for studying the dark transcriptome of polarized compartments in the brain using *in situ* spatial transcriptomics (iST) data. It begins by examining the subcellular distribution of mRNAs in an iST sample, treating each mRNA molecule as a distinct point with its own 3D spatial coordinates so that the thickness of the sample is taken into account. Unlike many cell-type marker genes, which are typically found within the nucleus or soma, compartmentalized mRNAs often form small aggregates outside the soma. mcDETECT identifies these extrasomatic aggregates with a density-based clustering approach: it computes Euclidean distances between mRNA points and defines the neighborhood of each point as all points within a specified search radius. A point is a core point if its neighborhood contains at least a minimum number of points, a border point if it is not a core point but lies within the neighborhood of one, and a noise point otherwise. mcDETECT treats each connected bundle of core and border points as an mRNA aggregate. To minimize false positives, it excludes aggregates that substantially overlap with somata, which are estimated by dilating the nuclear masks derived from DAPI staining. mcDETECT repeats this process for multiple granule markers and merges aggregates from different markers that exhibit high spatial overlap. After merging across all markers, an additional filtering step removes aggregates with a high proportion of mRNAs from negative control genes, which are known to be enriched exclusively in nuclei and somata. The remaining aggregates are considered individual RNA granules. Finally, mcDETECT uses the minimum enclosing sphere of each aggregate to collect neighboring mRNA molecules from all measured genes and summarizes their counts, thereby defining the spatial transcriptome profile of individual RNA granules.
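The core/border/noise classification described above is standard DBSCAN. The sketch below shows it on a handful of hypothetical 3D transcript coordinates using scikit-learn's `DBSCAN`, which the `model` module also imports. The helper name `classify_points` and the toy coordinates are illustrative assumptions, not part of the mcDETECT API. Note that scikit-learn counts a point as part of its own neighborhood when applying `min_samples`.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def classify_points(coords, eps, min_samples):
    """Hypothetical helper: label each 3D point as 'core', 'border', or 'noise'."""
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(coords)
    kinds = np.full(len(coords), "noise", dtype=object)
    kinds[db.labels_ != -1] = "border"       # assigned to a cluster...
    kinds[db.core_sample_indices_] = "core"  # ...and dense enough to be a core point
    return db.labels_, kinds


# Toy data: a tight group of five transcripts, one nearby straggler, one isolated point.
coords = np.array([
    [0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5], [0.5, 0.5, 0.0],
    [1.8, 0.0, 0.0],     # within eps of two core points, but has too few neighbours itself
    [10.0, 10.0, 10.0],  # far from everything
])
labels, kinds = classify_points(coords, eps=1.5, min_samples=4)
print(list(kinds))  # ['core', 'core', 'core', 'core', 'core', 'border', 'noise']
```

Points that share a non-negative label form one aggregate (the "connected bundle" of core and border points); label `-1` marks noise.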
@@ -0,0 +1,7 @@
+ # mcDETECT
+
+ ## Uncovering the dark transcriptome in polarized neuronal compartments with mcDETECT
+
+ #### Chenyang Yuan, Krupa Patel, Hongshun Shi, Hsiao-Lin V. Wang, Feng Wang, Ronghua Li, Yangping Li, Victor G. Corces, Hailing Shi, Sulagna Das, Jindan Yu, Peng Jin, Bing Yao* and Jian Hu*
+
+ mcDETECT is a computational framework for studying the dark transcriptome of polarized compartments in the brain using *in situ* spatial transcriptomics (iST) data. It begins by examining the subcellular distribution of mRNAs in an iST sample, treating each mRNA molecule as a distinct point with its own 3D spatial coordinates so that the thickness of the sample is taken into account. Unlike many cell-type marker genes, which are typically found within the nucleus or soma, compartmentalized mRNAs often form small aggregates outside the soma. mcDETECT identifies these extrasomatic aggregates with a density-based clustering approach: it computes Euclidean distances between mRNA points and defines the neighborhood of each point as all points within a specified search radius. A point is a core point if its neighborhood contains at least a minimum number of points, a border point if it is not a core point but lies within the neighborhood of one, and a noise point otherwise. mcDETECT treats each connected bundle of core and border points as an mRNA aggregate. To minimize false positives, it excludes aggregates that substantially overlap with somata, which are estimated by dilating the nuclear masks derived from DAPI staining. mcDETECT repeats this process for multiple granule markers and merges aggregates from different markers that exhibit high spatial overlap. After merging across all markers, an additional filtering step removes aggregates with a high proportion of mRNAs from negative control genes, which are known to be enriched exclusively in nuclei and somata. The remaining aggregates are considered individual RNA granules. Finally, mcDETECT uses the minimum enclosing sphere of each aggregate to collect neighboring mRNA molecules from all measured genes and summarizes their counts, thereby defining the spatial transcriptome profile of individual RNA granules.
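The last step, summarizing the counts of all measured genes inside each granule's sphere, amounts to one radius query per sphere followed by a sparse granule-by-gene tally; the `profile` method in the `model` module follows the same pattern with `cKDTree`, `Counter`, and `csr_matrix`. The sketch below assumes the sphere centers and radii are already known (mcDETECT obtains them with the `miniball` package). The function name `count_in_spheres`, the gene labels `gene_A`/`gene_B`, and the coordinates are hypothetical.

```python
from collections import Counter

import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree


def count_in_spheres(xyz, genes, centers, radii, gene_order):
    """Hypothetical helper: sparse (n_spheres x n_genes) count of transcripts inside each sphere."""
    tree = cKDTree(xyz)
    col_of = {g: j for j, g in enumerate(gene_order)}
    data, rows, cols = [], [], []
    for i, (center, r) in enumerate(zip(centers, radii)):
        hits = tree.query_ball_point(center, r)  # indices of transcripts within distance r
        for gene, n in Counter(genes[hits]).items():
            data.append(n)
            rows.append(i)
            cols.append(col_of[gene])
    return csr_matrix((data, (rows, cols)), shape=(len(centers), len(gene_order)))


xyz = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [5.0, 5.0, 0.0], [5.2, 5.0, 0.0]])
genes = np.array(["gene_A", "gene_A", "gene_B", "gene_A", "gene_B"])
counts = count_in_spheres(xyz, genes, centers=[[0.0, 0.0, 0.0], [5.0, 5.0, 0.0]], radii=[1.0, 1.0],
                          gene_order=["gene_A", "gene_B"])
print(counts.toarray())
```

Building the k-d tree once and querying it per sphere avoids scanning every transcript for every granule, and storing only nonzero entries keeps the granule-by-gene matrix small when most genes are absent from most granules.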
@@ -0,0 +1,6 @@
+ __version__ = "2.0.15"
+
+ from . import model
+ from . import utils
+
+ __all__ = ["model", "utils"]
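When `minspl` is left as `None`, the `poisson_select` method in the `model` module picks DBSCAN's `min_samples` for each marker from a Poisson background model. The marker's transcript count divided by the occupied tissue area gives a background density. The expected count in a disc of radius `eps` is then inflated by `alpha`, and the `cutoff_prob` quantile of that Poisson distribution, floored at `low_bound`, becomes `min_samples`. The standalone function below restates that computation so it can be tried without a transcripts table. The name `poisson_min_samples` and the example inputs are hypothetical; the defaults mirror the constructor defaults in the source.

```python
import numpy as np
from scipy.stats import poisson


def poisson_min_samples(n_transcripts, tissue_area, eps=1.5, alpha=5.0, cutoff_prob=0.95, low_bound=3):
    """Hypothetical standalone version of the min_samples rule used by poisson_select()."""
    bg_density = n_transcripts / tissue_area         # transcripts per unit of occupied area
    mu = alpha * bg_density * (np.pi * eps ** 2)     # scaled expected count in a disc of radius eps
    cutoff = poisson.ppf(cutoff_prob, mu=mu)         # cutoff_prob quantile of Poisson(mu)
    return int(max(cutoff, low_bound))               # never go below low_bound


# A sparse marker falls back to low_bound; a denser one gets a stricter threshold.
print(poisson_min_samples(10, 1e6), poisson_min_samples(1000, 1e4))  # 3 7
```

Because the tissue area is measured on a 2D grid, the expected count uses a disc of radius `eps` rather than a sphere, even though the clustering itself runs in 3D.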
@@ -0,0 +1,644 @@
+ import anndata
+ import math
+ import miniball
+ import numpy as np
+ import pandas as pd
+ import scanpy as sc
+ from collections import Counter
+ from rtree import index
+ from scipy.sparse import csr_matrix
+ from scipy.spatial import cKDTree
+ from scipy.stats import poisson
+ from shapely.geometry import Point
+ from sklearn.cluster import DBSCAN
+ from sklearn.preprocessing import OneHotEncoder
+
+
+ from .utils import *
+
+
+ class mcDETECT:
+
+
+     def __init__(self, type, transcripts, gnl_genes, nc_genes = None, eps = 1.5, minspl = None, grid_len = 1.0, cutoff_prob = 0.95, alpha = 5.0, low_bound = 3,
+                  size_thr = 4.0, in_soma_thr = (0.5, 0.5), l = 1.0, rho = 0.2, s = 1.0, nc_top = 20, nc_thr = 0.1):
+
+         self.type = type # string, iST platform; currently supports MERSCOPE, Xenium, and CosMx
+         self.transcripts = transcripts # dataframe, transcripts file
+         self.gnl_genes = gnl_genes # list, string, all granule markers
+         self.nc_genes = nc_genes # list, string, all negative controls
+         self.eps = eps # numeric, search radius epsilon
+         self.minspl = minspl # integer, manually select min_samples, i.e., no automatic parameter selection
+         self.grid_len = grid_len # numeric, length of grids for computing the tissue area
+         self.cutoff_prob = cutoff_prob # numeric, cutoff probability in parameter selection for min_samples
+         self.alpha = alpha # numeric, scaling factor in parameter selection for min_samples
+         self.low_bound = low_bound # integer, lower bound in parameter selection for min_samples
+         self.size_thr = size_thr # numeric, threshold for maximum radius of an aggregation
+         self.in_soma_thr = in_soma_thr # 2-d tuple, threshold for low- and high-in-soma ratio
+         self.l = l # numeric, scaling factor for searching overlapping spheres
+         self.rho = rho # numeric, threshold for determining overlaps
+         self.s = s # numeric, scaling factor for merging overlapping spheres
+         self.nc_top = nc_top # integer, number of negative controls retained for filtering
+         self.nc_thr = nc_thr # numeric, threshold for negative control filtering
+
+
+     # [INNER] construct grids, input for tissue_area()
+     def construct_grid(self, grid_len = None):
+         if grid_len is None:
+             grid_len = self.grid_len
+         x_min, x_max = np.min(self.transcripts["global_x"]), np.max(self.transcripts["global_x"])
+         y_min, y_max = np.min(self.transcripts["global_y"]), np.max(self.transcripts["global_y"])
+         x_min = np.floor(x_min / grid_len) * grid_len
+         x_max = np.ceil(x_max / grid_len) * grid_len
+         y_min = np.floor(y_min / grid_len) * grid_len
+         y_max = np.ceil(y_max / grid_len) * grid_len
+         x_bins = np.arange(x_min, x_max + grid_len, grid_len)
+         y_bins = np.arange(y_min, y_max + grid_len, grid_len)
+         return x_bins, y_bins
+
+
+     # [INNER] calculate tissue area, input for poisson_select()
+     def tissue_area(self):
+         x_bins, y_bins = self.construct_grid(grid_len = None)
+         hist, _, _ = np.histogram2d(self.transcripts["global_x"], self.transcripts["global_y"], bins = [x_bins, y_bins])
+         area = np.count_nonzero(hist) * (self.grid_len ** 2)
+         return area
+
+
+     # [INNER] calculate optimal min_samples, input for dbscan()
+     def poisson_select(self, gene_name):
+         num_trans = np.sum(self.transcripts["target"] == gene_name)
+         bg_density = num_trans / self.tissue_area()
+         cutoff_density = poisson.ppf(self.cutoff_prob, mu = self.alpha * bg_density * (np.pi * self.eps ** 2))
+         optimal_m = int(max(cutoff_density, self.low_bound))
+         return optimal_m
+
+
+     # [INTERMEDIATE] dictionary, low- and high-in-soma spheres for each granule marker
+     def dbscan(self, target_names = None, record_cell_id = False, write_csv = False, write_path = "./"):
+
+         if self.type != "Xenium":
+             z_grid = list(self.transcripts["global_z"].unique())
+             z_grid.sort()
+
+         if target_names is None:
+             target_names = self.gnl_genes
+         transcripts = self.transcripts[self.transcripts["target"].isin(target_names)]
+
+         num_individual, data_low, data_high = [], {}, {}
+
+         for j in target_names:
+
+             # split transcripts
+             target = transcripts[transcripts["target"] == j]
+             others = transcripts[transcripts["target"] != j]
+             tree = make_tree(d1 = np.array(others["global_x"]), d2 = np.array(others["global_y"]), d3 = np.array(others["global_z"]))
+
+             # 3D DBSCAN
+             if self.minspl is None:
+                 min_spl = self.poisson_select(j)
+             else:
+                 min_spl = self.minspl
+             X = np.array(target[["global_x", "global_y", "global_z"]])
+             db = DBSCAN(eps = self.eps, min_samples = min_spl, algorithm = "kd_tree").fit(X)
+             labels = db.labels_
+             n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
+
+             # iterate over all aggregations
+             cell_id, sphere_x, sphere_y, sphere_z, layer_z, sphere_r, sphere_size, sphere_comp, sphere_score = [], [], [], [], [], [], [], [], []
+
+             for k in range(n_clusters):
+
+                 # ---------- find minimum enclosing spheres ---------- #
+                 mask = (labels == k)
+                 coords = X[mask]
+                 if coords.shape[0] == 0:
+                     continue
+
+                 temp = pd.DataFrame(coords, columns=["global_x", "global_y", "global_z"])
+                 temp = temp.drop_duplicates()
+                 coords_unique = temp.to_numpy()
+
+                 # skip clusters with too few unique points
+                 if coords_unique.shape[0] < self.low_bound:
+                     print(f"Skipping small cluster for gene {j}, cluster {k} (n = {coords_unique.shape[0]})")
+                     continue
+
+                 # compute minimum enclosing sphere without singularity issues
+                 try:
+                     center, r2 = miniball.get_bounding_ball(coords_unique, epsilon=1e-8)
+                 except np.linalg.LinAlgError:
+                     print(f"Warning: singular matrix for gene {j}, cluster {k} - using fallback sphere.")
+                     center = coords_unique.mean(axis=0)
+                     dists = np.linalg.norm(coords_unique - center, axis=1)
+                     r2 = (dists.max() ** 2)
+
+                 # record closest z-layer
+                 if self.type != "Xenium":
+                     closest_z = closest(z_grid, center[2])
+                 else:
+                     closest_z = center[2]
+
+                 # record cell id after filtering
+                 if record_cell_id:
+                     temp_target = target[labels == k]
+                     temp_cell_id_mode = temp_target["cell_id"].mode()[0]
+                     cell_id.append(temp_cell_id_mode)
+
+                 # ---------- compute sphere features (size, composition, and in-soma ratio) ---------- #
+                 temp_in_soma = np.sum(target["overlaps_nucleus"].values[mask])
+                 temp_size = coords.shape[0]
+                 other_idx = tree.query_ball_point([center[0], center[1], center[2]], np.sqrt(r2))
+                 other_trans = others.iloc[other_idx]
+                 other_in_soma = np.sum(other_trans["overlaps_nucleus"])
+                 other_size = other_trans.shape[0]
+                 other_comp = len(other_trans["target"].unique())
+                 total_size = temp_size + other_size
+                 total_comp = 1 + other_comp
+                 in_soma_score = (temp_in_soma + other_in_soma) / total_size
+
+                 # record sphere features
+                 sphere_x.append(center[0])
+                 sphere_y.append(center[1])
+                 sphere_z.append(center[2])
+                 layer_z.append(closest_z)
+                 sphere_r.append(np.sqrt(r2))
+                 sphere_size.append(total_size)
+                 sphere_comp.append(total_comp)
+                 sphere_score.append(in_soma_score)
+
+             # basic features for all spheres from each granule marker
+             sphere = pd.DataFrame(list(zip(sphere_x, sphere_y, sphere_z, layer_z, sphere_r, sphere_size, sphere_comp, sphere_score, [j] * len(sphere_x))),
+                                   columns = ["sphere_x", "sphere_y", "sphere_z", "layer_z", "sphere_r", "size", "comp", "in_soma_ratio", "gene"])
+             sphere = sphere.astype({"sphere_x": float, "sphere_y": float, "sphere_z": float, "layer_z": float, "sphere_r": float, "size": float, "comp": float, "in_soma_ratio": float, "gene": str})
+             if record_cell_id:
+                 sphere["cell_id"] = cell_id
+                 sphere = sphere.astype({"cell_id": str})
+
+             # split low- and high-in-soma spheres
+             sphere_low = sphere[(sphere["sphere_r"] < self.size_thr) & (sphere["in_soma_ratio"] < self.in_soma_thr[0])]
+             sphere_high = sphere[(sphere["sphere_r"] < self.size_thr) & (sphere["in_soma_ratio"] > self.in_soma_thr[1])]
+
+             if write_csv:
+                 sphere_low.to_csv(write_path + j + " sphere.csv", index=0)
+                 sphere_high.to_csv(write_path + j + " sphere_high.csv", index=0)
+
+             num_individual.append(sphere_low.shape[0])
+             data_low[target_names.index(j)] = sphere_low
+             data_high[target_names.index(j)] = sphere_high
+             print("{} out of {} genes processed!".format(target_names.index(j) + 1, len(target_names)))
+
+         return np.sum(num_individual), data_low, data_high
+
+
+     # [INNER] merge points from two overlapped spheres, input for remove_overlaps()
+     def find_points(self, sphere_a, sphere_b):
+         transcripts = self.transcripts[self.transcripts["target"].isin(self.gnl_genes)]
+         tree_temp = make_tree(d1 = np.array(transcripts["global_x"]), d2 = np.array(transcripts["global_y"]), d3 = np.array(transcripts["global_z"]))
+         idx_a = tree_temp.query_ball_point([sphere_a["sphere_x"], sphere_a["sphere_y"], sphere_a["sphere_z"]], sphere_a["sphere_r"])
+         points_a = transcripts.iloc[idx_a]
+         points_a = points_a[points_a["target"] == sphere_a["gene"]]
+         idx_b = tree_temp.query_ball_point([sphere_b["sphere_x"], sphere_b["sphere_y"], sphere_b["sphere_z"]], sphere_b["sphere_r"])
+         points_b = transcripts.iloc[idx_b]
+         points_b = points_b[points_b["target"] == sphere_b["gene"]]
+         points = pd.concat([points_a, points_b])
+         points = points[["global_x", "global_y", "global_z"]]
+         return points
+
+
+     def remove_overlaps(self, set_a, set_b):
+
+         set_a = set_a.copy()
+         set_b = set_b.copy()
+
+         # find possible overlaps on 2D by r-tree
+         idx_b = make_rtree(set_b)
+         for i, sphere_a in set_a.iterrows():
+             center_a_3D = np.array([sphere_a.sphere_x, sphere_a.sphere_y, sphere_a.sphere_z])
+             bounds_a = (sphere_a.sphere_x - sphere_a.sphere_r,
+                         sphere_a.sphere_y - sphere_a.sphere_r,
+                         sphere_a.sphere_x + sphere_a.sphere_r,
+                         sphere_a.sphere_y + sphere_a.sphere_r)
+             possible_overlaps = idx_b.intersection(bounds_a)
+
+             # search 3D overlaps within possible overlaps
+             for j in possible_overlaps:
+                 if j in set_b.index:
+                     sphere_b = set_b.loc[j]
+                     center_b_3D = np.array([sphere_b.sphere_x, sphere_b.sphere_y, sphere_b.sphere_z])
+                     dist = np.linalg.norm(center_a_3D - center_b_3D)
+                     radius_sum = sphere_a.sphere_r + sphere_b.sphere_r
+                     radius_diff = sphere_a.sphere_r - sphere_b.sphere_r
+
+                     # relative positions (0: internal & intersect, 1: internal, 2: intersect)
+                     c0 = (dist < self.l * radius_sum)
+                     c1 = (dist <= self.l * np.abs(radius_diff))
+                     c1_1 = (radius_diff > 0)
+                     c2_1 = (dist < self.rho * self.l * radius_sum)
+
+                     # operations on dataframes
+                     if c0:
+                         if c1 and c1_1: # keep A and remove B
+                             set_b.drop(index = j, inplace = True)
+                         elif c1 and not c1_1: # replace A with B and remove B
+                             set_a.loc[i] = set_b.loc[j]
+                             set_b.drop(index = j, inplace = True)
+                         elif not c1 and c2_1: # replace A with new sphere and remove B
+                             points_union = np.array(self.find_points(sphere_a, sphere_b))
+                             new_center, new_r2 = miniball.get_bounding_ball(points_union, epsilon=1e-8) # miniball returns the squared radius
+                             set_a.loc[i, "sphere_x"] = new_center[0]
+                             set_a.loc[i, "sphere_y"] = new_center[1]
+                             set_a.loc[i, "sphere_z"] = new_center[2]
+                             set_a.loc[i, "sphere_r"] = self.s * np.sqrt(new_r2)
+                             set_b.drop(index = j, inplace = True)
+
+         set_a = set_a.reset_index(drop = True)
+         set_b = set_b.reset_index(drop = True)
+         return set_a, set_b
+
+
+     # [INNER] merge spheres from different granule markers, input for detect()
+     def merge_sphere(self, sphere_dict):
+         sphere = sphere_dict[0].copy()
+         for j in range(1, len(self.gnl_genes)):
+             target_sphere = sphere_dict[j]
+             sphere, target_sphere_new = self.remove_overlaps(sphere, target_sphere)
+             sphere = pd.concat([sphere, target_sphere_new])
+             sphere = sphere.reset_index(drop = True)
+         return sphere
+
+
+     # [INNER] negative control filtering, input for detect()
+     def nc_filter(self, sphere_low, sphere_high):
+
+         # negative control gene profiling
+         adata_low = self.profile(sphere_low, self.nc_genes)
+         adata_high = self.profile(sphere_high, self.nc_genes)
+         adata = anndata.concat([adata_low, adata_high], axis = 0, merge = "same")
+         adata.var["genes"] = adata.var.index
+         adata.obs_keys = list(np.arange(adata.shape[0]))
+         adata.obs["type"] = ["low"] * adata_low.shape[0] + ["high"] * adata_high.shape[0]
+         adata.obs["type"] = pd.Categorical(adata.obs["type"], categories = ["low", "high"], ordered = True)
+
+         # DE analysis of negative control genes
+         sc.tl.rank_genes_groups(adata, "type", method = "t-test")
+         names = adata.uns["rank_genes_groups"]["names"]
+         names = pd.DataFrame(names)
+         logfc = adata.uns["rank_genes_groups"]["logfoldchanges"]
+         logfc = pd.DataFrame(logfc)
+         pvals = adata.uns["rank_genes_groups"]["pvals"]
+         pvals = pd.DataFrame(pvals)
+
+         # select top upregulated negative control genes
+         df = pd.DataFrame({"names": names["high"], "logfc": logfc["high"], "pvals": pvals["high"]})
+         df = df[df["logfc"] >= 0]
+         df = df.sort_values(by = ["pvals"], ascending = True)
+         nc_genes_final = list(df["names"].head(self.nc_top))
+
+         # negative control filtering
+         nc_transcripts_final = self.transcripts[self.transcripts["target"].isin(nc_genes_final)]
+         tree = make_tree(d1 = np.array(nc_transcripts_final["global_x"]), d2 = np.array(nc_transcripts_final["global_y"]), d3 = np.array(nc_transcripts_final["global_z"]))
+         centers = sphere_low[["sphere_x", "sphere_y", "sphere_z"]].to_numpy()
+         radii = sphere_low["sphere_r"].to_numpy()
+         sizes = sphere_low["size"].to_numpy()
+         counts = np.array([len(tree.query_ball_point(c, r)) for c, r in zip(centers, radii)])
+         nc_ratio = counts / sizes
+         sphere = sphere_low.copy().reset_index(drop=True)
+         sphere["nc_ratio"] = nc_ratio
+         if self.nc_thr is None:
+             return sphere
+         pass_idx = (counts == 0) | (nc_ratio < self.nc_thr)
+         return sphere.loc[pass_idx].reset_index(drop=True)
+
+
+     # [MAIN] dataframe, granule metadata
+     def detect(self, record_cell_id = False):
+
+         _, data_low, data_high = self.dbscan(record_cell_id = record_cell_id)
+
+         print("Merging spheres...")
+         sphere_low, sphere_high = self.merge_sphere(data_low), self.merge_sphere(data_high)
+
+         if self.nc_genes is None:
+             return sphere_low
+         else:
+             print("Negative control filtering...")
+             return self.nc_filter(sphere_low, sphere_high)
+
+
+     # [MAIN] anndata, granule spatial transcriptome profile
+     def profile(self, granule, genes = None, buffer = 0.0, print_itr = False):
+
+         if genes is None:
+             genes = list(self.transcripts["target"].unique())
+             transcripts = self.transcripts
+         else:
+             transcripts = self.transcripts[self.transcripts["target"].isin(genes)]
+
+         gene_to_idx = {g: i for i, g in enumerate(genes)}
+         gene_array = transcripts["target"].to_numpy()
+         tree = make_tree(d1 = np.array(transcripts["global_x"]), d2 = np.array(transcripts["global_y"]), d3 = np.array(transcripts["global_z"]))
+
+         n_gnl = granule.shape[0]
+         n_gene = len(genes)
+         data, row_idx, col_idx = [], [], []
+
+         # iterate over all granules to count nearby transcripts
+         for i in range(n_gnl):
+             temp = granule.iloc[i]
+             target_idx = tree.query_ball_point([temp["sphere_x"], temp["sphere_y"], temp["layer_z"]], temp["sphere_r"] + buffer)
+             if not target_idx:
+                 continue
+             local_genes = gene_array[target_idx] # extract genes for those nearby transcripts
+             counts = Counter(local_genes) # count how many times each gene occurs
+             for g, cnt in counts.items(): # append nonzero entries to sparse matrix lists
+                 j = gene_to_idx[g] # get gene column index
+                 data.append(cnt) # nonzero count
+                 row_idx.append(i) # row index = granule index
+                 col_idx.append(j) # column index = gene index
+             if print_itr and (i % 5000 == 0):
+                 print(f"{i} out of {n_gnl} granules profiled!")
+
+         # construct sparse spatial transcriptome profile, (n_granules × n_genes)
+         X = csr_matrix((data, (row_idx, col_idx)), shape = (n_gnl, n_gene), dtype = np.float32)
+         adata = anndata.AnnData(X = X, obs = granule.copy())
+         adata.obs["granule_id"] = [f"gnl_{i}" for i in range(n_gnl)]
+         adata.obs = adata.obs.astype({"granule_id": str})
+         adata.obs.rename(columns = {"sphere_x": "global_x", "sphere_y": "global_y", "sphere_z": "global_z"}, inplace = True)
+         adata.var["genes"] = genes
+         adata.var_names = genes
+         adata.var_keys = genes
+         return adata
+
+
+     # [MAIN] anndata, spot-level gene expression
+     def spot_expression(self, grid_len, genes = None):
+
+         if genes is None:
+             genes = list(self.transcripts["target"].unique())
+             transcripts = self.transcripts
+         else:
+             transcripts = self.transcripts[self.transcripts["target"].isin(genes)]
+
+         # construct bins
+         x_bins, y_bins = self.construct_grid(grid_len = grid_len)
+
+         # initialize data
+         X = np.zeros((len(genes), (len(x_bins) - 1) * (len(y_bins) - 1)))
+         global_x, global_y = [], []
+
+         # coordinates
+         for i in list(x_bins)[:-1]:
+             center_x = i + 0.5 * grid_len
+             for j in list(y_bins)[:-1]:
+                 center_y = j + 0.5 * grid_len
+                 global_x.append(center_x)
+                 global_y.append(center_y)
+
+         # count matrix
+         for k_idx, k in enumerate(genes):
+             target_gene = transcripts[transcripts["target"] == k]
+             count_gene, _, _ = np.histogram2d(target_gene["global_x"], target_gene["global_y"], bins = [x_bins, y_bins])
+             X[k_idx, :] = count_gene.flatten()
+             if k_idx % 100 == 0:
+                 print("{} out of {} genes profiled!".format(k_idx, len(genes)))
+
+         # spot id
+         spot_id = []
+         for i in range(len(global_x)):
+             id = "spot_" + str(i)
+             spot_id.append(id)
+
+         # assemble data
+         adata = anndata.AnnData(X = np.transpose(X))
+         adata.obs["spot_id"] = spot_id
+         adata.obs["global_x"] = global_x
+         adata.obs["global_y"] = global_y
+         adata.var["genes"] = genes
+         adata.var_names = genes
+         adata.var_keys = genes
+         return adata
+
+
+     # [MAIN] anndata, spot-level neuron metadata
+     def spot_neuron(adata_neuron, spot, grid_len = 50, neuron_loc_key = ["global_x", "global_y"], spot_loc_key = ["global_x", "global_y"]):
+
+         adata_neuron = adata_neuron.copy()
+         neurons = adata_neuron.obs
+         spot = spot.copy()
+
+         half_len = grid_len / 2
+
+         indicator, neuron_count = [], []
+
+         for _, row in spot.obs.iterrows():
+
+             x = row[spot_loc_key[0]]
+             y = row[spot_loc_key[1]]
+             neuron_temp = neurons[(neurons[neuron_loc_key[0]] > x - half_len) & (neurons[neuron_loc_key[0]] < x + half_len) & (neurons[neuron_loc_key[1]] > y - half_len) & (neurons[neuron_loc_key[1]] < y + half_len)]
+             indicator.append(int(len(neuron_temp) > 0))
+             neuron_count.append(len(neuron_temp))
+
+         spot.obs["indicator"] = indicator
+         spot.obs["neuron_count"] = neuron_count
+         return spot
+
+
+     # [MAIN] anndata, spot-level granule metadata
+     def spot_granule(granule, spot, grid_len = 50, gnl_loc_key = ["sphere_x", "sphere_y"], spot_loc_key = ["global_x", "global_y"]):
+
+         granule = granule.copy()
+         spot = spot.copy()
+
+         half_len = grid_len / 2
+
+         indicator, granule_count, granule_radius, granule_size, granule_score = [], [], [], [], []
+
+         for _, row in spot.obs.iterrows():
+
+             x = row[spot_loc_key[0]]
+             y = row[spot_loc_key[1]]
+             gnl_temp = granule[(granule[gnl_loc_key[0]] >= x - half_len) & (granule[gnl_loc_key[0]] < x + half_len) & (granule[gnl_loc_key[1]] >= y - half_len) & (granule[gnl_loc_key[1]] < y + half_len)]
+             indicator.append(int(len(gnl_temp) > 0))
+             granule_count.append(len(gnl_temp))
+
+             if len(gnl_temp) == 0:
+                 granule_radius.append(0)
+                 granule_size.append(0)
+                 granule_score.append(0)
+             else:
+                 granule_radius.append(np.nanmean(gnl_temp["sphere_r"]))
+                 granule_size.append(np.nanmean(gnl_temp["size"]))
+                 granule_score.append(np.nanmean(gnl_temp["in_soma_ratio"]))
+
+         spot.obs["indicator"] = indicator
+         spot.obs["gnl_count"] = granule_count
+         spot.obs["gnl_radius"] = granule_radius
+         spot.obs["gnl_size"] = granule_size
+         spot.obs["gnl_score"] = granule_score
+         return spot
+
+
+     # [MAIN] anndata, neuron-granule colocalization
+     def neighbor_granule(adata_neuron, granule_adata, radius = 10, sigma = None, loc_key = ["global_x", "global_y"]):
+
+         adata_neuron = adata_neuron.copy()
+         granule_adata = granule_adata.copy()
+
+         if sigma is None:
+             sigma = radius / 2
+
+         # neuron and granule coordinates
+         neuron_coords = adata_neuron.obs[loc_key].values
+         gnl_coords = granule_adata.obs[loc_key].values
+
+         # make tree
+         tree = make_tree(d1 = gnl_coords[:, 0], d2 = gnl_coords[:, 1])
+
+         # query neighboring granules for each neuron
+         neighbor_indices = tree.query_ball_point(neuron_coords, r = radius)
+
+         # record count and indices
+         granule_counts = np.array([len(indices) for indices in neighbor_indices])
+         adata_neuron.obs["neighbor_gnl_count"] = granule_counts
+         adata_neuron.uns["neighbor_gnl_indices"] = neighbor_indices
+
+         # ---------- neighboring granule expression matrix ---------- #
+         n_neurons, n_genes = adata_neuron.n_obs, granule_adata.n_vars # one column per granule gene
+         weighted_expr = np.zeros((n_neurons, n_genes))
+
+         for i, indices in enumerate(neighbor_indices):
+             if len(indices) == 0:
+                 continue
+             distances = np.linalg.norm(gnl_coords[indices] - neuron_coords[i], axis = 1)
+             weights = np.exp(- (distances ** 2) / (2 * sigma ** 2))
+             weights = weights / weights.sum()
+             weighted_expr[i] = np.asarray(weights @ granule_adata.X[indices]).ravel() # weights sum to 1; also works when X is sparse, as returned by profile()
+
+         adata_neuron.obsm["weighted_gnl_expression"] = weighted_expr
+
+         # ---------- neighboring granule spatial feature ---------- #
+         features = []
+
+         for i, gnl_idx in enumerate(neighbor_indices):
+
+             feats = {}
+             feats["n_granules"] = len(gnl_idx)
+
+             if len(gnl_idx) == 0:
+                 feats.update({"mean_distance": np.nan, "std_distance": np.nan, "radius_max": np.nan, "radius_min": np.nan, "density": 0, "center_offset_norm": np.nan, "anisotropy_ratio": np.nan})
+             else:
+                 gnl_pos = gnl_coords[gnl_idx]
+                 neuron_pos = neuron_coords[i]
+                 dists = np.linalg.norm(gnl_pos - neuron_pos, axis = 1)
+                 feats["mean_distance"] = dists.mean()
+                 feats["std_distance"] = dists.std()
+                 feats["radius_max"] = dists.max()
+                 feats["radius_min"] = dists.min()
+                 feats["density"] = len(gnl_idx) / (np.pi * radius ** 2)
+                 centroid = gnl_pos.mean(axis = 0)
+                 offset = centroid - neuron_pos
+                 feats["center_offset_norm"] = np.linalg.norm(offset)
+                 cov = np.cov((gnl_pos - neuron_pos).T)
+                 eigvals = np.linalg.eigvalsh(cov)
+                 if np.min(eigvals) > 0:
+                     feats["anisotropy_ratio"] = np.max(eigvals) / np.min(eigvals)
+                 else:
+                     feats["anisotropy_ratio"] = np.nan
+
+             features.append(feats)
+
+         spatial_df = pd.DataFrame(features, index = adata_neuron.obs_names)
+         return adata_neuron, spatial_df
+
+
555
+ # [MAIN] numpy array, neuron embeddings based on neighboring granules
556
+ def neuron_embedding_one_hot(adata_neuron, granule_adata, k = 10, radius = 10, loc_key = ["global_x", "global_y"], gnl_subtype_key = "granule_subtype_kmeans", padding_value = "Others"):
557
+
558
+ adata_neuron = adata_neuron.copy()
559
+ granule_adata = granule_adata.copy()
560
+
561
+ # neuron and granule coordinates, granule subtypes
562
+ neuron_coords = adata_neuron.obs[loc_key].to_numpy()
563
+ granule_coords = granule_adata.obs[loc_key].to_numpy()
564
+ granule_subtypes = granule_adata.obs[gnl_subtype_key].astype(str).to_numpy()
565
+
566
+ # include padding category
567
+ unique_subtypes = np.unique(granule_subtypes).tolist()
568
+ if padding_value not in unique_subtypes:
569
+ unique_subtypes.append(padding_value)
570
+
571
+ encoder = OneHotEncoder(categories = [unique_subtypes], sparse = False, handle_unknown = "ignore")
572
+ encoder.fit(np.array(unique_subtypes).reshape(-1, 1))
573
+ S = len(unique_subtypes)
574
+
575
+ # k-d tree
576
+ tree = make_tree(d1 = granule_coords[:, 0], d2 = granule_coords[:, 1])
577
+ distances, indices = tree.query(neuron_coords, k = k, distance_upper_bound = radius)
578
+
579
+ # initialize output
580
+ n_neurons = neuron_coords.shape[0]
581
+ embeddings = np.zeros((n_neurons, k, S), dtype = float)
582
+
583
+ for i in range(n_neurons):
584
+ for j in range(k):  # j indexes neighbors; reusing k here would shrink the loop for later neurons
585
+ idx = indices[i, j]
586
+ dist = distances[i, j]
587
+ if idx == granule_coords.shape[0] or np.isinf(dist):
588
+ subtype = padding_value
589
+ else:
590
+ subtype = granule_subtypes[idx]
591
+ onehot = encoder.transform([[subtype]])[0]
592
+ embeddings[i, j, :] = onehot
593
+
594
+ return embeddings, encoder.categories_[0]
595
+
596
+
597
+ # [MAIN] numpy array of shape (n_neurons, S): exp(-distance / sigma)-weighted subtype composition of granules within radius
598
+ def neuron_embedding_spatial_weight(adata_neuron, granule_adata, radius = 10, sigma = 10, loc_key = ["global_x", "global_y"], gnl_subtype_key = "granule_subtype_kmeans", padding_value = "Others"):
599
+
600
+ adata_neuron = adata_neuron.copy()
601
+ granule_adata = granule_adata.copy()
602
+
603
+ # neuron and granule coordinates, granule subtypes
604
+ neuron_coords = adata_neuron.obs[loc_key].to_numpy()
605
+ granule_coords = granule_adata.obs[loc_key].to_numpy()
606
+ granule_subtypes = granule_adata.obs[gnl_subtype_key].astype(str).to_numpy()
607
+
608
+ # include padding category
609
+ unique_subtypes = np.unique(granule_subtypes).tolist()
610
+ if padding_value not in unique_subtypes:
611
+ unique_subtypes.append(padding_value)
612
+
613
+ encoder = OneHotEncoder(categories = [unique_subtypes], sparse_output = False, handle_unknown = "ignore")
614
+ encoder.fit(np.array(unique_subtypes).reshape(-1, 1))
615
+ S = len(unique_subtypes)
616
+
617
+ # k-d tree
618
+ tree = make_tree(d1 = granule_coords[:, 0], d2 = granule_coords[:, 1])
619
+ all_neighbors = tree.query_ball_point(neuron_coords, r = radius)
620
+
621
+ # initialize output
622
+ n_neurons = neuron_coords.shape[0]
623
+ embeddings = np.zeros((n_neurons, S), dtype = float)
624
+
625
+ for i, neighbor_indices in enumerate(all_neighbors):
626
+ if not neighbor_indices:
627
+ # no neighbors, assign to padding subtype
628
+ embeddings[i] = encoder.transform([[padding_value]])[0]
629
+ continue
630
+
631
+ # get neighbor subtypes and distances
632
+ neighbor_coords = granule_coords[neighbor_indices]
633
+ dists = np.linalg.norm(neuron_coords[i] - neighbor_coords, axis = 1)
634
+ weights = np.exp(- dists / sigma)
635
+
636
+ # encode subtypes to one-hot and weight them
637
+ subtypes = granule_subtypes[neighbor_indices]
638
+ onehots = encoder.transform(subtypes.reshape(-1, 1))
639
+ weighted_sum = (weights[:, np.newaxis] * onehots).sum(axis = 0)
640
+
641
+ # normalize to make it a composition vector
642
+ embeddings[i] = weighted_sum / weights.sum()
643
+
644
+ return embeddings, encoder.categories_[0]
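`neuron_embedding_spatial_weight` above turns the granules within `radius` of each neuron into a composition vector, weighting each granule by `exp(-d / sigma)` and normalizing so the entries sum to 1. The sketch below reproduces that idea using only NumPy and SciPy's `cKDTree`. The helper name `weighted_composition` is hypothetical, and it replaces scikit-learn's `OneHotEncoder` with integer subtype codes accumulated via `np.add.at`. Treat it as an illustration under those assumptions, not as the package's code.

```python
import numpy as np
from scipy.spatial import cKDTree


def weighted_composition(neuron_xy, granule_xy, granule_labels, radius=10.0, sigma=10.0, padding="Others"):
    """Hypothetical helper: exp(-d / sigma)-weighted subtype composition of granules within radius."""
    neuron_xy = np.asarray(neuron_xy, dtype=float)
    granule_xy = np.asarray(granule_xy, dtype=float)
    granule_labels = np.asarray(granule_labels).astype(str)

    # Sorted subtype categories plus a padding category for neurons with no granules in range.
    categories = np.unique(granule_labels).tolist()
    if padding not in categories:
        categories.append(padding)
    column = {c: j for j, c in enumerate(categories)}
    codes = np.array([column[c] for c in granule_labels], dtype=int)

    tree = cKDTree(granule_xy)
    neighbors = tree.query_ball_point(neuron_xy, r=radius)

    out = np.zeros((len(neuron_xy), len(categories)))
    for i, nb in enumerate(neighbors):
        if len(nb) == 0:
            out[i, column[padding]] = 1.0  # no granule within radius: all mass on padding
            continue
        nb = np.asarray(nb, dtype=int)
        d = np.linalg.norm(granule_xy[nb] - neuron_xy[i], axis=1)
        w = np.exp(-d / sigma)
        np.add.at(out[i], codes[nb], w)  # sum weights per subtype (handles repeated subtypes)
        out[i] /= w.sum()  # normalize to a composition vector that sums to 1
    return out, categories


neurons = np.array([[0.0, 0.0], [100.0, 100.0]])
granules = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
emb, cats = weighted_composition(neurons, granules, ["A", "B", "A"])
print(cats)
print(emb.round(3))
```

Each row is divided by its total weight. As a result, a neuron surrounded by many granules and a neuron with a single nearby granule of the same subtype receive the same vector. Granule abundance is therefore carried by the separate spatial features (such as `density`), not by this embedding.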
@@ -0,0 +1,145 @@
1
+ import matplotlib.pyplot as plt
2
+ import numpy as np
3
+ import pandas as pd
4
+ import seaborn as sns
5
+ from matplotlib import colors as mcolors
6
+ from rtree import index
7
+ from scipy.spatial import cKDTree
8
+ from scipy.stats import rankdata
9
+ from shapely.geometry import Point
10
+
11
+
12
+ def find_threshold_index(cumsum_list, threshold = 0.99):
13
+ total = cumsum_list[-1]
14
+ for i, value in enumerate(cumsum_list):
15
+ if value >= threshold * total:
16
+ return i
17
+ return None
18
+
19
+
20
+ def closest(lst, K):
21
+ return lst[min(range(len(lst)), key = lambda i: abs(lst[i] - K))]
22
+
23
+
24
+ def make_tree(d1 = None, d2 = None, d3 = None):
25
+ active_dimensions = [dimension for dimension in [d1, d2, d3] if dimension is not None]
26
+ if len(active_dimensions) == 1:
27
+ points = np.c_[active_dimensions[0].ravel()]
28
+ elif len(active_dimensions) == 2:
29
+ points = np.c_[active_dimensions[0].ravel(), active_dimensions[1].ravel()]
30
+ elif len(active_dimensions) == 3:
31
+ points = np.c_[active_dimensions[0].ravel(), active_dimensions[1].ravel(), active_dimensions[2].ravel()]
32
+ return cKDTree(points)
33
+
34
+
35
+ def make_rtree(spheres):
36
+ p = index.Property()
37
+ idx = index.Index(properties = p)
38
+ for i, sphere in enumerate(spheres.itertuples()):
39
+ center = Point(sphere.sphere_x, sphere.sphere_y)
40
+ bounds = (center.x - sphere.sphere_r,
41
+ center.y - sphere.sphere_r,
42
+ center.x + sphere.sphere_r,
43
+ center.y + sphere.sphere_r)
44
+ idx.insert(i, bounds)
45
+ return idx
46
+
47
+
48
+ def scale(array, max = 1):
49
+ new_array = (array - np.min(array)) / (np.max(array) - np.min(array)) * max
50
+ return new_array
51
+
52
+
53
+ def weighted_corr(estimated, actual, weights):
54
+
55
+ estimated = np.array(estimated)
56
+ actual = np.array(actual)
57
+ weights = np.array(weights)
58
+
59
+ # weighted mean
60
+ mean_estimated = np.average(estimated, weights = weights)
61
+ mean_actual = np.average(actual, weights = weights)
62
+
63
+ # weighted covariance
64
+ cov_w = np.sum(weights * (estimated - mean_estimated) * (actual - mean_actual)) / np.sum(weights)
65
+
66
+ # weighted variances
67
+ var_estimated = np.sum(weights * (estimated - mean_estimated) ** 2) / np.sum(weights)
68
+ var_actual = np.sum(weights * (actual - mean_actual) ** 2) / np.sum(weights)
69
+
70
+ # weighted correlation coefficient
71
+ weighted_corr = cov_w / np.sqrt(var_estimated * var_actual)
72
+
73
+ return weighted_corr
74
+
75
+
76
+ def weighted_spearmanr(A, B, weights):
77
+
78
+ A = np.array(A)
79
+ B = np.array(B)
80
+ weights = np.array(weights)
81
+
82
+ # rank the data
83
+ R_A = rankdata(A)
84
+ R_B = rankdata(B)
85
+
86
+ # weighted mean
87
+ mean_R_A_w = np.average(R_A, weights=weights)
88
+ mean_R_B_w = np.average(R_B, weights=weights)
89
+
90
+ # weighted covariance
91
+ cov_w = np.sum(weights * (R_A - mean_R_A_w) * (R_B - mean_R_B_w)) / np.sum(weights)
92
+
93
+ # weighted variances
94
+ var_R_A_w = np.sum(weights * (R_A - mean_R_A_w)**2) / np.sum(weights)
95
+ var_R_B_w = np.sum(weights * (R_B - mean_R_B_w)**2) / np.sum(weights)
96
+
97
+ # weighted Spearman correlation coefficient
98
+ weighted_spearman_corr = cov_w / np.sqrt(var_R_A_w * var_R_B_w)
99
+
100
+ return weighted_spearman_corr
101
+
102
+
103
+ def assign_palette_to_adata(adata, obs_key = "granule_expr_cluster_hierarchical", self_defined = False, cmap_name = "tab10"):
104
+
105
+ adata = adata.copy()
106
+
107
+ if not isinstance(adata.obs[obs_key].dtype, pd.CategoricalDtype):
108
+ adata.obs[obs_key] = adata.obs[obs_key].astype("category")
109
+
110
+ categories = adata.obs[obs_key].cat.categories
111
+ n_categories = len(categories)
112
+
113
+ if self_defined:
114
+ cmap = plt.colormaps[cmap_name]
115
+ color_palette = [cmap(i) for i in range(n_categories)]
116
+ else:
117
+ base_colors = plt.get_cmap(cmap_name).colors
118
+ if n_categories > len(base_colors):
119
+ color_palette = sns.color_palette(cmap_name, n_categories)
120
+ else:
121
+ color_palette = base_colors[:n_categories]
122
+
123
+ adata.uns[f"{obs_key}_colors"] = [mcolors.to_hex(c) for c in color_palette]
124
+
125
+ return adata
126
+
127
+
128
+ def p_val_to_star(p):
129
+ if p > 0.05:
130
+ return "ns"
131
+ elif p > 0.01:
132
+ return "*"
133
+ elif p > 0.001:
134
+ return "**"
135
+ else:
136
+ return "***"
137
+
138
+
139
+ def top_columns_above_threshold(row, threshold=0.5):
140
+ sorted_row = row.sort_values(ascending=False)
141
+ cumsum = sorted_row.cumsum()
142
+ # Label of the first column at which the cumulative sum exceeds the threshold
143
+ n = (cumsum > threshold).idxmax()
144
+ # Slice up to and including the index that crosses the threshold
145
+ return sorted_row.loc[:n].index.tolist()
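Both `weighted_corr` and `weighted_spearmanr` compute a weighted Pearson coefficient; the second applies it to ranks from `scipy.stats.rankdata`. Only the means, covariance, and variances use `weights`, while the ranks themselves are unweighted. The compact sketch below restates the same formulas under the hypothetical names `wpearson` and `wspearman`. It checks that uniform weights recover the ordinary coefficients and that a zero weight removes a point from the Pearson version.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def wpearson(x, y, w):
    """Weighted Pearson correlation: weighted covariance over the product of weighted SDs."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)  # sum(w * dx * dy) / sum(w)
    var_x = np.average((x - mx) ** 2, weights=w)
    var_y = np.average((y - my) ** 2, weights=w)
    return cov / np.sqrt(var_x * var_y)


def wspearman(x, y, w):
    """Weighted Pearson on unweighted average ranks, mirroring weighted_spearmanr."""
    return wpearson(rankdata(x), rankdata(y), w)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
uniform = np.ones(len(x))

# Equal weights reduce both to the ordinary coefficients.
print(np.isclose(wpearson(x, y, uniform), np.corrcoef(x, y)[0, 1]))
print(np.isclose(wspearman(x, y, uniform), spearmanr(x, y)[0]))

# A zero weight removes the outlier's influence on the Pearson version entirely.
print(wpearson([1, 2, 3, 4, 5], [1, 2, 3, 4, -10], [1, 1, 1, 1, 0]))
```

Because ranking ignores the weights, a zero-weight point can still shift the ranks of the other points. As a result, `weighted_spearmanr` does not necessarily ignore that point the way the Pearson version does.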
@@ -0,0 +1,40 @@
1
+ Metadata-Version: 2.4
2
+ Name: mcDETECT
3
+ Version: 2.0.15
4
+ Summary: Uncovering the dark transcriptome in polarized neuronal compartments with mcDETECT
5
+ Home-page: https://github.com/chen-yang-yuan/mcDETECT
6
+ Author: Chenyang Yuan
7
+ Author-email: chenyang.yuan@emory.edu
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.6
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE
14
+ Requires-Dist: anndata
15
+ Requires-Dist: miniball
16
+ Requires-Dist: numpy
17
+ Requires-Dist: pandas
18
+ Requires-Dist: rtree
19
+ Requires-Dist: scanpy
20
+ Requires-Dist: scikit-learn
21
+ Requires-Dist: scipy
22
+ Requires-Dist: shapely
23
+ Dynamic: author
24
+ Dynamic: author-email
25
+ Dynamic: classifier
26
+ Dynamic: description
27
+ Dynamic: description-content-type
28
+ Dynamic: home-page
29
+ Dynamic: license-file
30
+ Dynamic: requires-dist
31
+ Dynamic: requires-python
32
+ Dynamic: summary
33
+
34
+ # mcDETECT
35
+
36
+ ## Uncovering the dark transcriptome in polarized neuronal compartments with mcDETECT
37
+
38
+ #### Chenyang Yuan, Krupa Patel, Hongshun Shi, Hsiao-Lin V. Wang, Feng Wang, Ronghua Li, Yangping Li, Victor G. Corces, Hailing Shi, Sulagna Das, Jindan Yu, Peng Jin, Bing Yao* and Jian Hu*
39
+
40
+ mcDETECT is a computational framework for studying the dark transcriptome associated with polarized compartments in the brain using *in situ* spatial transcriptomics (iST) data. It begins by examining the subcellular distribution of mRNAs in an iST sample. Each mRNA molecule is treated as a distinct point with its own 3D spatial coordinates, accounting for the thickness of the sample. Unlike many cell-type marker genes, which are typically found within the nucleus or soma, compartmentalized mRNAs often form small aggregates outside the soma. mcDETECT identifies these extrasomatic aggregates with a density-based clustering approach: it calculates the Euclidean distances between mRNA points and defines the neighborhood of each point within a specified search radius. Points are then categorized as core, border, or noise points based on their reachability from neighboring points, and each connected bundle of core and border points is recognized as an mRNA aggregate. To minimize false positives, mcDETECT excludes aggregates that substantially overlap with somata, which are estimated by dilating the nuclear masks derived from DAPI staining. It then repeats this process for multiple granule markers and merges aggregates from different markers that exhibit high spatial overlap. After aggregating across all markers, an additional filtering step removes aggregates containing mRNAs from negative control genes, which are known to be enriched exclusively in nuclei and somata. The remaining aggregates are considered individual RNA granules. Finally, mcDETECT computes the minimum enclosing sphere of each aggregate, uses it to capture neighboring mRNA molecules from all measured genes, and summarizes their counts, thereby defining the spatial transcriptome profile of each RNA granule.
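The density-based step described above follows the DBSCAN scheme of core, border, and noise points. The sketch below is a minimal DBSCAN over 3D coordinates built on SciPy's `cKDTree`. It is not mcDETECT's own implementation, which is not shown here. The names `density_clusters`, `eps` (search radius), and `min_samples` (the number of points, including the point itself, required within `eps` for a core point) are assumptions borrowed from the usual DBSCAN convention. The soma-overlap, marker-merging, negative-control, and minimum-enclosing-sphere steps are not covered.

```python
import numpy as np
from scipy.spatial import cKDTree


def density_clusters(points, eps=1.5, min_samples=5):
    """DBSCAN-style labels: cluster ids >= 0, noise = -1 (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    tree = cKDTree(points)
    # Each neighborhood lists every point within eps, including the point itself.
    neighborhoods = tree.query_ball_point(points, r=eps)
    is_core = np.array([len(nb) >= min_samples for nb in neighborhoods])

    labels = np.full(len(points), -1, dtype=int)
    cluster = 0
    for start in np.flatnonzero(is_core):
        if labels[start] != -1:
            continue  # already reached from an earlier core point
        labels[start] = cluster
        stack = [start]
        while stack:
            p = stack.pop()
            for q in neighborhoods[p]:
                if labels[q] == -1:
                    labels[q] = cluster  # core or border point joins this aggregate
                    if is_core[q]:
                        stack.append(q)  # only core points extend the aggregate
        cluster += 1
    return labels, is_core


# Two tight groups of transcripts, one border point next to the first group, one isolated point.
cube = np.array([[0, 0, 0], [0.5, 0, 0], [0, 0.5, 0], [0, 0, 0.5], [0.5, 0.5, 0], [0.5, 0.5, 0.5]], dtype=float)
border = np.array([[1.6, 0.0, 0.0]])
isolated = np.array([[5.0, -5.0, 0.5]])
pts = np.vstack([cube, border, cube + [10.0, 10.0, 1.0], isolated])

labels, is_core = density_clusters(pts, eps=1.5, min_samples=5)
print(labels)
```

In this example, the two cubes become aggregates 0 and 1. The point at (1.6, 0, 0) has too few neighbors to be a core point, but it lies within `eps` of core points, so it joins aggregate 0 as a border point. The isolated point is labelled noise (-1).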
@@ -0,0 +1,11 @@
1
+ LICENSE
2
+ README.md
3
+ setup.py
4
+ mcDETECT/__init__.py
5
+ mcDETECT/model.py
6
+ mcDETECT/utils.py
7
+ mcDETECT.egg-info/PKG-INFO
8
+ mcDETECT.egg-info/SOURCES.txt
9
+ mcDETECT.egg-info/dependency_links.txt
10
+ mcDETECT.egg-info/requires.txt
11
+ mcDETECT.egg-info/top_level.txt
@@ -0,0 +1,9 @@
1
+ anndata
2
+ miniball
3
+ numpy
4
+ pandas
5
+ rtree
6
+ scanpy
7
+ scikit-learn
8
+ scipy
9
+ shapely
@@ -0,0 +1 @@
1
+ mcDETECT
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,20 @@
1
+ from setuptools import setup, find_packages
2
+
3
+ setup(
4
+ name = "mcDETECT",
5
+ version = "2.0.15",
6
+ packages = find_packages(),
7
+ install_requires = ["anndata", "miniball", "numpy", "pandas", "rtree", "scanpy", "scikit-learn", "scipy", "shapely"],
8
+ author = "Chenyang Yuan",
9
+ author_email = "chenyang.yuan@emory.edu",
10
+ description = "Uncovering the dark transcriptome in polarized neuronal compartments with mcDETECT",
11
+ long_description = open("README.md").read(),
12
+ long_description_content_type = "text/markdown",
13
+ url = "https://github.com/chen-yang-yuan/mcDETECT",
14
+ classifiers = [
15
+ "Programming Language :: Python :: 3",
16
+ "License :: OSI Approved :: MIT License",
17
+ "Operating System :: OS Independent",
18
+ ],
19
+ python_requires = ">=3.6",
20
+ )