jdti 0.1.0__tar.gz

jdti-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,610 @@
1
+ Metadata-Version: 2.4
2
+ Name: jdti
3
+ Version: 0.1.0
4
+ Summary:
5
+ Author: jkubis96
6
+ Author-email: jbiosystem@gmail.com
7
+ Requires-Python: >=3.12,<3.13
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: Programming Language :: Python :: 3.12
10
+ Requires-Dist: adjusttext (>=1.3.0,<2.0.0)
11
+ Requires-Dist: harmonypy (==0.0.10)
12
+ Requires-Dist: joblib (>=1.5.2,<2.0.0)
13
+ Requires-Dist: matplotlib (>=3.10.6,<4.0.0)
14
+ Requires-Dist: numpy (>=2.3.3,<3.0.0)
15
+ Requires-Dist: pandas (>=2.3.2,<3.0.0)
16
+ Requires-Dist: plotly (>=6.3.0,<7.0.0)
17
+ Requires-Dist: pytest (>=8.4.2,<9.0.0)
18
+ Requires-Dist: scikit-learn (>=1.7.2,<2.0.0)
19
+ Requires-Dist: scipy (>=1.16.2,<2.0.0)
20
+ Requires-Dist: seaborn (>=0.13.2,<0.14.0)
21
+ Requires-Dist: tqdm (>=4.67.1,<5.0.0)
22
+ Requires-Dist: umap-learn (>=0.5.9.post2,<0.6.0)
23
+ Description-Content-Type: text/markdown
24
+
25
+ ## JDtI – Python library for scRNAseq/RNAseq data analysis
26
+
27
+
28
+
29
+ ![Python version](https://img.shields.io/badge/python-%E2%89%A53.12%20%7C%20%3C3.13-blue?logo=python&logoColor=white.png)
30
+ ![License](https://img.shields.io/badge/license-MIT-green)
31
+ ![Docs](https://img.shields.io/badge/docs-available-blueviolet)
32
+
33
+
34
+ <p align="right">
35
+ <img src="https://github.com/jkubis96/Logos/blob/main/logos/jbs_current.png?raw=true" alt="drawing" width="200" />
36
+ </p>
37
+
38
+
39
+ ### Author: Jakub Kubiś
40
+
41
+ <div align="left">
42
+ Institute of Bioorganic Chemistry<br />
43
+ Polish Academy of Sciences<br />
44
+ Laboratory of Single Cell Analyses
45
+ </div>
46
+
47
+
48
+ ## Description
49
+
50
+
51
+ <div align="justify"> <strong>JDtI</strong> (JDataIntegration) is a Python library for data integration and advanced post-processing of single-cell datasets.
52
+
53
+ JDtI supports basic quality control, such as filtering by the number of cells per cluster and the number of genes per cell, as well as more advanced tasks like subclustering, integration, and extensive visualization. Cell information is not dropped during the analysis of separate sets; instead, the cluster and cell-lineage annotations from earlier analyses are used to integrate data based on cluster markers and data harmonization. After integration, relationships between cells can be visualized in several ways, including cell distances and correlations.
54
+
55
+ In addition, JDtI can run differential expression (DEG) analysis between sets, selected cells, or grouped cells, and visualize the results on UMAP, volcano plots, and regression plots comparing pairs of cells. This makes it well suited to targeted analyses of specific questions that basic analyses may miss.
56
+
57
+ JDtI also offers many functions for data processing and visualization with clean visual outputs, such as volcano plots, gene expression analysis of different data types, clustering, heatmaps, and more.
58
+
59
+ <p align="center">
60
+ <img src="https://github.com/jkubis96/JDtI/blob/v.1/fig/logo.png?raw=true" alt="drawing" width="500" />
61
+ </p>
62
+
63
+ It is compatible with various sequencing approaches, including scRNA-seq and bulk RNA-seq, and supports interoperability with tools such as <em>Seurat</em>, <em>Scanpy</em>, and other bioinformatics frameworks using the 10x sparse matrix format as input. More details about the available functions can be found in the Documentation and Example Usage section on GitHub.
64
+ </div>
65
+
66
+
67
+
68
+
69
+
70
+ <br />
71
+
72
+
73
+
74
+
75
+ ### Table of contents
76
+
77
+ [Installation](#installation)
78
+ [Documentation](#doc)
79
+
80
+ [Example usage:](#example)
81
+ [1. Basic functions](#bf)
82
+ [2. Data clustering](#dc)
83
+ [3. Data integration](#di)
84
+ [4. Data subclustering](#ds)
85
+
86
+
87
+
88
+
89
+ <br />
90
+
91
+ ## Installation <a id="installation"></a>
92
+
93
+
94
+ ```
95
+ pip install jdti
96
+ ```
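The package metadata pins `Requires-Python` to `>=3.12,<3.13`, so `pip` will refuse to install JDtI 0.1.0 on any other interpreter. A minimal sketch for checking the running interpreter beforehand; the helper name `python_supported` is hypothetical and not part of JDtI:

```python
import sys


def python_supported(version=None):
    """Return True if `version` satisfies >=3.12,<3.13 (taken from the package metadata)."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 12) <= (major, minor) < (3, 13)


if __name__ == "__main__":
    print("compatible" if python_supported() else "JDtI 0.1.0 needs Python 3.12.x")
```

Running the check before `pip install jdti` gives a clearer message than the resolver error raised on an unsupported interpreter.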
97
+
98
+ <br />
99
+
100
+
101
+ ## Documentation <a id="doc"></a>
102
+
103
+
104
+ Documentation for classes and functions is available here 👉 [Documentation 📄](https://jkubis96.github.io/JDtI/jdti.html)
105
+
106
+
107
+ <br />
108
+
109
+
110
+ ## Example usage <a id="example"></a>
111
+
112
+ ### 1. Basic functions <a id="bf"></a>
113
+
114
+ ##### 1. Loading functions
115
+
116
+ ```
117
+ from jdti import *
118
+ ```
119
+ ##### 2. Loading data
120
+
121
+ ```
122
+ # load a sparse matrix as a pd.DataFrame and create the accompanying metadata
123
+ data, metadata = load_sparse(path = 'data/set1', name = 'set1')
124
+
125
+ # load a data frame from another file type (.csv, .tsv, .txt); assumes pandas is imported as pd
126
+ data = pd.read_csv('example_data.csv')
127
+
128
+
129
+ ```
130
+
131
+ ```
132
+ fl = find_features(data, features =['KIT', 'MC1', 'EDNRB', 'PAX3'])
133
+
134
+ fl2 = find_features(data, features =['KIT', 'MC1R', 'EDNRB', 'PAX3'])
135
+ ```
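The two calls above differ only in `'MC1'` versus `'MC1R'`, and later steps read the `'included'` key of the result. The sketch below illustrates that kind of lookup on a plain list of gene names. `split_features`, the `'not_included'` key, and the case-insensitive matching are assumptions made for illustration, not the JDtI implementation:

```python
def split_features(available, requested):
    """Partition requested names into those present in `available` and the rest.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    lookup = {name.upper(): name for name in available}
    result = {"included": [], "not_included": []}
    for name in requested:
        match = lookup.get(name.upper())
        if match is not None:
            result["included"].append(match)
        else:
            result["not_included"].append(name)
    return result


genes = ["KIT", "MC1R", "EDNRB", "PAX3", "MITF"]  # toy feature list
fl = split_features(genes, ["KIT", "MC1", "EDNRB", "PAX3"])
fl2 = split_features(genes, ["KIT", "MC1R", "EDNRB", "PAX3"])
print(fl["not_included"])
print(fl2["included"])
```

Inspecting the names that were not matched before reducing the data is a cheap way to catch misspelled gene symbols such as `MC1`.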
136
+
137
+ ```
138
+ nam = find_names(data, names = ['0', '1', '2','10', '1&'])
139
+ ```
140
+
141
+ ```
142
+ data_reduced = reduce_data(data,
143
+ features = fl2['included'],
144
+ names = nam['included'])
145
+ ```
146
+
147
+
148
+
149
+ ```
150
+ DEG = calc_DEG(data_reduced,
151
+ metadata_list = None,
152
+ entities = compare_dict,
153
+ sets = None,
154
+ min_exp = 0,
155
+ min_pct = 0.1,
156
+ n_proc =10)
157
+ ```
158
+
159
+ ```
160
+ DEG2 = calc_DEG(data,
161
+ metadata_list = metadata['sets'],
162
+ entities = compare_dict,
163
+ sets = None,
164
+ min_exp = 0,
165
+ min_pct = 0.1,
166
+ n_proc = 10)
167
+ ```
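The `min_exp` and `min_pct` arguments suggest an expression-fraction filter applied before testing. A minimal sketch of such a filter and of a log2 fold change, assuming `min_pct` is the minimum fraction of cells with expression above `min_exp` in at least one group. The function names and the pseudocount are hypothetical, and the actual `calc_DEG` internals may differ:

```python
import math


def pct_expressed(values, min_exp=0):
    """Fraction of cells whose expression is strictly above `min_exp`."""
    return sum(v > min_exp for v in values) / len(values)


def passes_filter(group_a, group_b, min_exp=0, min_pct=0.1):
    """Keep a gene if enough cells express it in at least one of the two groups."""
    best = max(pct_expressed(group_a, min_exp), pct_expressed(group_b, min_exp))
    return best >= min_pct


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount avoids division by zero."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


target = [0, 3, 5, 0, 4, 0, 0, 4]  # toy expression of one gene in the tested cells
rest = [0, 0, 1, 0, 0, 0, 0, 0]    # same gene in the remaining cells
print(passes_filter(target, rest), round(log2_fold_change(target, rest), 3))
```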
168
+ ```
169
+
170
+ fig = volcano_plot(DEG3,
171
+ p_adj = True,
172
+ top = 25,
173
+ p_val = 0.05,
174
+ lfc = 0.25,
175
+ standard_scale = False,
176
+ rescale_adj = True,
177
+ image_width = 12,
178
+ image_high = 12)
179
+
180
+ ```
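The `p_val = 0.05` and `lfc = 0.25` arguments act as the significance and effect-size cut-offs of the plot. A sketch of the usual volcano classification under those thresholds, in plain Python; the labels, the `classify` helper, and the numbers are illustrative assumptions, not JDtI output:

```python
def classify(log_fc, p_value, p_val=0.05, lfc=0.25):
    """Label a gene as 'up', 'down' or 'ns' (not significant)."""
    if p_value > p_val or abs(log_fc) < lfc:
        return "ns"
    return "up" if log_fc > 0 else "down"


# (gene, log fold change, p-value); toy values, not real results
results = [("KIT", 1.8, 0.001), ("PAX3", -0.9, 0.01), ("EDNRB", 0.1, 0.0001), ("MITF", 2.0, 0.2)]
labels = {gene: classify(fc, p) for gene, fc, p in results}
print(labels)
```

A gene needs to pass both cut-offs to be highlighted: a tiny p-value with a negligible fold change, or a large fold change with a weak p-value, stays in the "ns" group.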
181
+
182
+
183
+
184
+ ```
185
+
186
+ DEG3_10 = DEG3.sort_values(['p_val', 'esm', 'log(FC)'], ascending=[True, False, False]).head(10)
187
+
188
+ data_reduced = reduce_data(data,
189
+ features = list(set(DEG3_10['feature'])),
190
+ names = nam['included'])
191
+
192
+ avg = average(data_reduced)
193
+ occ = occurrence(data_reduced)
194
+ ```
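`average` and `occurrence` feed the dot plot that follows. Assuming they summarise each group by its mean expression and by the fraction of cells with non-zero expression (an assumption based only on the function names), the idea can be sketched as:

```python
from collections import defaultdict


def summarise(values, groups):
    """Return ({group: mean}, {group: fraction of non-zero cells}) for one gene."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    avg = {g: sum(v) / len(v) for g, v in by_group.items()}
    occ = {g: sum(x > 0 for x in v) / len(v) for g, v in by_group.items()}
    return avg, occ


expression = [0, 2, 4, 0, 0, 6]            # toy values for one gene
clusters = ["0", "0", "0", "1", "1", "1"]  # group label of each cell
avg, occ = summarise(expression, clusters)
print(avg, occ)
```

In this toy case both clusters have the same mean but different occurrence, which is why a dot plot encodes the two quantities separately (for example as colour and dot size).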
195
+
196
+ ```
197
+ fig = features_scatter(expression_data = avg,
198
+ occurence_data = occ,
199
+ features = None,
200
+ metadata_list = None,
201
+ colors = 'viridis',
202
+ hclust = 'complete',
203
+ img_width = 8,
204
+ img_high = 5,
205
+ label_size = 10,
206
+ size_scale = 100,
207
+ x_lab = 'Genes',
208
+ legend_lab = 'log(CPM + 1)',
209
+ bbox_to_anchor_scale = 25,
210
+ bbox_to_anchor_perc=(0.91, 0.55),
211
+ bbox_to_anchor_group=(1.01, 0.4))
212
+
213
+ ```
214
+
215
+
216
+ ```
217
+ fig = development_clust(data = avg,
218
+ method = 'ward',
219
+ img_width = 5,
220
+ img_high = 5)
221
+ ```
222
+
223
+
224
+ ### 2. Data clustering <a id="dc"></a>
225
+
226
+
227
+ ```
228
+ from jdti import Clustering, load_sparse
229
+ ```
230
+
231
+ ```
232
+ data, metadata = load_sparse(path = 'data/set2', name = 'set2')
233
+ clusters = Clustering.add_data_frame(data, metadata)
234
+ ```
235
+ ```
236
+ clusters.clustering_data
237
+ clusters.clustering_metadata
238
+ ```
239
+ ```
240
+ clusters.perform_PCA(pc_num=100, width=8, height=6)
241
+
242
+ clusters.knee_plot_PCA(width=8, height=6)
243
+ ```
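A knee plot is read by locating the point where additional components stop adding much variance. One common heuristic, not necessarily the one JDtI uses, picks the point farthest from the straight line joining the first and last values. The helper name and the variance values below are made up for illustration:

```python
def knee_index(values):
    """Index of the point with maximum distance to the chord from first to last value."""
    n = len(values)
    if n < 3:
        return 0
    x1, y1, x2, y2 = 0, values[0], n - 1, values[-1]
    best, best_dist = 0, -1.0
    for i, y in enumerate(values):
        # Unnormalised point-to-line distance; the shared denominator does not change the argmax.
        dist = abs((y2 - y1) * i - (x2 - x1) * y + x2 * y1 - y2 * x1)
        if dist > best_dist:
            best, best_dist = i, dist
    return best


explained = [40.0, 20.0, 10.0, 5.0, 4.0, 3.5, 3.0, 2.8, 2.6, 2.5]  # toy variance per PC
print(knee_index(explained) + 1)  # number of components up to and including the knee
```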
244
+ ```
245
+ clusters.harmonize_sets(harmonize_type='harmony')
246
+ ```
247
+
248
+ ```
249
+ clusters.find_clusters_PCA(pc_num=0, eps=0.5, min_samples=10, width=8, height=6, harmonized=False)
250
+ ```
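The `eps` and `min_samples` arguments carry the same names as the parameters of DBSCAN in scikit-learn, which is among the package dependencies. Whether JDtI calls DBSCAN internally is an assumption here. The sketch shows how the two parameters shape density-based clusters on toy 2-D points, with `-1` marking noise:

```python
import math


def dbscan(points, eps=0.5, min_samples=10):
    """Minimal DBSCAN: returns one label per point, -1 for noise."""

    def neighbours(i):
        # A point counts as its own neighbour, as in scikit-learn's min_samples.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:  # not a core point: provisionally noise
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:       # former noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


pts = [(0, 0), (0, 0.3), (0.3, 0), (5, 5), (5, 5.3), (5.3, 5), (10, 0)]
print(dbscan(pts, eps=0.5, min_samples=3))
```

A larger `eps` merges nearby groups, while a larger `min_samples` turns sparse groups into noise; with `min_samples=10` every toy point above is labelled `-1`.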
251
+ ```
252
+ clusters.perform_UMAP(factorize=False, umap_num=0, pc_num=5, harmonized=False)
253
+
254
+
255
+ clusters.knee_plot_umap(eps=0.5, min_samples=10)
256
+ ```
257
+
258
+ ```
259
+ clusters.find_clusters_UMAP(umap_n=5, eps=0.5, min_samples=10, width=8, height=6)
260
+
261
+
262
+ clusters.UMAP_vis(names_slot='cell_names', set_sep=True, point_size=0.6)
263
+ ```
264
+
265
+ ```
266
+ clusters.UMAP_feature(feature_name = 'KIT', features_data=None, point_size=0.6)
267
+ ```
268
+
269
+ ```
270
+ clusters.get_umap_data()
271
+
272
+ clusters.get_pca_data()
273
+
274
+ clusters.return_clusters(clusters='umap')
275
+
276
+ ```
277
+
278
+
279
+
280
+ ### 3. Data integration <a id="di"></a>
281
+
282
+
283
+ ```
284
+
285
+ from jdti import COMPsc, volcano_plot
286
+ ```
287
+
288
+ ```
289
+ jseq_object = COMPsc.project_dir('data', ['set1', 'set2'])
290
+
291
+ jseq_object.load_sparse_from_projects(normalized_data=True)
292
+ ```
293
+ ```
294
+ dt = jseq_object.get_partial_data(names=['10'], features=['KIT', 'PAX3', 'MITF'], name_slot='cell_names')
295
+ ```
296
+
297
+ ```
298
+ jseq_object.gene_histograme(bins=100)
299
+
300
+ jseq_object.gene_threshold(min_n = 50, max_n = 3000)
301
+
302
+ jseq_object.gene_histograme(bins=100)
303
+
304
+ jseq_object.reduce(reg = '5', inc_set = False)
305
+
306
+ jseq_object.gene_histograme(bins=100)
307
+ ```
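`gene_threshold(min_n = 50, max_n = 3000)` reads as a filter on the number of detected genes per cell, in line with the "number of genes per cell" quality control mentioned in the description. A sketch of that idea on a toy cells-by-genes table; the helper name, the data layout, and the inclusive bounds are assumptions:

```python
def filter_cells_by_gene_count(counts, min_n=50, max_n=3000):
    """Keep cells whose number of detected (non-zero) genes lies within [min_n, max_n]."""
    kept = {}
    for cell, gene_counts in counts.items():
        detected = sum(1 for c in gene_counts.values() if c > 0)
        if min_n <= detected <= max_n:
            kept[cell] = gene_counts
    return kept


toy = {
    "cell_a": {"KIT": 3, "PAX3": 0, "MITF": 1},  # 2 detected genes
    "cell_b": {"KIT": 0, "PAX3": 0, "MITF": 1},  # 1 detected gene
    "cell_c": {"KIT": 2, "PAX3": 5, "MITF": 1},  # 3 detected genes
}
print(sorted(filter_cells_by_gene_count(toy, min_n=2, max_n=2)))
```

The lower bound removes empty or low-quality cells and the upper bound removes likely multiplets, which is why the histogram is drawn before and after the threshold.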
308
+
309
+ ```
310
+ jseq_object.cell_histograme(name_slot = 'cell_names')
311
+
312
+ jseq_object.cluster_threshold(min_n = 20, name_slot = 'cell_names')
313
+
314
+ jseq_object.cell_histograme(name_slot = 'cell_names')
315
+ ```
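`cluster_threshold(min_n = 20, ...)` corresponds to the "cells per cluster" control from the description. Assuming it removes clusters with fewer than `min_n` cells, the logic can be sketched with a counter over the cluster labels (the helper name and labels are hypothetical):

```python
from collections import Counter


def drop_small_clusters(cell_names, min_n=20):
    """Return a keep-mask that removes cells from clusters with fewer than `min_n` cells."""
    sizes = Counter(cell_names)
    return [sizes[name] >= min_n for name in cell_names]


labels = ["0"] * 25 + ["1"] * 5 + ["2"] * 20  # toy cluster label per cell
mask = drop_small_clusters(labels, min_n=20)
print(sum(mask), "of", len(labels), "cells kept")
```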
316
+
317
+ ```
318
+ # return values
319
+
320
+ met = jseq_object.input_metadata
321
+
322
+ data = jseq_object.get_data(set_info=True)
323
+
324
+ metadata = jseq_object.get_metadata()
325
+ ```
326
+
327
+ ```
328
+ jseq_object.calculate_difference_markers(min_exp = 0,
329
+ min_pct = 0.25,
330
+ n_proc=10,
331
+ force = False)
332
+
333
+
334
+
335
+ jseq_object.estimating_similarity(method = 'pearson',
336
+ p_val = 0.05,
337
+ top_n = 10)
338
+
339
+
340
+ pl = jseq_object.similarity_plot(split_sets = True,
341
+ set_info = True,
342
+ cmap='seismic',
343
+ width = 16, height = 14)
344
+
345
+
346
+ # pl.savefig(f'sim_plot_top_{top}.svg', dpi=300, bbox_inches='tight')
347
+ ```
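`estimating_similarity(method = 'pearson', ...)` appears to compare clusters through the correlation of their marker profiles. As a reminder of what the Pearson coefficient measures, here is a dependency-free sketch on two toy marker vectors; the vectors and the function are illustrative only:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


cluster_2 = [1.0, 2.0, 3.0, 4.0]  # toy marker expression, cluster '2'
cluster_6 = [2.0, 4.0, 6.0, 8.5]  # toy marker expression, cluster '6'
print(round(pearson(cluster_2, cluster_6), 3))
```

Because the coefficient is insensitive to scale and offset, two clusters with the same marker pattern score close to 1 even when their absolute expression levels differ between sets.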
348
+ ```
349
+ pl2 = jseq_object.spatial_similarity(set_info= True,
350
+ bandwidth = 1,
351
+ n_neighbors = 5,
352
+ min_dist = 0.1,
353
+ legend_split = 2,
354
+ point_size = 20,
355
+ spread=1.0,
356
+ set_op_mix_ratio=1.0,
357
+ local_connectivity=1,
358
+ repulsion_strength=1.0,
359
+ negative_sample_rate=5,
360
+ width = 12,
361
+ height = 10)
362
+
363
+
364
+ # pl2.savefig(f'sim_plot_map_top_{top}.svg', dpi=300, bbox_inches='tight')
365
+ ```
366
+
367
+ ```
368
+ sim_data = jseq_object.similarity
369
+ sim_data = sim_data[sim_data['set1'] != sim_data['set2']]
370
+
371
+ jseq_object.cell_regression(
372
+ cell_x = '2',
373
+ cell_y = '6',
374
+ set_x = 'set1',
375
+ set_y = 'set2',
376
+ threshold = 6,
377
+ image_width = 12,
378
+ image_high = 7,
379
+ color = 'black')
380
+
381
+
382
+
383
+ ```
384
+
385
+ ```
386
+ jseq_object.clustering_features(name_slot = 'cell_names',
387
+ features_list = None,
388
+ p_val = 0.05,
389
+ top_n = 10,
390
+ adj_mean = False,
391
+ beta = 0.2)
392
+
393
+ jseq_object.perform_PCA(pc_num = 50)
394
+
395
+ jseq_object.knee_plot_PCA()
396
+
397
+ jseq_object.harmonize_sets(harmonize_type = 'harmony')
398
+
399
+ # jseq_object.find_clusters_PCA(pc_num = 100, eps = 0.5, min_samples = 10)
400
+
401
+ jseq_object.perform_UMAP(factorize=False, umap_num = 2, pc_num = 10, harmonized = True)
402
+
403
+
404
+ # jseq_object.knee_plot_umap(eps = 0.5, min_samples = 10)
405
+
406
+
407
+ # jseq_object.find_clusters_UMAP(umap_n = 6, eps = 1, min_samples = 20)
408
+
409
+
410
+ plu = jseq_object.UMAP_vis(
411
+ names_slot = 'cell_names',
412
+ set_sep = True,
413
+ point_size = 1,
414
+ font_size = 6,
415
+ legend_split_col = 2,
416
+ width = 8,
417
+ height = 6,
418
+ inc_num = True)
419
+
420
+ # plu.savefig(f'sim_umap_top.svg', dpi=300, bbox_inches='tight')
421
+
422
+
423
+ plu = jseq_object.UMAP_vis(
424
+ names_slot = 'sets',
425
+ set_sep = True,
426
+ point_size = 1,
427
+ font_size = 6,
428
+ legend_split_col = 1,
429
+ width = 8,
430
+ height = 6,
431
+ inc_num = False)
432
+
433
+ # plu.savefig(f'sim_umap_sets_top_.svg', dpi=300, bbox_inches='tight')
434
+
435
+ ```
436
+
437
+ ```
438
+ vis = jseq_object.UMAP_feature(
439
+ features_data = jseq_object.get_data(set_info = False) ,
440
+ feature_name = 'MAP1B',
441
+ point_size = 0.6,
442
+ font_size = 6,
443
+ width = 8,
444
+ height = 6,
445
+ palette = 'light')
446
+
447
+ # vis.savefig(f'sim_umap_sets_top_vis.svg', dpi=300, bbox_inches='tight')
448
+
449
+ jseq_object.var_data
450
+
451
+
452
+ # jseq_object.save_project(name = 'topola')
453
+
454
+ ```
455
+
456
+ ```
457
+ stats = jseq_object.statistic(cells=None, sets='All', min_exp=0, min_pct=0.025, n_proc=10)
458
+ stats_5 = stats.sort_values(['valid_group', 'esm', 'log(FC)'], ascending=[True, False, False]).groupby('valid_group').head(5)
459
+
460
+
461
+
462
+ fig = volcano_plot(stats)
463
+ ```
464
+ ```
465
+ jseq_object.scatter_plot(
466
+ names = None,
467
+ features = list(set(stats_5['feature'])),
468
+ name_slot = 'cell_names',
469
+ scale = False,
470
+ colors = 'viridis',
471
+ hclust = 'complete',
472
+ img_width = 15,
473
+ img_high = 3,
474
+ label_size = 10,
475
+ size_scale = 200,
476
+ x_lab = 'Genes',
477
+ legend_lab = 'log(CPM + 1)',
478
+ set_box_size = 5,
479
+ set_box_high = 0.1,
480
+ bbox_to_anchor_scale = 25,
481
+ bbox_to_anchor_perc=(0.90, 0.5),
482
+ bbox_to_anchor_group=(0.9, 0.3))
483
+
484
+ ```
485
+
486
+ ```
487
+ import re
488
+
489
+ jseq_object.data_composition(
490
+ features_count = list(set([re.sub(r' .*$', '',x) for x in list(set(jseq_object.input_metadata['cell_names']))])),
491
+ name_slot = 'cell_names',
492
+ set_sep = True
493
+ )
494
+
495
+
496
+ jseq_object.composition_pie(
497
+ width = 6,
498
+ height = 6,
499
+ font_size = 15,
500
+ cmap = "tab20",
501
+ legend_split_col = 1,
502
+ offset_labels = 0.5,
503
+ legend_bbox = (1.15, 0.95))
504
+
505
+
506
+ jseq_object.bar_composition(
507
+ cmap = 'tab20b',
508
+ width = 2,
509
+ height = 6,
510
+ font_size = 15,
511
+ legend_split_col = 1,
512
+ legend_bbox = (1.3, 1))
513
+
514
+
515
+ ```
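The `re.sub(r' .*$', '', x)` expression in the `data_composition` call strips everything after the first space in each cell name, so names are collapsed to their base label before counting. A small sketch of that step and of the resulting percentages; the naming pattern in `names` and the `composition` helper are hypothetical:

```python
import re
from collections import Counter


def composition(cell_names):
    """Percentage of cells per base name (the text before the first space)."""
    base = [re.sub(r" .*$", "", name) for name in cell_names]
    counts = Counter(base)
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


names = ["10 # set1", "10 # set2", "2 # set1", "2 # set1"]  # hypothetical naming pattern
print(composition(names))
```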
516
+
517
+
518
+
519
+ ### 4. Data subclustering <a id="ds"></a>
520
+
521
+
522
+ ```
523
+ from jdti import COMPsc
524
+ ```
525
+ ```
526
+ jseq_object = COMPsc.project_dir('data', ['set2'])
527
+ ```
528
+
529
+ ```
530
+ jseq_object.load_sparse_from_projects(normalized_data=True)
531
+ ```
532
+
533
+
534
+
535
+ ```
536
+ jseq_object.subcluster_prepare(features = ['HMGCS1', 'MAP1B', 'SOX4'],
537
+ cluster='10')
538
+ ```
539
+
540
+
541
+ ```
542
+ jseq_object.define_subclusters(
543
+ umap_num = 5,
544
+ eps = 1,
545
+ min_samples = 5,
546
+ n_neighbors = 5,
547
+ min_dist = 0.1,
548
+ spread = 1.0,
549
+ set_op_mix_ratio = 1.0,
550
+ local_connectivity = 1,
551
+ repulsion_strength = 1.0,
552
+ negative_sample_rate = 5,
553
+ width = 8,
554
+ height = 6)
555
+
556
+ ```
557
+
558
+ ```
559
+ jseq_object.subcluster_features_scatter(
560
+ colors = 'viridis',
561
+ hclust = 'complete',
562
+ img_width = 3,
563
+ img_high = 5,
564
+ label_size = 6,
565
+ size_scale = 70,
566
+ x_lab = 'Genes',
567
+ legend_lab = 'normalized')
568
+
569
+ ```
570
+
571
+ ```
572
+ mapping = {
573
+ "old_name": ["-1", "1", "4"],
574
+ "new_name": ["1", "1", "1"]
575
+ }
576
+
577
+ jseq_object.rename_subclusters(mapping)
578
+
579
+ ```
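The mapping pairs each entry of `old_name` with the entry at the same position in `new_name`, so here the labels `-1`, `1`, and `4` are all merged into subcluster `1`. A plain-Python sketch of applying such a mapping to a list of labels; `rename_labels` is a hypothetical helper, and leaving unmapped labels unchanged is an assumption:

```python
def rename_labels(labels, mapping):
    """Apply an old_name -> new_name mapping; unmapped labels are left unchanged."""
    lookup = dict(zip(mapping["old_name"], mapping["new_name"]))
    return [lookup.get(label, label) for label in labels]


mapping = {
    "old_name": ["-1", "1", "4"],
    "new_name": ["1", "1", "1"],
}
renamed = rename_labels(["-1", "0", "1", "4", "2"], mapping)
print(renamed)
```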
580
+
581
+ ```
582
+ jseq_object.subcluster_DEG_scatter(
583
+ top_n = 3,
584
+ min_exp = 0,
585
+ min_pct = 0.1,
586
+ p_val = 0.05,
587
+ colors = 'viridis',
588
+ hclust = 'complete',
589
+ img_width = 3,
590
+ img_high = 5,
591
+ label_size = 6,
592
+ size_scale = 70,
593
+ x_lab = 'Genes',
594
+ legend_lab = 'normalized',
595
+ n_proc=10)
596
+
597
+
598
+ ```
599
+
600
+ ```
601
+ jseq_object.accept_subclusters()
602
+ ```
603
+
604
+ ```
605
+ l = set(jseq_object.input_metadata['cell_names'])
606
+
607
+ ```
608
+
609
+
610
+ ### Have fun JBS