pycmplot 0.2.6__tar.gz → 0.2.8__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. {pycmplot-0.2.6 → pycmplot-0.2.8}/PKG-INFO +51 -6
  2. {pycmplot-0.2.6 → pycmplot-0.2.8}/README.md +50 -5
  3. pycmplot-0.2.8/benchmark/bench_python.py +808 -0
  4. pycmplot-0.2.8/benchmark/collect_results.py +380 -0
  5. pycmplot-0.2.8/benchmark/generate_multi_sumstats.py +230 -0
  6. pycmplot-0.2.8/benchmark/generate_sumstats.py +133 -0
  7. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/__init__.py +1 -1
  8. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/_core.py +45 -4
  9. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/annotation.py +4 -3
  10. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/cli.py +77 -7
  11. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/data/hg18ToHg38.over.chain.gz +0 -0
  12. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/io.py +300 -38
  13. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/plotting/circular.py +45 -27
  14. pycmplot-0.2.8/pycmplot/plotting/linear.py +1333 -0
  15. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/plotting/qq.py +71 -45
  16. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/stats.py +9 -7
  17. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot.egg-info/PKG-INFO +51 -6
  18. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot.egg-info/SOURCES.txt +4 -0
  19. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot.egg-info/top_level.txt +1 -0
  20. {pycmplot-0.2.6 → pycmplot-0.2.8}/pyproject.toml +1 -1
  21. {pycmplot-0.2.6 → pycmplot-0.2.8}/setup.cfg +1 -1
  22. pycmplot-0.2.6/pycmplot/plotting/linear.py +0 -1082
  23. {pycmplot-0.2.6 → pycmplot-0.2.8}/LICENSE +0 -0
  24. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/__main__.py +0 -0
  25. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/constants.py +0 -0
  26. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/data/Homo_sapiens.GRCh37.geneinfo.tsv.gz +0 -0
  27. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/data/Homo_sapiens.GRCh38.geneinfo.tsv.gz +0 -0
  28. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/data/hg19ToHg38.over.chain.gz +0 -0
  29. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/liftover.py +0 -0
  30. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/plotting/__init__.py +0 -0
  31. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot/resources.py +0 -0
  32. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot.egg-info/dependency_links.txt +0 -0
  33. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot.egg-info/entry_points.txt +0 -0
  34. {pycmplot-0.2.6 → pycmplot-0.2.8}/pycmplot.egg-info/requires.txt +0 -0
  35. {pycmplot-0.2.6 → pycmplot-0.2.8}/setup.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: pycmplot
- Version: 0.2.6
+ Version: 0.2.8
  Summary: Multi-track circular and linear Manhattan plot generation for GWAS summary statistics
  Author: Kevin Esoh
  Author-email: Kevin Esoh <kesohku1@jh.edu>
@@ -68,6 +68,8 @@ option of the package should be used to indicate the column and then the package
  positions in hg19 to hg38, ensuring that hits-table generation and plotting are done in one unified
  coordinate system.
 
+ # Key features
+ ## Column auto-detection
  A key functionality of the package is its ability to auto-detect certain columns if omitted on the
  command line or in the Python API:
  - Chromosome column: `-chr, --chrom_column` or omitted
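Column auto-detection, as described above, amounts to trying an ordered list of candidate header names (such as the `bld_candidates` list, which starts with the user-supplied `build` name) together with their upper- and lower-case forms. The sketch below illustrates that lookup on a plain list of header names; `expand_case` and `detect_column` are hypothetical helpers, not pycmplot's actual API.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def expand_case(candidates: Iterable[Optional[str]]) -> List[str]:
    """Expand each candidate to its as-given, UPPER and lower forms.

    Order is preserved and duplicates (e.g. 'BUILD' and 'BUILD'.upper())
    are dropped, so each candidate yields at most three variants.
    """
    variants: Dict[str, None] = {}
    for name in candidates:
        if not name:
            continue  # e.g. the user did not pass a column name
        for variant in (name, name.upper(), name.lower()):
            variants.setdefault(variant, None)
    return list(variants)


def detect_column(header: Sequence[str], candidates: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first candidate variant present in `header`, or None."""
    present = set(header)
    for variant in expand_case(candidates):
        if variant in present:
            return variant
    return None


# Hypothetical header; the user passed no build column (build=None).
header = ["chrom", "pos", "pval", "build"]
build = None
bld_candidates = [build, "BUILD", "Genome", "Genome_Build", "Genome-build"]
print(detect_column(header, bld_candidates))  # -> build
```

Returning `None` instead of raising leaves the caller free to decide whether a missing column is fatal (chromosome, position) or optional (build).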
@@ -90,11 +92,54 @@ bld_candidates = [build, 'BUILD', 'Genome', 'Genome_Build', 'Genome-build']
 
  > NB: Upper- and lower-case forms of the candidates are also considered, so each candidate is tried in three forms.
 
-
- Since GWAS summary stats files can be very large, to improve speed and memory efficiency, it is
- **highly recommended** to use `-tp, --trim_pval` with a value to exclude variants with p-value above a
- certain threshold, e.g. `0.01 (1e-2)` or `0.001 (1e-3)`.
-
+ ## Density-aware sub-sampling
+ Another key feature is density-aware sub-sampling for Manhattan-style scatter plots,
+ inspired by the default behaviour of ``gwaslab`` (https://cloufield.github.io/gwaslab/).
+
+ Every variant whose "interestingness" signal is at or above ``keep_threshold`` is preserved, so peaks,
+ suggestive hits, genome-wide-significant hits, and extreme selection-scan values are kept verbatim.
+ The dense bulk below the threshold is uniformly sub-sampled down to at most ``max_below`` rows in total.
+ For a 10 M-variant scan with the default settings, this typically cuts the plotted point count from
+ 10 M to ~200 K plus a few hundred peaks: visually indistinguishable above the suggestive band, but
+ two orders of magnitude faster to render.
+
+ ## Trim insignificant variants for faster plotting
+ The optional parameter `-tp, --trim_pval` increases speed even further. Set it to a p-value threshold
+ to exclude variants with p-values above it, e.g. `0.01 (1e-2)` or `0.001 (1e-3)`. Applied on top of
+ the default auto-thinning described above, it significantly increases speed and reduces peak memory
+ usage. See the benchmark figure (manuscript in preparation).
+
+ ## Genome build conversion (liftover)
+ Conversion of both hg18 and hg19 positions to their hg38 equivalents is provided through
+ `pyliftover.LiftOver`.
+
+ This means you can concatenate multiple summary statistics files into one file and include a `BUILD`
+ column that specifies the genome build of each position ('hg18', 'hg19', or 'hg38'); all
+ 'hg18' and 'hg19' positions are then converted to 'hg38' so that every position is plotted
+ in one coordinate system. If all positions share a single build (e.g. only 'hg18' or only 'hg19'),
+ no liftover is necessary, so liftover is performed only when genome builds are mixed.
+
+ ## Nearest-gene annotation for GWAS lead SNPs
+ For gene annotation, the package bundles gene-information tables in hg19 and hg38 coordinates,
+ processed from GFF3 files to reduce their size. UCSC chain files for coordinate conversion
+ (liftover) are also included.
+ - ``chain_hg19_hg38`` -- UCSC LiftOver chain file for hg19 to hg38
+ conversion. Resolved from ``PYCMPLOT_CHAIN_HG19_HG38`` or the bundled
+ ``hg19ToHg38.over.chain.gz``.
+ - ``chain_hg18_hg38`` -- UCSC LiftOver chain file for hg18 to hg38
+ conversion. Resolved from ``PYCMPLOT_CHAIN_HG18_HG38`` or the bundled
+ ``hg18ToHg38.over.chain.gz``. Only required when an input summary
+ statistics file carries an ``hg18`` build label.
+ - ``geneinfo_hg38`` -- Ensembl gene-info TSV for GRCh38, used for
+ nearest-gene annotation. Resolved from ``PYCMPLOT_GENEINFO_HG38`` or
+ the bundled ``Homo_sapiens.GRCh38.geneinfo.tsv.gz``.
+ - ``geneinfo_hg19`` -- Ensembl gene-info TSV for GRCh37, used when
+ input data carry an hg19 build label. Resolved from
+ ``PYCMPLOT_GENEINFO_HG19`` or the bundled
+ ``Homo_sapiens.GRCh37.geneinfo.tsv.gz``.
+
+
+ # Application
  A potentially useful application is **comparative visualization** of results from multiple imputation panels,
  multiple populations, or multiple traits to observe shared genetic architecture.
 
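The density-aware sub-sampling and the optional `--trim_pval` trim described in this README can be sketched together as follows. This is a minimal illustration, not pycmplot's implementation: it assumes the "interestingness" signal is -log10(p) (the README also mentions selection-scan values), draws a seeded uniform sample of the bulk below the threshold, and the function name `select_plot_points` and the argument values in the example are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def select_plot_points(
    pvals: Sequence[float],
    keep_threshold: float,
    max_below: int,
    trim_pval: Optional[float] = None,
    seed: int = 0,
) -> List[int]:
    """Return sorted row indices to draw on a Manhattan plot.

    1. Optionally drop rows with p > trim_pval (the --trim_pval idea).
    2. Keep every row whose signal, -log10(p), is >= keep_threshold.
    3. Uniformly sample the remaining rows down to at most max_below.
    """
    keep: List[int] = []
    below: List[int] = []
    for i, p in enumerate(pvals):
        if math.isnan(p):
            continue  # nothing sensible to plot
        if trim_pval is not None and p > trim_pval:
            continue  # hard trim, applied before thinning
        signal = math.inf if p == 0 else -math.log10(p)
        (keep if signal >= keep_threshold else below).append(i)
    if len(below) > max_below:
        # Uniform, reproducible sample of the dense bulk below the threshold.
        below = random.Random(seed).sample(below, max_below)
    return sorted(keep + below)


pvals = [1e-9, 0.5, 0.2, 1e-6, 0.9, 0.03, 1e-4]
print(select_plot_points(pvals, keep_threshold=5, max_below=2))
print(select_plot_points(pvals, keep_threshold=5, max_below=2, trim_pval=0.05))  # -> [0, 3, 5, 6]
```

With `keep_threshold=5`, both rows with p ≤ 1e-6 are always kept; the first call then samples two of the five remaining rows, while in the second call the trim already leaves only two rows below the threshold, so no sampling is needed. Fixing the seed keeps repeated plots of the same data identical.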
@@ -29,6 +29,8 @@ option of the package should be used to indicate the column and then the package
  positions in hg19 to hg38, ensuring that hits-table generation and plotting are done in one unified
  coordinate system.
 
+ # Key features
+ ## Column auto-detection
  A key functionality of the package is its ability to auto-detect certain columns if omitted on the
  command line or in the Python API:
  - Chromosome column: `-chr, --chrom_column` or omitted
@@ -51,11 +53,54 @@ bld_candidates = [build, 'BUILD', 'Genome', 'Genome_Build', 'Genome-build']
 
  > NB: Upper- and lower-case forms of the candidates are also considered, so each candidate is tried in three forms.
 
-
- Since GWAS summary stats files can be very large, to improve speed and memory efficiency, it is
- **highly recommended** to use `-tp, --trim_pval` with a value to exclude variants with p-value above a
- certain threshold, e.g. `0.01 (1e-2)` or `0.001 (1e-3)`.
-
+ ## Density-aware sub-sampling
+ Another key feature is density-aware sub-sampling for Manhattan-style scatter plots,
+ inspired by the default behaviour of ``gwaslab`` (https://cloufield.github.io/gwaslab/).
+
+ Every variant whose "interestingness" signal is at or above ``keep_threshold`` is preserved, so peaks,
+ suggestive hits, genome-wide-significant hits, and extreme selection-scan values are kept verbatim.
+ The dense bulk below the threshold is uniformly sub-sampled down to at most ``max_below`` rows in total.
+ For a 10 M-variant scan with the default settings, this typically cuts the plotted point count from
+ 10 M to ~200 K plus a few hundred peaks: visually indistinguishable above the suggestive band, but
+ two orders of magnitude faster to render.
+
+ ## Trim insignificant variants for faster plotting
+ The optional parameter `-tp, --trim_pval` increases speed even further. Set it to a p-value threshold
+ to exclude variants with p-values above it, e.g. `0.01 (1e-2)` or `0.001 (1e-3)`. Applied on top of
+ the default auto-thinning described above, it significantly increases speed and reduces peak memory
+ usage. See the benchmark figure (manuscript in preparation).
+
+ ## Genome build conversion (liftover)
+ Conversion of both hg18 and hg19 positions to their hg38 equivalents is provided through
+ `pyliftover.LiftOver`.
+
+ This means you can concatenate multiple summary statistics files into one file and include a `BUILD`
+ column that specifies the genome build of each position ('hg18', 'hg19', or 'hg38'); all
+ 'hg18' and 'hg19' positions are then converted to 'hg38' so that every position is plotted
+ in one coordinate system. If all positions share a single build (e.g. only 'hg18' or only 'hg19'),
+ no liftover is necessary, so liftover is performed only when genome builds are mixed.
+
+ ## Nearest-gene annotation for GWAS lead SNPs
+ For gene annotation, the package bundles gene-information tables in hg19 and hg38 coordinates,
+ processed from GFF3 files to reduce their size. UCSC chain files for coordinate conversion
+ (liftover) are also included.
+ - ``chain_hg19_hg38`` -- UCSC LiftOver chain file for hg19 to hg38
+ conversion. Resolved from ``PYCMPLOT_CHAIN_HG19_HG38`` or the bundled
+ ``hg19ToHg38.over.chain.gz``.
+ - ``chain_hg18_hg38`` -- UCSC LiftOver chain file for hg18 to hg38
+ conversion. Resolved from ``PYCMPLOT_CHAIN_HG18_HG38`` or the bundled
+ ``hg18ToHg38.over.chain.gz``. Only required when an input summary
+ statistics file carries an ``hg18`` build label.
+ - ``geneinfo_hg38`` -- Ensembl gene-info TSV for GRCh38, used for
+ nearest-gene annotation. Resolved from ``PYCMPLOT_GENEINFO_HG38`` or
+ the bundled ``Homo_sapiens.GRCh38.geneinfo.tsv.gz``.
+ - ``geneinfo_hg19`` -- Ensembl gene-info TSV for GRCh37, used when
+ input data carry an hg19 build label. Resolved from
+ ``PYCMPLOT_GENEINFO_HG19`` or the bundled
+ ``Homo_sapiens.GRCh37.geneinfo.tsv.gz``.
+
+
+ # Application
  A potentially useful application is **comparative visualization** of results from multiple imputation panels,
  multiple populations, or multiple traits to observe shared genetic architecture.
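The liftover rule in the README (lift hg18/hg19 rows to hg38 only when builds are mixed, with chain files resolved from an environment variable such as `PYCMPLOT_CHAIN_HG19_HG38` or a bundled copy) could look roughly like the sketch below. It is a sketch under assumptions, not pycmplot's code: `resolve_chain`, `make_converters` and `harmonise_builds` are hypothetical names, the chain paths in the commented example are placeholders, and it relies on `pyliftover.LiftOver(chain_file).convert_coordinate(chrom, pos)`, which works with 0-based positions and returns a list of `(chrom, pos, strand, score)` hits (empty when unmapped) or `None` for a chromosome absent from the chain file.

```python
import os
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical converter type: (UCSC chrom, 0-based pos) -> hits or None,
# matching the shape of pyliftover's LiftOver.convert_coordinate.
Converter = Callable[[str, int], Optional[list]]
Row = Tuple[str, int, str]  # (chrom, 1-based pos, build label: 'hg18'/'hg19'/'hg38')


def resolve_chain(env_var: str, bundled_path: str) -> str:
    """Use the path from `env_var` if it is set and non-empty, else the bundled file."""
    return os.environ.get(env_var) or bundled_path


def make_converters(chains: Dict[str, str]) -> Dict[str, Converter]:
    """Build one converter per source build, e.g. {'hg19': path_to_chain}."""
    from pyliftover import LiftOver  # third-party; imported only when liftover is needed

    return {build: LiftOver(path).convert_coordinate for build, path in chains.items()}


def harmonise_builds(rows: Sequence[Row], converters: Dict[str, Converter]) -> List[Optional[Tuple[str, int]]]:
    """Return one (chrom, 1-based pos) per row on a single coordinate system.

    Rows are returned unchanged when only one build is present; otherwise
    hg18/hg19 rows are lifted to hg38 and unmappable rows become None.
    A build label with no converter raises KeyError.
    """
    if len({build for _, _, build in rows}) <= 1:
        return [(str(chrom), pos) for chrom, pos, _ in rows]

    out: List[Optional[Tuple[str, int]]] = []
    for chrom, pos, build in rows:
        chrom = str(chrom)
        if build == "hg38":
            out.append((chrom, pos))
            continue
        had_prefix = chrom.startswith("chr")
        # Chain files use UCSC names ('chr1'); pyliftover positions are 0-based.
        hits = converters[build](chrom if had_prefix else "chr" + chrom, pos - 1)
        if not hits:  # None: chromosome not in chain; []: position not mapped
            out.append(None)
            continue
        new_chrom, new_pos0 = hits[0][0], hits[0][1]
        if not had_prefix and new_chrom.startswith("chr"):
            new_chrom = new_chrom[3:]  # keep the input's naming style
        out.append((new_chrom, new_pos0 + 1))
    return out


# Example wiring (requires pyliftover and a real chain file; path is a placeholder):
# converters = make_converters({
#     "hg19": resolve_chain("PYCMPLOT_CHAIN_HG19_HG38", "path/to/hg19ToHg38.over.chain.gz"),
# })
# harmonise_builds([("1", 1_000_000, "hg19"), ("1", 2_000_000, "hg38")], converters)
```

Passing converters in as a dictionary keeps the build logic testable without chain files and means an hg18 chain only has to be loaded when hg18 rows are actually present, in line with the README's note on `chain_hg18_hg38`.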