genal-python 0.6__tar.gz → 0.8__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (171)
  1. genal_python-0.8/.DS_Store +0 -0
  2. {genal_python-0.6 → genal_python-0.8}/.gitignore +1 -2
  3. genal_python-0.8/.readthedocs.yaml +22 -0
  4. genal_python-0.6/README.md → genal_python-0.8/PKG-INFO +98 -9
  5. genal_python-0.6/PKG-INFO → genal_python-0.8/README.md +73 -31
  6. genal_python-0.8/docs/.DS_Store +0 -0
  7. genal_python-0.8/docs/build/.buildinfo +4 -0
  8. genal_python-0.8/docs/build/.doctrees/api.doctree +0 -0
  9. genal_python-0.8/docs/build/.doctrees/environment.pickle +0 -0
  10. genal_python-0.8/docs/build/.doctrees/genal.doctree +0 -0
  11. genal_python-0.8/docs/build/.doctrees/index.doctree +0 -0
  12. genal_python-0.8/docs/build/.doctrees/introduction.doctree +0 -0
  13. genal_python-0.8/docs/build/.doctrees/modules.doctree +0 -0
  14. genal_python-0.8/docs/build/_modules/genal/Geno.html +1480 -0
  15. genal_python-0.8/docs/build/_modules/genal/MR.html +1065 -0
  16. genal_python-0.8/docs/build/_modules/genal/MR_tools.html +671 -0
  17. genal_python-0.8/docs/build/_modules/genal/MRpresso.html +409 -0
  18. genal_python-0.8/docs/build/_modules/genal/association.html +445 -0
  19. genal_python-0.8/docs/build/_modules/genal/clump.html +183 -0
  20. genal_python-0.8/docs/build/_modules/genal/extract_prs.html +426 -0
  21. genal_python-0.8/docs/build/_modules/genal/geno_tools.html +567 -0
  22. genal_python-0.8/docs/build/_modules/genal/lift.html +371 -0
  23. genal_python-0.8/docs/build/_modules/genal/proxy.html +359 -0
  24. genal_python-0.8/docs/build/_modules/genal/snp_query.html +231 -0
  25. genal_python-0.8/docs/build/_modules/genal/tools.html +440 -0
  26. genal_python-0.8/docs/build/_modules/index.html +114 -0
  27. genal_python-0.8/docs/build/_sources/api.rst.txt +100 -0
  28. genal_python-0.8/docs/build/_sources/index.rst.txt +69 -0
  29. genal_python-0.8/docs/build/_sources/introduction.rst.txt +666 -0
  30. genal_python-0.8/docs/build/_sources/modules.rst.txt +82 -0
  31. genal_python-0.8/docs/build/_static/basic.css +925 -0
  32. genal_python-0.8/docs/build/_static/css/badge_only.css +1 -0
  33. genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Bold.woff +0 -0
  34. genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Bold.woff2 +0 -0
  35. genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Regular.woff +0 -0
  36. genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Regular.woff2 +0 -0
  37. genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.eot +0 -0
  38. genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.svg +2671 -0
  39. genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.ttf +0 -0
  40. genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.woff +0 -0
  41. genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.woff2 +0 -0
  42. genal_python-0.8/docs/build/_static/css/fonts/lato-bold-italic.woff +0 -0
  43. genal_python-0.8/docs/build/_static/css/fonts/lato-bold-italic.woff2 +0 -0
  44. genal_python-0.8/docs/build/_static/css/fonts/lato-bold.woff +0 -0
  45. genal_python-0.8/docs/build/_static/css/fonts/lato-bold.woff2 +0 -0
  46. genal_python-0.8/docs/build/_static/css/fonts/lato-normal-italic.woff +0 -0
  47. genal_python-0.8/docs/build/_static/css/fonts/lato-normal-italic.woff2 +0 -0
  48. genal_python-0.8/docs/build/_static/css/fonts/lato-normal.woff +0 -0
  49. genal_python-0.8/docs/build/_static/css/fonts/lato-normal.woff2 +0 -0
  50. genal_python-0.8/docs/build/_static/css/theme.css +4 -0
  51. genal_python-0.8/docs/build/_static/doctools.js +156 -0
  52. genal_python-0.8/docs/build/_static/documentation_options.js +13 -0
  53. genal_python-0.8/docs/build/_static/file.png +0 -0
  54. genal_python-0.8/docs/build/_static/js/badge_only.js +1 -0
  55. genal_python-0.8/docs/build/_static/js/html5shiv-printshiv.min.js +4 -0
  56. genal_python-0.8/docs/build/_static/js/html5shiv.min.js +4 -0
  57. genal_python-0.8/docs/build/_static/js/theme.js +1 -0
  58. genal_python-0.8/docs/build/_static/language_data.js +199 -0
  59. genal_python-0.8/docs/build/_static/minus.png +0 -0
  60. genal_python-0.8/docs/build/_static/plus.png +0 -0
  61. genal_python-0.8/docs/build/_static/pygments.css +75 -0
  62. genal_python-0.8/docs/build/_static/searchtools.js +619 -0
  63. genal_python-0.8/docs/build/_static/sphinx_highlight.js +154 -0
  64. genal_python-0.8/docs/build/api.html +2251 -0
  65. genal_python-0.8/docs/build/genal.html +2060 -0
  66. genal_python-0.8/docs/build/genindex.html +584 -0
  67. genal_python-0.8/docs/build/index.html +186 -0
  68. genal_python-0.8/docs/build/introduction.html +706 -0
  69. genal_python-0.8/docs/build/modules.html +754 -0
  70. genal_python-0.8/docs/build/objects.inv +0 -0
  71. genal_python-0.8/docs/build/py-modindex.html +177 -0
  72. genal_python-0.8/docs/build/search.html +122 -0
  73. genal_python-0.8/docs/build/searchindex.js +1 -0
  74. genal_python-0.8/docs/requirements.txt +14 -0
  75. genal_python-0.8/docs/source/.DS_Store +0 -0
  76. genal_python-0.8/docs/source/Images/MR_plot_SBP_AS.png +0 -0
  77. genal_python-0.8/docs/source/api.rst +100 -0
  78. {genal_python-0.6 → genal_python-0.8}/docs/source/conf.py +3 -2
  79. {genal_python-0.6 → genal_python-0.8}/docs/source/index.rst +14 -4
  80. genal_python-0.8/docs/source/introduction.rst +666 -0
  81. genal_python-0.8/docs/source/modules.rst +82 -0
  82. {genal_python-0.6 → genal_python-0.8}/genal/Geno.py +138 -59
  83. {genal_python-0.6 → genal_python-0.8}/genal/MR.py +16 -16
  84. {genal_python-0.6 → genal_python-0.8}/genal/MR_tools.py +11 -0
  85. {genal_python-0.6 → genal_python-0.8}/genal/__init__.py +1 -1
  86. {genal_python-0.6 → genal_python-0.8}/genal/constants.py +1 -0
  87. {genal_python-0.6 → genal_python-0.8}/genal/extract_prs.py +1 -1
  88. {genal_python-0.6 → genal_python-0.8}/genal/geno_tools.py +5 -2
  89. genal_python-0.8/genal/snp_query.py +122 -0
  90. {genal_python-0.6 → genal_python-0.8}/genal/tools.py +19 -10
  91. {genal_python-0.6 → genal_python-0.8}/pyproject.toml +13 -11
  92. {genal_python-0.6 → genal_python-0.8}/requirements.txt +3 -2
  93. genal_python-0.6/docs/requirements.txt +0 -2
  94. genal_python-0.6/docs/source/api.rst +0 -24
  95. genal_python-0.6/docs/source/introduction.rst +0 -505
  96. genal_python-0.6/docs/source/modules.rst +0 -7
  97. {genal_python-0.6 → genal_python-0.8}/LICENSE +0 -0
  98. {genal_python-0.6 → genal_python-0.8}/docs/Makefile +0 -0
  99. {genal_python-0.6 → genal_python-0.8}/docs/_build/doctrees/api.doctree +0 -0
  100. {genal_python-0.6 → genal_python-0.8}/docs/_build/doctrees/environment.pickle +0 -0
  101. {genal_python-0.6 → genal_python-0.8}/docs/_build/doctrees/genal.doctree +0 -0
  102. {genal_python-0.6 → genal_python-0.8}/docs/_build/doctrees/index.doctree +0 -0
  103. {genal_python-0.6 → genal_python-0.8}/docs/_build/doctrees/introduction.doctree +0 -0
  104. {genal_python-0.6 → genal_python-0.8}/docs/_build/doctrees/modules.doctree +0 -0
  105. {genal_python-0.6 → genal_python-0.8}/docs/_build/doctrees/source/genal.doctree +0 -0
  106. {genal_python-0.6 → genal_python-0.8}/docs/_build/doctrees/source/modules.doctree +0 -0
  107. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/.buildinfo +0 -0
  108. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_sources/api.rst.txt +0 -0
  109. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_sources/genal.rst.txt +0 -0
  110. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_sources/index.rst.txt +0 -0
  111. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_sources/introduction.rst.txt +0 -0
  112. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_sources/modules.rst.txt +0 -0
  113. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_sources/source/genal.rst.txt +0 -0
  114. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_sources/source/modules.rst.txt +0 -0
  115. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/_sphinx_javascript_frameworks_compat.js +0 -0
  116. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/basic.css +0 -0
  117. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/badge_only.css +0 -0
  118. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/Roboto-Slab-Bold.woff +0 -0
  119. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/Roboto-Slab-Bold.woff2 +0 -0
  120. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/Roboto-Slab-Regular.woff +0 -0
  121. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/Roboto-Slab-Regular.woff2 +0 -0
  122. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/fontawesome-webfont.eot +0 -0
  123. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/fontawesome-webfont.svg +0 -0
  124. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/fontawesome-webfont.ttf +0 -0
  125. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/fontawesome-webfont.woff +0 -0
  126. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/fontawesome-webfont.woff2 +0 -0
  127. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-bold-italic.woff +0 -0
  128. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-bold-italic.woff2 +0 -0
  129. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-bold.woff +0 -0
  130. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-bold.woff2 +0 -0
  131. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-normal-italic.woff +0 -0
  132. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-normal-italic.woff2 +0 -0
  133. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-normal.woff +0 -0
  134. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-normal.woff2 +0 -0
  135. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/css/theme.css +0 -0
  136. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/doctools.js +0 -0
  137. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/documentation_options.js +0 -0
  138. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/file.png +0 -0
  139. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/jquery.js +0 -0
  140. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/js/badge_only.js +0 -0
  141. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/js/html5shiv-printshiv.min.js +0 -0
  142. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/js/html5shiv.min.js +0 -0
  143. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/js/theme.js +0 -0
  144. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/language_data.js +0 -0
  145. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/minus.png +0 -0
  146. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/plus.png +0 -0
  147. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/pygments.css +0 -0
  148. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/searchtools.js +0 -0
  149. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/_static/sphinx_highlight.js +0 -0
  150. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/api.html +0 -0
  151. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/genal.html +0 -0
  152. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/genindex.html +0 -0
  153. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/index.html +0 -0
  154. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/introduction.html +0 -0
  155. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/modules.html +0 -0
  156. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/objects.inv +0 -0
  157. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/py-modindex.html +0 -0
  158. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/search.html +0 -0
  159. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/searchindex.js +0 -0
  160. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/source/genal.html +0 -0
  161. {genal_python-0.6 → genal_python-0.8}/docs/_build/html/source/modules.html +0 -0
  162. {genal_python-0.6/docs/Images → genal_python-0.8/docs/build/_images}/MR_plot_SBP_AS.png +0 -0
  163. /genal_python-0.6/docs/source/genal.rst → /genal_python-0.8/docs/build/_sources/genal.rst.txt +0 -0
  164. {genal_python-0.6 → genal_python-0.8}/docs/make.bat +0 -0
  165. {genal_python-0.6 → genal_python-0.8}/genal/MRpresso.py +0 -0
  166. {genal_python-0.6 → genal_python-0.8}/genal/association.py +0 -0
  167. {genal_python-0.6 → genal_python-0.8}/genal/clump.py +0 -0
  168. {genal_python-0.6 → genal_python-0.8}/genal/lift.py +0 -0
  169. {genal_python-0.6 → genal_python-0.8}/genal/proxy.py +0 -0
  170. {genal_python-0.6 → genal_python-0.8}/gitignore +0 -0
  171. {genal_python-0.6 → genal_python-0.8}/readthedocs.yaml +0 -0
Binary file
@@ -2,5 +2,4 @@ __pycache__/
2
2
  dist/
3
3
  .ipynb_checkpoints/
4
4
  ipynb_checkpoints/
5
- genal/.ipynb_checkpoints/
6
- docs/
5
+ genal/.ipynb_checkpoints/
@@ -0,0 +1,22 @@
1
+ # .readthedocs.yaml
2
+ # Read the Docs configuration file
3
+ # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
4
+
5
+ # Required
6
+ version: 2
7
+
8
+ # Set the version of Python and other tools you might need
9
+ build:
10
+   os: ubuntu-22.04
11
+   tools:
12
+     python: "3.11"
13
+
14
+ # Build documentation in the docs/ directory with Sphinx
15
+ sphinx:
16
+   configuration: docs/source/conf.py
17
+
18
+ # We recommend specifying your dependencies to enable reproducible builds:
19
+ # https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
20
+ python:
21
+   install:
22
+     - requirements: docs/requirements.txt
@@ -1,3 +1,27 @@
1
+ Metadata-Version: 2.1
2
+ Name: genal-python
3
+ Version: 0.8
4
+ Summary: A python toolkit for polygenic risk scoring and mendelian randomization.
5
+ Author-email: Cyprien Rivier <riviercyprien@gmail.com>
6
+ Requires-Python: >=3.7
7
+ Description-Content-Type: text/markdown
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Dist: aiohttp==3.9.5
12
+ Requires-Dist: nest_asyncio==1.5.5
13
+ Requires-Dist: numpy>=1.24.4, <2.0
14
+ Requires-Dist: pandas>=2.0.3
15
+ Requires-Dist: plotnine==0.12.3
16
+ Requires-Dist: psutil==5.9.1
17
+ Requires-Dist: pyliftover==0.4
18
+ Requires-Dist: scikit_learn>=1.3.0
19
+ Requires-Dist: scipy>=1.11.4
20
+ Requires-Dist: statsmodels==0.14.0
21
+ Requires-Dist: tqdm==4.66.1
22
+ Requires-Dist: wget==3.2
23
+ Project-URL: Home, https://github.com/CypRiv/genal
24
+
1
25
  <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
2
26
 
3
27
 
@@ -7,10 +31,11 @@
7
31
 
8
32
  # Table of contents
9
33
  1. [Introduction](#introduction)
10
- 2. [Citation] (#citation)
34
+ 2. [Citation](#citation)
11
35
  3. [Requirements for the genal module](#paragraph1)
12
36
  4. [Installation and how to use genal](#paragraph2)
13
37
  1. [Installation](#paragraph2.1)
38
+ 2. [Documentation](#paragraph2.2)
14
39
  5. [Tutorial and presentation of the main tools](#paragraph3)
15
40
  1. [Data loading](#paragraph3.1)
16
41
  2. [Data preprocessing](#paragraph3.2)
@@ -19,6 +44,7 @@
19
44
  5. [Mendelian Randomization](#paragraph3.5)
20
45
  6. [SNP-association testing](#paragraph3.6)
21
46
  7. [Lifting](#paragraph3.7)
47
+ 8. [GWAS Catalog](#paragraph3.8)
22
48
 
23
49
 
24
50
  ## Introduction <a name="introduction"></a>
@@ -55,6 +81,16 @@ Once downloaded, the path to the plink executable can be set with:
55
81
  ```
56
82
  genal.set_plink(path="/path/to/plink/executable/file")
57
83
  ```
84
+ ### Documentation <a name="paragraph2.2"></a>
85
+
86
+ For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
87
+
88
+ The documentation covers:
89
+ - Installation
90
+ - This tutorial
91
+ - The list of the main functions with complete description of their arguments
92
+ - An exhaustive API reference
93
+
58
94
 
59
95
  ## Tutorial <a name="paragraph3"></a>
60
96
  For this tutorial, we will obtain genetic instruments for systolic blood pressure (SBP), compute a Polygenic Risk Score (PRS), and run a Mendelian Randomization analysis to investigate the genetically-determined effect of SBP on the risk of stroke. We will utilize summary statistics from Genome-Wide Association Studies (GWAS) and individual-level data from the UK Biobank. The steps include:
@@ -75,11 +111,11 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
75
111
  - Data lifting to another genomic build
76
112
  - In pure Python
77
113
  - Using LiftOver
78
- - Phenoscanner (to be added)
114
+ - Querying the GWAS Catalog
79
115
 
80
116
  ### Data loading <a name="paragraph3.1"></a>
81
117
 
82
- We begin with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
118
+ We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
83
119
 
84
120
  ```python
85
121
  import pandas as pd
@@ -108,6 +144,10 @@ The `genal.Geno` takes as input a pandas dataframe where each row corresponds to
108
144
  - **P**: Column name for effect p-value. Defaults to `'P'`.
109
145
  - **EAF**: Column name for effect allele frequency. Defaults to `'EAF'`.
110
146
 
147
+ > **Note:**
148
+ >
149
+ > You do not need all columns to move forward, as not all columns are required by every function. Additionally, some columns can be imputed as we will see in the next paragraph.
150
+
111
151
  After inspecting the dataframe, we first need to extract the chromosome and position information from the `MarkerName` column into two new columns `CHR` and `POS`:
112
152
 
113
153
  ```python
@@ -133,7 +173,7 @@ The last argument (`keep_columns = False`) indicates that we do not wish to keep
133
173
 
134
174
  > **Note:**
135
175
  >
136
- > Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele. Also, you do not need all columns to move forward, as some can be inputted as we will see next.
176
+ > Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
137
177
 
138
178
  ### Data preprocessing <a name="paragraph3.2"></a>
139
179
 
@@ -312,7 +352,7 @@ and the output is:
312
352
  The PRS computation was successful and used 1330/1538 (86.476%) SNPs.
313
353
  PRS data saved to SBP_prs.csv
314
354
 
315
- In our case, we have been able to find proxies for 571 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
355
+ In our case, we have been able to find proxies for 578 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
316
356
 
317
357
  You can customize how the proxies are chosen with the following arguments:
318
358
  - `reference_panel`: The reference population used to derive linkage disequilibrium values and find proxies. Defaults to `eur`.
@@ -322,7 +362,7 @@ You can customize how the proxies are chosen with the following arguments:
322
362
 
323
363
  > **Note:**
324
364
  >
325
- > You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of instruments used to compute the scores.
365
+ > You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of SNPs used to compute the scores.
326
366
 
327
367
 
328
368
  ### Mendelian Randomization <a name="paragraph3.5"></a>
@@ -437,7 +477,7 @@ If you want to visualize the obtained MR results, you can use the `genal.Geno.MR
437
477
  SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
438
478
  ```
439
479
 
440
- ![MR plot](docs/Images/MR_plot_SBP_AS.png)
480
+ ![MR plot](docs/build/_images/MR_plot_SBP_AS.png)
441
481
  You can select which MR methods you wish to plot with the `methods` argument. Note that for an MR method to be plotted, they must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
442
482
 
443
483
  If you wish to include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
@@ -457,7 +497,7 @@ As expected, many MR methods indicate that SBP is strongly associated with strok
457
497
  To investigate horizontal pleiotropy in more details, a very useful method is Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO). MR-PRESSO is a method designed to detect and correct for horizontal pleiotropy. It will identify which instruments are likely to be pleiotropic on their effect on the outcome, and it will rerun an inverse-variance weighted MR after excluding them. It can be run using the `genal.Geno.MRpresso` method:
458
498
 
459
499
  ```python
460
- SBP_clumped.MRpresso(action = 2, n_iterations = 30000)
500
+ mod_table, GlobalTest, OutlierTest, BiasTest = SBP_clumped.MRpresso(action = 2, n_iterations = 30000)
461
501
  ```
462
502
 
463
503
  As with the `genal.Geno.MR` method, the `action` argument determines how the pleiotropic SNPs will be treated. The output is a list containing:
@@ -482,7 +522,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
482
522
 
483
523
  > **Note:**
484
524
  >
485
- > One important detail is to make sure that the individual IDs are identical between the phenotypic data and the genetic data for the target population.
525
+ > One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
486
526
 
487
527
  Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
488
528
 
@@ -560,5 +600,54 @@ You can specify the path of the LiftOver executable to the `liftover_path` argum
560
600
  SBP_Geno.lift(start = "hg19", end = "hg38", replace = False, liftover_path = "path/to/liftover/exec")
561
601
  ```
562
602
 
603
+ ### GWAS Catalog <a name="paragraph3.8"></a>
563
604
 
605
+ It is sometimes interesting to determine the traits associated with our SNPs. In Mendelian Randomization, for instance, we may want to exclude instruments that are associated with traits likely causing horizontal pleiotropy. For this purpose, we can use the `genal.Geno.query_gwas_catalog` method. This method will query the GWAS Catalog API to determine the list of traits associated with each of our SNPs and store the results in a list in the `ASSOC` column of the `.data` attribute:
564
606
 
607
+ ```python
608
+ SBP_clumped.query_gwas_catalog(p_threshold=5e-8)
609
+ ```
610
+ Which will output:
611
+
612
+ Querying the GWAS Catalog and creating the ASSOC column.
613
+ Only associations with a p-value <= 5e-08 are reported. Use the p_threshold argument to change the threshold.
614
+ To report the p-value for each association, use return_p=True.
615
+ To report the study ID for each association, use return_study=True.
616
+ The .data attribute will be modified. Use replace=False to leave it as is.
617
+ 100%|██████████| 1545/1545 [00:34<00:00, 44.86it/s]
618
+ The ASSOC column has been successfully created.
619
+ 701 (45.37%) SNPs failed to query (not found in GWAS Catalog) and 7 (0.5%) SNPs timed out after 34.33 seconds. You can increase the timeout value with the timeout argument.
620
+ | EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
621
+ |-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
622
+ | A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | [eicosanoids measurement, decadienedioic acid (...] |
623
+ | A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
624
+ | T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [diastolic blood pressure, systolic blood pressure...] |
625
+ | T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
626
+ | T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
627
+ | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
628
+ | T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [diastolic blood pressure, systolic blood pressure...] |
629
+ | T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
630
+ | A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [protein measurement, pulse pressure measurement...] |
631
+
632
+ If you are also interested in the p-values of each SNP-trait association, or the ID of the study from which the association was reported, you can use the `return_p = True` and `return_study = True` arguments. Then, the `ASSOC` column will contain a list of tuples, where each tuple contains the trait name, the p-value, and the study ID:
633
+
634
+ ```python
635
+ SBP_clumped.query_gwas_catalog(p_threshold=5e-8, return_p=True, return_study=True)
636
+ ```
637
+
638
+ | EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
639
+ |-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
640
+ | A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | TIMEOUT |
641
+ | A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
642
+ | T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [(heart rate response to exercise, 6e-12, GCST... |
643
+ | T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
644
+ | T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
645
+ | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
646
+ | T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [(diastolic blood pressure, 1e-12, GCST9031029... |
647
+ | T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
648
+ | A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [(systolic blood pressure, 9e-13, GCST006624),... |
649
+
650
+
651
+ > **Note:**
652
+ >
653
+ > As you can see, many SNPs failed to be queried. This is normal as the GWAS Catalog is not exhaustive.
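
Reviewer sketch (not part of the package diff): the new GWAS Catalog section says the `ASSOC` column can be used to exclude instruments associated with traits likely to cause horizontal pleiotropy, but it does not show the filtering step. The snippet below is a minimal illustration using pandas only. It assumes, based on the README output above, that `ASSOC` holds either a list of trait names or one of the string markers `FAILED_QUERY` / `TIMEOUT`. The DataFrame, the SNP identifiers, the `ALLOWED_TRAITS` set, and the `has_other_trait` helper are all hypothetical and are not part of genal.

```python
import pandas as pd

# Hypothetical stand-in for the `.data` attribute after `query_gwas_catalog`.
# ASSOC is either a list of trait names or a string marker, as in the README output.
data = pd.DataFrame(
    {
        "SNP": ["snp_a", "snp_b", "snp_c", "snp_d"],
        "ASSOC": [
            ["eicosanoids measurement"],
            "FAILED_QUERY",
            ["diastolic blood pressure", "systolic blood pressure"],
            "TIMEOUT",
        ],
    }
)

# Traits considered acceptable for a blood-pressure instrument (illustrative choice only).
ALLOWED_TRAITS = {"systolic blood pressure", "diastolic blood pressure"}


def has_other_trait(assoc):
    """Return True if a successfully queried SNP is linked to a trait outside ALLOWED_TRAITS."""
    if not isinstance(assoc, list):  # "FAILED_QUERY" / "TIMEOUT": no information available
        return False
    return any(trait not in ALLOWED_TRAITS for trait in assoc)


# SNPs that may be pleiotropic under the assumption above.
flagged = data.loc[data["ASSOC"].apply(has_other_trait), "SNP"].tolist()

# SNPs for which the query returned no usable result.
unqueried = data.loc[
    data["ASSOC"].apply(lambda a: not isinstance(a, list)), "SNP"
].tolist()

print("Flagged:", flagged)
print("Not queried:", unqueried)
```

SNPs that failed or timed out are kept separate from the flagged ones here, since a missing GWAS Catalog entry is not evidence that a SNP has no other associations.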
@@ -1,25 +1,3 @@
1
- Metadata-Version: 2.1
2
- Name: genal-python
3
- Version: 0.6
4
- Summary: A python toolkit for polygenic risk scoring and mendelian randomization.
5
- Author-email: Cyprien Rivier <riviercyprien@gmail.com>
6
- Requires-Python: >=3.7
7
- Description-Content-Type: text/markdown
8
- Classifier: Programming Language :: Python :: 3
9
- Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
10
- Classifier: Operating System :: OS Independent
11
- Requires-Dist: numpy>=1.26.2
12
- Requires-Dist: pandas>=2.0.3
13
- Requires-Dist: plotnine>=0.12.3
14
- Requires-Dist: psutil>=5.9.1
15
- Requires-Dist: pyliftover>=0.4
16
- Requires-Dist: scikit_learn>=1.3.0
17
- Requires-Dist: scipy>=1.11.3
18
- Requires-Dist: statsmodels>=0.14.0
19
- Requires-Dist: tqdm>=4.66.1
20
- Requires-Dist: wget>=3.2
21
- Project-URL: Home, https://github.com/CypRiv/genal
22
-
23
1
  <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
24
2
 
25
3
 
@@ -29,10 +7,11 @@ Project-URL: Home, https://github.com/CypRiv/genal
29
7
 
30
8
  # Table of contents
31
9
  1. [Introduction](#introduction)
32
- 2. [Citation] (#citation)
10
+ 2. [Citation](#citation)
33
11
  3. [Requirements for the genal module](#paragraph1)
34
12
  4. [Installation and how to use genal](#paragraph2)
35
13
  1. [Installation](#paragraph2.1)
14
+ 2. [Documentation](#paragraph2.2)
36
15
  5. [Tutorial and presentation of the main tools](#paragraph3)
37
16
  1. [Data loading](#paragraph3.1)
38
17
  2. [Data preprocessing](#paragraph3.2)
@@ -41,6 +20,7 @@ Project-URL: Home, https://github.com/CypRiv/genal
41
20
  5. [Mendelian Randomization](#paragraph3.5)
42
21
  6. [SNP-association testing](#paragraph3.6)
43
22
  7. [Lifting](#paragraph3.7)
23
+ 8. [GWAS Catalog](#paragraph3.8)
44
24
 
45
25
 
46
26
  ## Introduction <a name="introduction"></a>
@@ -77,6 +57,16 @@ Once downloaded, the path to the plink executable can be set with:
77
57
  ```
78
58
  genal.set_plink(path="/path/to/plink/executable/file")
79
59
  ```
60
+ ### Documentation <a name="paragraph2.2"></a>
61
+
62
+ For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
63
+
64
+ The documentation covers:
65
+ - Installation
66
+ - This tutorial
67
+ - The list of the main functions with complete description of their arguments
68
+ - An exhaustive API reference
69
+
80
70
 
81
71
  ## Tutorial <a name="paragraph3"></a>
82
72
  For this tutorial, we will obtain genetic instruments for systolic blood pressure (SBP), compute a Polygenic Risk Score (PRS), and run a Mendelian Randomization analysis to investigate the genetically-determined effect of SBP on the risk of stroke. We will utilize summary statistics from Genome-Wide Association Studies (GWAS) and individual-level data from the UK Biobank. The steps include:
@@ -97,11 +87,11 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
97
87
  - Data lifting to another genomic build
98
88
  - In pure Python
99
89
  - Using LiftOver
100
- - Phenoscanner (to be added)
90
+ - Querying the GWAS Catalog
101
91
 
102
92
  ### Data loading <a name="paragraph3.1"></a>
103
93
 
104
- We begin with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
94
+ We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
105
95
 
106
96
  ```python
107
97
  import pandas as pd
@@ -130,6 +120,10 @@ The `genal.Geno` takes as input a pandas dataframe where each row corresponds to
130
120
  - **P**: Column name for effect p-value. Defaults to `'P'`.
131
121
  - **EAF**: Column name for effect allele frequency. Defaults to `'EAF'`.
132
122
 
123
+ > **Note:**
124
+ >
125
+ > You do not need all columns to move forward, as not all columns are required by every function. Additionally, some columns can be imputed as we will see in the next paragraph.
126
+
133
127
  After inspecting the dataframe, we first need to extract the chromosome and position information from the `MarkerName` column into two new columns `CHR` and `POS`:
134
128
 
135
129
  ```python
@@ -155,7 +149,7 @@ The last argument (`keep_columns = False`) indicates that we do not wish to keep
155
149
 
156
150
  > **Note:**
157
151
  >
158
- > Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele. Also, you do not need all columns to move forward, as some can be inputted as we will see next.
152
+ > Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
159
153
 
160
154
  ### Data preprocessing <a name="paragraph3.2"></a>
161
155
 
@@ -334,7 +328,7 @@ and the output is:
334
328
  The PRS computation was successful and used 1330/1538 (86.476%) SNPs.
335
329
  PRS data saved to SBP_prs.csv
336
330
 
337
- In our case, we have been able to find proxies for 571 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
331
+ In our case, we have been able to find proxies for 578 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
338
332
 
339
333
  You can customize how the proxies are chosen with the following arguments:
340
334
  - `reference_panel`: The reference population used to derive linkage disequilibrium values and find proxies. Defaults to `eur`.
@@ -344,7 +338,7 @@ You can customize how the proxies are chosen with the following arguments:
344
338
 
345
339
  > **Note:**
346
340
  >
347
- > You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of instruments used to compute the scores.
341
+ > You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of SNPs used to compute the scores.
348
342
 
349
343
 
350
344
  ### Mendelian Randomization <a name="paragraph3.5"></a>
@@ -459,7 +453,7 @@ If you want to visualize the obtained MR results, you can use the `genal.Geno.MR
459
453
  SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
460
454
  ```
461
455
 
462
- ![MR plot](docs/Images/MR_plot_SBP_AS.png)
456
+ ![MR plot](docs/build/_images/MR_plot_SBP_AS.png)
463
457
  You can select which MR methods you wish to plot with the `methods` argument. Note that for an MR method to be plotted, it must have been included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
464
458
 
465
459
  If you wish to include the heterogeneity values (Cochran's Q) in the results, you can use the `heterogeneity` argument in the `genal.Geno.MR` call. Here is the heterogeneity for the inverse-variance weighted method:
@@ -479,7 +473,7 @@ As expected, many MR methods indicate that SBP is strongly associated with strok
479
473
  To investigate horizontal pleiotropy in more detail, a very useful method is Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO). MR-PRESSO is designed to detect and correct for horizontal pleiotropy. It identifies which instruments are likely to have pleiotropic effects on the outcome, and it reruns an inverse-variance weighted MR after excluding them. It can be run using the `genal.Geno.MRpresso` method:
480
474
 
481
475
  ```python
482
- SBP_clumped.MRpresso(action = 2, n_iterations = 30000)
476
+ mod_table, GlobalTest, OutlierTest, BiasTest = SBP_clumped.MRpresso(action = 2, n_iterations = 30000)
483
477
  ```
484
478
 
485
479
  As with the `genal.Geno.MR` method, the `action` argument determines how the pleiotropic SNPs will be treated. The output is a list containing:
@@ -504,7 +498,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
504
498
 
505
499
  > **Note:**
506
500
  >
507
- > One important detail is to make sure that the individual IDs are identical between the phenotypic data and the genetic data for the target population.
501
+ > One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
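
The ID check described in the note above can be sketched with pandas. This is only an illustration under stated assumptions: the `IID` column name, the toy values, and the `id_overlap` helper are hypothetical and are not part of genal. IDs are compared as strings because one source may store them as integers and the other as text.

```python
import pandas as pd

# Toy stand-ins (assumed layout): in practice these would be your phenotype
# table and the sample IDs read from the genetic data files.
df_pheno_ids = pd.DataFrame({"IID": [1001, 1002, 1003, 1004]})
df_geno_ids = pd.DataFrame({"IID": ["1001", "1002", "1005"]})


def id_overlap(pheno_ids, geno_ids):
    """Compare two ID collections as stripped strings and report the overlap."""
    pheno = set(pd.Series(pheno_ids).astype(str).str.strip())
    geno = set(pd.Series(geno_ids).astype(str).str.strip())
    return {
        "shared": sorted(pheno & geno),
        "pheno_only": sorted(pheno - geno),
        "geno_only": sorted(geno - pheno),
    }


overlap = id_overlap(df_pheno_ids["IID"], df_geno_ids["IID"])
print(f"{len(overlap['shared'])} shared IDs, "
      f"{len(overlap['pheno_only'])} only in the phenotype data, "
      f"{len(overlap['geno_only'])} only in the genetic data")
```

A small or empty `shared` set usually points to a formatting difference between the two ID columns rather than to genuinely different participants.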
508
502
 
509
503
  Then, it is advisable to make a copy of the `genal.Geno` instance containing our instruments, since we are going to update their coefficients and want to avoid any confusion:
510
504
 
@@ -582,6 +576,54 @@ You can specify the path of the LiftOver executable to the `liftover_path` argum
582
576
  SBP_Geno.lift(start = "hg19", end = "hg38", replace = False, liftover_path = "path/to/liftover/exec")
583
577
  ```
584
578
 
579
+ ### GWAS Catalog <a name="paragraph3.8"></a>
580
+
581
+ It is sometimes interesting to determine the traits associated with our SNPs. In Mendelian Randomization, for instance, we may want to exclude instruments that are associated with traits likely causing horizontal pleiotropy. For this purpose, we can use the `genal.Geno.query_gwas_catalog` method. This method will query the GWAS Catalog API to determine the list of traits associated with each of our SNPs and store the results in a list in the `ASSOC` column of the `.data` attribute:
582
+
583
+ ```python
584
+ SBP_clumped.query_gwas_catalog(p_threshold=5e-8)
585
+ ```
586
+ This will output:
587
+
588
+ Querying the GWAS Catalog and creating the ASSOC column.
589
+ Only associations with a p-value <= 5e-08 are reported. Use the p_threshold argument to change the threshold.
590
+ To report the p-value for each association, use return_p=True.
591
+ To report the study ID for each association, use return_study=True.
592
+ The .data attribute will be modified. Use replace=False to leave it as is.
593
+ 100%|██████████| 1545/1545 [00:34<00:00, 44.86it/s]
594
+ The ASSOC column has been successfully created.
595
+ 701 (45.37%) SNPs failed to query (not found in GWAS Catalog) and 7 (0.5%) SNPs timed out after 34.33 seconds. You can increase the timeout value with the timeout argument.
596
+ | EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
597
+ |-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
598
+ | A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | [eicosanoids measurement, decadienedioic acid (...] |
599
+ | A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
600
+ | T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [diastolic blood pressure, systolic blood pressure...] |
601
+ | T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
602
+ | T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
603
+ | ... | ... | ... | ... | ... | ... | ... | ... | ... |
604
+ | T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [diastolic blood pressure, systolic blood pressure...] |
605
+ | T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
606
+ | A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [protein measurement, pulse pressure measurement...] |
607
+
608
+ If you are also interested in the p-values of each SNP-trait association, or the ID of the study from which the association was reported, you can use the `return_p = True` and `return_study = True` arguments. Then, the `ASSOC` column will contain a list of tuples, where each tuple contains the trait name, the p-value, and the study ID:
609
+
610
+ ```python
611
+ SBP_clumped.query_gwas_catalog(p_threshold=5e-8, return_p=True, return_study=True)
612
+ ```
585
613
 
614
+ | EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
615
+ |-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
616
+ | A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | TIMEOUT |
617
+ | A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
618
+ | T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [(heart rate response to exercise, 6e-12, GCST... |
619
+ | T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
620
+ | T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
621
+ | ... | ... | ... | ... | ... | ... | ... | ... | ... |
622
+ | T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [(diastolic blood pressure, 1e-12, GCST9031029... |
623
+ | T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
624
+ | A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [(systolic blood pressure, 9e-13, GCST006624),... |
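
When `ASSOC` holds lists of `(trait, p-value, study ID)` tuples, it can be convenient to unpack them into one row per association. The following is a minimal sketch with pandas, assuming the column has the shape shown above (a list of tuples, or a string such as `FAILED_QUERY` or `TIMEOUT`). The `assoc_to_long` helper is hypothetical, and the p-values and the `GCST_EXAMPLE_*` study IDs in the toy table are placeholders, not real query results.

```python
import pandas as pd

# Toy table mimicking the layout of the .data attribute after the query.
toy = pd.DataFrame({
    "SNP": ["rs1006545", "rs2996303", "rs7045409", "rs603424"],
    "ASSOC": [
        [("heart rate response to exercise", 6e-12, "GCST_EXAMPLE_1")],
        "FAILED_QUERY",
        [("systolic blood pressure", 9e-13, "GCST006624"),
         ("pulse pressure measurement", 2e-10, "GCST_EXAMPLE_2")],
        "TIMEOUT",
    ],
})


def assoc_to_long(df):
    """Return one row per (SNP, trait, p, study) association."""
    # Keep only rows whose ASSOC entry is a list; sentinel strings are dropped.
    ok = df[df["ASSOC"].apply(lambda x: isinstance(x, list))]
    # explode() yields one row per tuple; empty lists become NaN and are removed.
    long = ok[["SNP", "ASSOC"]].explode("ASSOC").dropna(subset=["ASSOC"])
    out = pd.DataFrame(long["ASSOC"].tolist(), columns=["trait", "p", "study"])
    out.insert(0, "SNP", long["SNP"].to_numpy())
    return out.reset_index(drop=True)


long_df = assoc_to_long(toy)
print(long_df)
```

The long format makes it straightforward to sort by p-value or to group associations by study.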
586
625
 
587
626
 
627
+ > **Note:**
628
+ >
629
+ > As you can see, many SNPs failed to be queried. This is normal as the GWAS Catalog is not exhaustive.
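
As a possible next step, the `ASSOC` column can be used to set aside instruments associated with traits that you suspect of causing horizontal pleiotropy. The sketch below assumes the default output (lists of trait names, or a string for SNPs that could not be queried). The `has_flagged_trait` helper, the keyword list, and the toy table are illustrative assumptions, not part of genal.

```python
import pandas as pd

# Toy table mimicking the default ASSOC layout.
toy = pd.DataFrame({
    "SNP": ["rs603424", "rs2996303", "rs1006545", "rs7045409"],
    "ASSOC": [
        ["eicosanoids measurement"],
        "FAILED_QUERY",
        ["diastolic blood pressure", "systolic blood pressure"],
        ["protein measurement", "pulse pressure measurement"],
    ],
})


def has_flagged_trait(assoc, keywords):
    """True if any reported trait contains one of the keywords (case-insensitive)."""
    if not isinstance(assoc, list):
        # Unqueried SNPs have unknown associations; they are not flagged here.
        return False
    return any(k.lower() in trait.lower() for trait in assoc for k in keywords)


# Hypothetical choice of traits considered problematic for this analysis.
pleiotropy_keywords = ["eicosanoids"]

mask = toy["ASSOC"].apply(has_flagged_trait, keywords=pleiotropy_keywords)
kept = toy[~mask]
print(f"{mask.sum()} SNP(s) flagged, {len(kept)} kept")
```

Whether SNPs with `FAILED_QUERY` or `TIMEOUT` should be kept is a judgment call; this sketch keeps them because the absence of a Catalog entry is not evidence of pleiotropy.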
Binary file
@@ -0,0 +1,4 @@
1
+ # Sphinx build info version 1
2
+ # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
3
+ config: 1a3c03fa317dbf0f46b6f7567774d6c5
4
+ tags: 645f666f9bcd5a90fca523b33c5a78b7