genal-python 0.7__tar.gz → 0.9__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (128)
  1. genal_python-0.9/.DS_Store +0 -0
  2. {genal_python-0.7 → genal_python-0.9}/.gitignore +1 -1
  3. genal_python-0.9/.readthedocs.yaml +22 -0
  4. {genal_python-0.7 → genal_python-0.9}/PKG-INFO +105 -29
  5. {genal_python-0.7 → genal_python-0.9}/README.md +105 -27
  6. genal_python-0.9/docs/.DS_Store +0 -0
  7. genal_python-0.9/docs/build/.DS_Store +0 -0
  8. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/.buildinfo +1 -1
  9. genal_python-0.9/docs/build/.doctrees/api.doctree +0 -0
  10. genal_python-0.9/docs/build/.doctrees/environment.pickle +0 -0
  11. genal_python-0.9/docs/build/.doctrees/genal.doctree +0 -0
  12. genal_python-0.9/docs/build/.doctrees/index.doctree +0 -0
  13. genal_python-0.9/docs/build/.doctrees/introduction.doctree +0 -0
  14. genal_python-0.9/docs/build/.doctrees/modules.doctree +0 -0
  15. genal_python-0.9/docs/build/_modules/genal/Geno.html +1480 -0
  16. genal_python-0.9/docs/build/_modules/genal/MR.html +1065 -0
  17. genal_python-0.9/docs/build/_modules/genal/MR_tools.html +671 -0
  18. genal_python-0.9/docs/build/_modules/genal/MRpresso.html +409 -0
  19. genal_python-0.9/docs/build/_modules/genal/association.html +445 -0
  20. genal_python-0.9/docs/build/_modules/genal/clump.html +183 -0
  21. genal_python-0.9/docs/build/_modules/genal/extract_prs.html +426 -0
  22. genal_python-0.9/docs/build/_modules/genal/geno_tools.html +567 -0
  23. genal_python-0.9/docs/build/_modules/genal/lift.html +371 -0
  24. genal_python-0.9/docs/build/_modules/genal/proxy.html +359 -0
  25. genal_python-0.9/docs/build/_modules/genal/snp_query.html +231 -0
  26. genal_python-0.9/docs/build/_modules/genal/tools.html +440 -0
  27. genal_python-0.9/docs/build/_modules/index.html +114 -0
  28. genal_python-0.7/docs/_build/html/_sources/source/genal.rst.txt → genal_python-0.9/docs/build/_sources/api.rst.txt +39 -40
  29. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_sources/index.rst.txt +15 -5
  30. genal_python-0.9/docs/build/_sources/introduction.rst.txt +674 -0
  31. genal_python-0.9/docs/build/_sources/modules.rst.txt +82 -0
  32. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/basic.css +1 -1
  33. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/doctools.js +1 -1
  34. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/documentation_options.js +1 -1
  35. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/language_data.js +2 -2
  36. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/searchtools.js +105 -60
  37. genal_python-0.7/docs/_build/html/source/genal.html → genal_python-0.9/docs/build/api.html +1310 -1119
  38. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/genal.html +191 -170
  39. genal_python-0.9/docs/build/genindex.html +584 -0
  40. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/index.html +36 -42
  41. genal_python-0.9/docs/build/introduction.html +714 -0
  42. genal_python-0.7/docs/_build/html/api.html → genal_python-0.9/docs/build/modules.html +299 -421
  43. genal_python-0.9/docs/build/objects.inv +0 -0
  44. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/py-modindex.html +20 -20
  45. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/search.html +8 -8
  46. genal_python-0.9/docs/build/searchindex.js +1 -0
  47. genal_python-0.9/docs/source/.DS_Store +0 -0
  48. genal_python-0.9/docs/source/Images/MR_plot_SBP_AS.png +0 -0
  49. genal_python-0.7/docs/source/genal.rst → genal_python-0.9/docs/source/api.rst +39 -40
  50. {genal_python-0.7 → genal_python-0.9}/docs/source/conf.py +3 -2
  51. {genal_python-0.7 → genal_python-0.9}/docs/source/index.rst +15 -5
  52. genal_python-0.9/docs/source/introduction.rst +677 -0
  53. genal_python-0.9/docs/source/modules.rst +82 -0
  54. {genal_python-0.7 → genal_python-0.9}/genal/Geno.py +73 -50
  55. {genal_python-0.7 → genal_python-0.9}/genal/MR.py +16 -16
  56. {genal_python-0.7 → genal_python-0.9}/genal/MR_tools.py +11 -0
  57. {genal_python-0.7 → genal_python-0.9}/genal/__init__.py +1 -1
  58. {genal_python-0.7 → genal_python-0.9}/genal/constants.py +1 -0
  59. {genal_python-0.7 → genal_python-0.9}/genal/extract_prs.py +34 -12
  60. {genal_python-0.7 → genal_python-0.9}/genal/geno_tools.py +2 -2
  61. {genal_python-0.7 → genal_python-0.9}/genal/snp_query.py +53 -17
  62. {genal_python-0.7 → genal_python-0.9}/genal/tools.py +16 -6
  63. {genal_python-0.7 → genal_python-0.9}/pyproject.toml +2 -3
  64. genal_python-0.7/docs/_build/doctrees/api.doctree +0 -0
  65. genal_python-0.7/docs/_build/doctrees/environment.pickle +0 -0
  66. genal_python-0.7/docs/_build/doctrees/genal.doctree +0 -0
  67. genal_python-0.7/docs/_build/doctrees/index.doctree +0 -0
  68. genal_python-0.7/docs/_build/doctrees/introduction.doctree +0 -0
  69. genal_python-0.7/docs/_build/doctrees/modules.doctree +0 -0
  70. genal_python-0.7/docs/_build/doctrees/source/genal.doctree +0 -0
  71. genal_python-0.7/docs/_build/doctrees/source/modules.doctree +0 -0
  72. genal_python-0.7/docs/_build/html/_sources/api.rst.txt +0 -24
  73. genal_python-0.7/docs/_build/html/_sources/introduction.rst.txt +0 -505
  74. genal_python-0.7/docs/_build/html/_sources/modules.rst.txt +0 -7
  75. genal_python-0.7/docs/_build/html/_sources/source/modules.rst.txt +0 -7
  76. genal_python-0.7/docs/_build/html/_static/_sphinx_javascript_frameworks_compat.js +0 -123
  77. genal_python-0.7/docs/_build/html/_static/jquery.js +0 -2
  78. genal_python-0.7/docs/_build/html/genindex.html +0 -701
  79. genal_python-0.7/docs/_build/html/introduction.html +0 -584
  80. genal_python-0.7/docs/_build/html/modules.html +0 -269
  81. genal_python-0.7/docs/_build/html/objects.inv +0 -0
  82. genal_python-0.7/docs/_build/html/searchindex.js +0 -1
  83. genal_python-0.7/docs/_build/html/source/modules.html +0 -259
  84. genal_python-0.7/docs/requirements.txt +0 -2
  85. genal_python-0.7/docs/source/api.rst +0 -24
  86. genal_python-0.7/docs/source/introduction.rst +0 -505
  87. genal_python-0.7/docs/source/modules.rst +0 -7
  88. {genal_python-0.7 → genal_python-0.9}/LICENSE +0 -0
  89. {genal_python-0.7 → genal_python-0.9}/docs/Makefile +0 -0
  90. {genal_python-0.7/docs/Images → genal_python-0.9/docs/build/_images}/MR_plot_SBP_AS.png +0 -0
  91. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_sources/genal.rst.txt +0 -0
  92. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/badge_only.css +0 -0
  93. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/Roboto-Slab-Bold.woff +0 -0
  94. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/Roboto-Slab-Bold.woff2 +0 -0
  95. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/Roboto-Slab-Regular.woff +0 -0
  96. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/Roboto-Slab-Regular.woff2 +0 -0
  97. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/fontawesome-webfont.eot +0 -0
  98. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/fontawesome-webfont.svg +0 -0
  99. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/fontawesome-webfont.ttf +0 -0
  100. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/fontawesome-webfont.woff +0 -0
  101. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/fontawesome-webfont.woff2 +0 -0
  102. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-bold-italic.woff +0 -0
  103. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-bold-italic.woff2 +0 -0
  104. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-bold.woff +0 -0
  105. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-bold.woff2 +0 -0
  106. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-normal-italic.woff +0 -0
  107. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-normal-italic.woff2 +0 -0
  108. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-normal.woff +0 -0
  109. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/fonts/lato-normal.woff2 +0 -0
  110. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/css/theme.css +0 -0
  111. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/file.png +0 -0
  112. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/js/badge_only.js +0 -0
  113. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/js/html5shiv-printshiv.min.js +0 -0
  114. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/js/html5shiv.min.js +0 -0
  115. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/js/theme.js +0 -0
  116. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/minus.png +0 -0
  117. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/plus.png +0 -0
  118. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/pygments.css +0 -0
  119. {genal_python-0.7/docs/_build/html → genal_python-0.9/docs/build}/_static/sphinx_highlight.js +0 -0
  120. {genal_python-0.7 → genal_python-0.9}/docs/make.bat +0 -0
  121. {genal_python-0.7 → genal_python-0.9}/genal/MRpresso.py +0 -0
  122. {genal_python-0.7 → genal_python-0.9}/genal/association.py +0 -0
  123. {genal_python-0.7 → genal_python-0.9}/genal/clump.py +0 -0
  124. {genal_python-0.7 → genal_python-0.9}/genal/lift.py +0 -0
  125. {genal_python-0.7 → genal_python-0.9}/genal/proxy.py +0 -0
  126. {genal_python-0.7 → genal_python-0.9}/gitignore +0 -0
  127. {genal_python-0.7 → genal_python-0.9}/readthedocs.yaml +0 -0
  128. {genal_python-0.7 → genal_python-0.9}/requirements.txt +0 -0
Binary file
@@ -3,4 +3,4 @@ dist/
3
3
  .ipynb_checkpoints/
4
4
  ipynb_checkpoints/
5
5
  genal/.ipynb_checkpoints/
6
- docs/
6
+ test_data/
@@ -0,0 +1,22 @@
1
+ # .readthedocs.yaml
2
+ # Read the Docs configuration file
3
+ # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
4
+
5
+ # Required
6
+ version: 2
7
+
8
+ # Set the version of Python and other tools you might need
9
+ build:
10
+   os: ubuntu-22.04
11
+   tools:
12
+     python: "3.11"
13
+
14
+ # Build documentation in the docs/ directory with Sphinx
15
+ sphinx:
16
+   configuration: docs/source/conf.py
17
+
18
+ # We recommend specifying your dependencies to enable reproducible builds:
19
+ # https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
20
+ python:
21
+   install:
22
+     - requirements: docs/requirements.txt
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: genal-python
3
- Version: 0.7
3
+ Version: 0.9
4
4
  Summary: A python toolkit for polygenic risk scoring and mendelian randomization.
5
5
  Author-email: Cyprien Rivier <riviercyprien@gmail.com>
6
6
  Requires-Python: >=3.7
@@ -17,7 +17,6 @@ Requires-Dist: psutil==5.9.1
17
17
  Requires-Dist: pyliftover==0.4
18
18
  Requires-Dist: scikit_learn>=1.3.0
19
19
  Requires-Dist: scipy>=1.11.4
20
- Requires-Dist: sphinx_rtd_theme==1.3.0
21
20
  Requires-Dist: statsmodels==0.14.0
22
21
  Requires-Dist: tqdm==4.66.1
23
22
  Requires-Dist: wget==3.2
@@ -32,10 +31,11 @@ Project-URL: Home, https://github.com/CypRiv/genal
32
31
 
33
32
  # Table of contents
34
33
  1. [Introduction](#introduction)
35
- 2. [Citation] (#citation)
34
+ 2. [Citation](#citation)
36
35
  3. [Requirements for the genal module](#paragraph1)
37
36
  4. [Installation and how to use genal](#paragraph2)
38
37
  1. [Installation](#paragraph2.1)
38
+ 2. [Documentation](#paragraph2.2)
39
39
  5. [Tutorial and presentation of the main tools](#paragraph3)
40
40
  1. [Data loading](#paragraph3.1)
41
41
  2. [Data preprocessing](#paragraph3.2)
@@ -44,6 +44,7 @@ Project-URL: Home, https://github.com/CypRiv/genal
44
44
  5. [Mendelian Randomization](#paragraph3.5)
45
45
  6. [SNP-association testing](#paragraph3.6)
46
46
  7. [Lifting](#paragraph3.7)
47
+ 8. [GWAS Catalog](#paragraph3.8)
47
48
 
48
49
 
49
50
  ## Introduction <a name="introduction"></a>
@@ -58,18 +59,27 @@ If you're using genal, please cite the following paper:
58
59
  **Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
59
60
 
60
61
  ## Requirements for the genal module <a name="paragraph1"></a>
61
- ***Python 3.9 or later***. https://www.python.org/ <br>
62
+ ***Python 3.11 or later***. https://www.python.org/ <br>
62
63
 
63
64
 
64
65
  ## Installation and How to use the genal module <a name="paragraph2"></a>
65
66
 
66
67
  ### Installation <a name="paragraph2.1"></a>
67
68
 
69
+ > **Note:**
70
+ >
71
+ > **Optional**: It is recommended to create a new environment to avoid dependencies conflicts. Here, we create a new conda environment called 'genal_env'.
72
+ > ```
73
+ > conda create --name genal_env python=3.11
74
+ > conda activate genal_env
75
+ > ```
76
+
77
+
68
78
  Download and install the package with pip:
69
79
  ```
70
80
  pip install genal-python
71
81
  ```
72
- And it can be imported in a python environment with:
82
+ And import it in a python environment with:
73
83
  ```python
74
84
  import genal
75
85
  ```
@@ -80,6 +90,16 @@ Once downloaded, the path to the plink executable can be set with:
80
90
  ```
81
91
  genal.set_plink(path="/path/to/plink/executable/file")
82
92
  ```
93
+ ### Documentation <a name="paragraph2.2"></a>
94
+
95
+ For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
96
+
97
+ The documentation covers:
98
+ - Installation
99
+ - This tutorial
100
+ - The list of the main functions with complete description of their arguments
101
+ - An exhaustive API reference
102
+
83
103
 
84
104
  ## Tutorial <a name="paragraph3"></a>
85
105
  For this tutorial, we will obtain genetic instruments for systolic blood pressure (SBP), compute a Polygenic Risk Score (PRS), and run a Mendelian Randomization analysis to investigate the genetically-determined effect of SBP on the risk of stroke. We will utilize summary statistics from Genome-Wide Association Studies (GWAS) and individual-level data from the UK Biobank. The steps include:
@@ -100,11 +120,11 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
100
120
  - Data lifting to another genomic build
101
121
  - In pure Python
102
122
  - Using LiftOver
103
- - Phenoscanner (to be added)
123
+ - Querying the GWAS Catalog
104
124
 
105
125
  ### Data loading <a name="paragraph3.1"></a>
106
126
 
107
- We begin with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
127
+ We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
108
128
 
109
129
  ```python
110
130
  import pandas as pd
@@ -133,6 +153,10 @@ The `genal.Geno` takes as input a pandas dataframe where each row corresponds to
133
153
  - **P**: Column name for effect p-value. Defaults to `'P'`.
134
154
  - **EAF**: Column name for effect allele frequency. Defaults to `'EAF'`.
135
155
 
156
+ > **Note:**
157
+ >
158
+ > You do not need all columns to move forward, as not all columns are required by every function. Additionally, some columns can be imputed as we will see in the next paragraph.
159
+
136
160
  After inspecting the dataframe, we first need to extract the chromosome and position information from the `MarkerName` column into two new columns `CHR` and `POS`:
137
161
 
138
162
  ```python
@@ -158,7 +182,7 @@ The last argument (`keep_columns = False`) indicates that we do not wish to keep
158
182
 
159
183
  > **Note:**
160
184
  >
161
- > Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele. Also, you do not need all columns to move forward, as some can be inputted as we will see next.
185
+ > Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
162
186
 
163
187
  ### Data preprocessing <a name="paragraph3.2"></a>
164
188
 
@@ -222,7 +246,7 @@ You can also use a custom reference panel by specifying to the reference_panel a
222
246
 
223
247
  ### Clumping <a name="paragraph3.3"></a>
224
248
 
225
- Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
249
+ Clumping, or C+T: Clumping + Thresholding, is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
226
250
 
227
251
  The SNP-data loaded in a `genal.Geno` instance can be clumped using the `genal.Geno.clump` method. It will return another `genal.Geno` instance containing only the clumped data:
228
252
 
@@ -294,7 +318,7 @@ The output of the `genal.Geno.prs` method will include how many SNPs were used t
294
318
  Here, we see that about half of the SNPs were not extracted from the data. In such cases, we may want to try and salvage some of these SNPs by looking for proxies (SNPs in high linkage disequilibrium, i.e. highly correlated SNPs). This can be done by specifying the `proxy = True`. argument:
295
319
 
296
320
  ```python
297
- SBP_clumped.prs(name = "SBP_prs" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
321
+ SBP_clumped.prs(name = "SBP_prs_proxy" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
298
322
  ```
299
323
 
300
324
  and the output is:
@@ -337,7 +361,7 @@ and the output is:
337
361
  The PRS computation was successful and used 1330/1538 (86.476%) SNPs.
338
362
  PRS data saved to SBP_prs.csv
339
363
 
340
- In our case, we have been able to find proxies for 571 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
364
+ In our case, we have been able to find proxies for 578 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
341
365
 
342
366
  You can customize how the proxies are chosen with the following arguments:
343
367
  - `reference_panel`: The reference population used to derive linkage disequilibrium values and find proxies. Defaults to `eur`.
@@ -347,7 +371,7 @@ You can customize how the proxies are chosen with the following arguments:
347
371
 
348
372
  > **Note:**
349
373
  >
350
- > You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of instruments used to compute the scores.
374
+ > You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of SNPs used to compute the scores.
351
375
 
352
376
 
353
377
  ### Mendelian Randomization <a name="paragraph3.5"></a>
@@ -395,21 +419,25 @@ Genal will print how many SNPs were successfully found and extracted from the ou
395
419
  1541 SNPs out of 1545 are present in the outcome data.
396
420
  (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
397
421
 
398
- Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
399
-
400
- ```python
401
- SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
402
- ```
403
-
404
- And genal will print the number of missing instruments which have been proxied:
405
422
 
406
- Outcome data successfully loaded from 'b352e412' geno instance.
407
- Identifying the exposure SNPs present in the outcome data...
408
- 1541 SNPs out of 1545 are present in the outcome data.
409
- Searching proxies for 4 SNPs...
410
- Using the EUR reference panel.
411
- Found proxies for 4 SNPs.
412
- (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
423
+ > **Note:**
424
+ >Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
425
+ >
426
+ > Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
427
+ >
428
+ > ```python
429
+ > SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
430
+ > ```
431
+ >
432
+ > And genal will print the number of missing instruments that have been proxied:
433
+ >
434
+ > Outcome data successfully loaded from 'b352e412' geno instance.
435
+ > Identifying the exposure SNPs present in the outcome data...
436
+ > 1541 SNPs out of 1545 are present in the outcome data.
437
+ > Searching proxies for 4 SNPs...
438
+ > Using the EUR reference panel.
439
+ > Found proxies for 4 SNPs.
440
+ > (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
413
441
 
414
442
  After extracting the instruments from the outcome data, the `SBP_clumped` `genal.Geno` instance contains an `MR_data` attribute containing the instruments-exposure and instruments-outcome associations necessary to run MR. Running MR is now as simple as calling the `genal.Geno.MR` method of the SBP_clumped `genal.Geno` instance:
415
443
 
@@ -454,7 +482,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
454
482
  - `Weighted-mode` for the Weighted mode method
455
483
  - `all` to run all the above methods
456
484
 
457
- For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API.
485
+ For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API: [https://genal.readthedocs.io/en/latest/modules.html#id4](MR method).
458
486
 
459
487
  If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
460
488
 
@@ -462,7 +490,7 @@ If you want to visualize the obtained MR results, you can use the `genal.Geno.MR
462
490
  SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
463
491
  ```
464
492
 
465
- ![MR plot](docs/Images/MR_plot_SBP_AS.png)
493
+ ![MR plot](docs/build/_images/MR_plot_SBP_AS.png)
466
494
  You can select which MR methods you wish to plot with the `methods` argument. Note that for an MR method to be plotted, they must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
467
495
 
468
496
  If you wish to include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
@@ -507,7 +535,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
507
535
 
508
536
  > **Note:**
509
537
  >
510
- > One important detail is to make sure that the individual IDs are identical between the phenotypic data and the genetic data for the target population.
538
+ > One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
511
539
 
512
540
  Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
513
541
 
@@ -585,6 +613,54 @@ You can specify the path of the LiftOver executable to the `liftover_path` argum
585
613
  SBP_Geno.lift(start = "hg19", end = "hg38", replace = False, liftover_path = "path/to/liftover/exec")
586
614
  ```
587
615
 
616
+ ### GWAS Catalog <a name="paragraph3.8"></a>
588
617
 
618
+ It is sometimes interesting to determine the traits associated with our SNPs. In Mendelian Randomization, for instance, we may want to exclude instruments that are associated with traits likely causing horizontal pleiotropy. For this purpose, we can use the `genal.Geno.query_gwas_catalog` method. This method will query the GWAS Catalog API to determine the list of traits associated with each of our SNPs and store the results in a list in the `ASSOC` column of the `.data` attribute:
589
619
 
620
+ ```python
621
+ SBP_clumped.query_gwas_catalog(p_threshold=5e-8)
622
+ ```
623
+ Which will output:
624
+
625
+ Querying the GWAS Catalog and creating the ASSOC column.
626
+ Only associations with a p-value <= 5e-08 are reported. Use the p_threshold argument to change the threshold.
627
+ To report the p-value for each association, use return_p=True.
628
+ To report the study ID for each association, use return_study=True.
629
+ The .data attribute will be modified. Use replace=False to leave it as is.
630
+ 100%|██████████| 1545/1545 [00:34<00:00, 44.86it/s]
631
+ The ASSOC column has been successfully created.
632
+ 701 (45.37%) SNPs failed to query (not found in GWAS Catalog) and 7 (0.5%) SNPs timed out after 34.33 seconds. You can increase the timeout value with the timeout argument.
633
+ | EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
634
+ |-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
635
+ | A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | [eicosanoids measurement, decadienedioic acid (...] |
636
+ | A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
637
+ | T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [diastolic blood pressure, systolic blood pressure...] |
638
+ | T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
639
+ | T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
640
+ | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
641
+ | T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [diastolic blood pressure, systolic blood pressure...] |
642
+ | T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
643
+ | A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [protein measurement, pulse pressure measurement...] |
644
+
645
+ If you are also interested in the p-values of each SNP-trait association, or the ID of the study from which the association was reported, you can use the `return_p = True` and `return_study = True` arguments. Then, the `ASSOC` column will contain a list of tuples, where each tuple contains the trait name, the p-value, and the study ID:
590
646
 
647
+ ```python
648
+ SBP_clumped.query_gwas_catalog(p_threshold=5e-8, return_p=True, return_study=True)
649
+ ```
650
+
651
+ | EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
652
+ |-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
653
+ | A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | TIMEOUT |
654
+ | A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
655
+ | T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [(heart rate response to exercise, 6e-12, GCST... |
656
+ | T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
657
+ | T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
658
+ | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
659
+ | T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [(diastolic blood pressure, 1e-12, GCST9031029... |
660
+ | T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
661
+ | A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [(systolic blood pressure, 9e-13, GCST006624),... |
662
+
663
+
664
+ > **Note:**
665
+ >
666
+ > As you can see, many SNPs failed to be queried. This is normal as the GWAS Catalog is not exhaustive.
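
Note on the new GWAS Catalog section above: the README describes the `ASSOC` column as holding either a list of trait names, a list of `(trait, p-value, study ID)` tuples, or a marker string such as `FAILED_QUERY` or `TIMEOUT`. A minimal sketch of how such cells might be screened for traits suspected of horizontal pleiotropy is shown below. It uses plain Python and toy rows rather than genal itself; the helper names (`trait_names`, `is_pleiotropic`), the SNP labels, and the `STUDY_X` identifier are hypothetical and not part of the package.

```python
def trait_names(assoc):
    """Return the lowercase trait names stored in one ASSOC cell.

    Assumption: a cell is either a marker string ("FAILED_QUERY", "TIMEOUT"),
    a list of trait-name strings, or a list of tuples whose first element
    is the trait name.
    """
    if isinstance(assoc, str):
        # Marker strings carry no trait information.
        return []
    names = []
    for item in assoc:
        name = item[0] if isinstance(item, tuple) else item
        names.append(name.lower())
    return names


def is_pleiotropic(assoc, keywords):
    """True if any trait in the cell contains one of the keywords."""
    lowered = [k.lower() for k in keywords]
    return any(k in name for name in trait_names(assoc) for k in lowered)


# Toy rows mimicking the layout shown in the README (hypothetical values).
rows = [
    {"SNP": "snp1", "ASSOC": ["diastolic blood pressure", "systolic blood pressure"]},
    {"SNP": "snp2", "ASSOC": "FAILED_QUERY"},
    {"SNP": "snp3", "ASSOC": [("heart rate response to exercise", 6e-12, "STUDY_X")]},
    {"SNP": "snp4", "ASSOC": "TIMEOUT"},
]

# Keep the SNPs that are not linked to the unwanted trait.
kept = [r["SNP"] for r in rows if not is_pleiotropic(r["ASSOC"], ["heart rate"])]
print(kept)
```

With a pandas DataFrame, the same helper could presumably be applied to the column with `df["ASSOC"].apply(is_pleiotropic, keywords=[...])`. Whether SNPs marked `FAILED_QUERY` or `TIMEOUT` should be kept or dropped is an analysis decision; this sketch keeps them.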
@@ -7,10 +7,11 @@
7
7
 
8
8
  # Table of contents
9
9
  1. [Introduction](#introduction)
10
- 2. [Citation] (#citation)
10
+ 2. [Citation](#citation)
11
11
  3. [Requirements for the genal module](#paragraph1)
12
12
  4. [Installation and how to use genal](#paragraph2)
13
13
  1. [Installation](#paragraph2.1)
14
+ 2. [Documentation](#paragraph2.2)
14
15
  5. [Tutorial and presentation of the main tools](#paragraph3)
15
16
  1. [Data loading](#paragraph3.1)
16
17
  2. [Data preprocessing](#paragraph3.2)
@@ -19,6 +20,7 @@
19
20
  5. [Mendelian Randomization](#paragraph3.5)
20
21
  6. [SNP-association testing](#paragraph3.6)
21
22
  7. [Lifting](#paragraph3.7)
23
+ 8. [GWAS Catalog](#paragraph3.8)
22
24
 
23
25
 
24
26
  ## Introduction <a name="introduction"></a>
@@ -33,18 +35,27 @@ If you're using genal, please cite the following paper:
33
35
  **Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
34
36
 
35
37
  ## Requirements for the genal module <a name="paragraph1"></a>
36
- ***Python 3.9 or later***. https://www.python.org/ <br>
38
+ ***Python 3.11 or later***. https://www.python.org/ <br>
37
39
 
38
40
 
39
41
  ## Installation and How to use the genal module <a name="paragraph2"></a>
40
42
 
41
43
  ### Installation <a name="paragraph2.1"></a>
42
44
 
45
+ > **Note:**
46
+ >
47
+ > **Optional**: It is recommended to create a new environment to avoid dependencies conflicts. Here, we create a new conda environment called 'genal_env'.
48
+ > ```
49
+ > conda create --name genal_env python=3.11
50
+ > conda activate genal_env
51
+ > ```
52
+
53
+
43
54
  Download and install the package with pip:
44
55
  ```
45
56
  pip install genal-python
46
57
  ```
47
- And it can be imported in a python environment with:
58
+ Then import it in a Python environment with:
48
59
  ```python
49
60
  import genal
50
61
  ```
@@ -55,6 +66,16 @@ Once downloaded, the path to the plink executable can be set with:
55
66
  ```
56
67
  genal.set_plink(path="/path/to/plink/executable/file")
57
68
  ```
69
+ ### Documentation <a name="paragraph2.2"></a>
70
+
71
+ For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
72
+
73
+ The documentation covers:
74
+ - Installation
75
+ - This tutorial
76
+ - The list of the main functions with a complete description of their arguments
77
+ - An exhaustive API reference
78
+
58
79
 
59
80
  ## Tutorial <a name="paragraph3"></a>
60
81
  For this tutorial, we will obtain genetic instruments for systolic blood pressure (SBP), compute a Polygenic Risk Score (PRS), and run a Mendelian Randomization analysis to investigate the genetically-determined effect of SBP on the risk of stroke. We will utilize summary statistics from Genome-Wide Association Studies (GWAS) and individual-level data from the UK Biobank. The steps include:
@@ -75,11 +96,11 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
75
96
  - Data lifting to another genomic build
76
97
  - In pure Python
77
98
  - Using LiftOver
78
- - Phenoscanner (to be added)
99
+ - Querying the GWAS Catalog
79
100
 
80
101
  ### Data loading <a name="paragraph3.1"></a>
81
102
 
82
- We begin with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
103
+ We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
83
104
 
84
105
  ```python
85
106
  import pandas as pd
@@ -108,6 +129,10 @@ The `genal.Geno` takes as input a pandas dataframe where each row corresponds to
108
129
  - **P**: Column name for effect p-value. Defaults to `'P'`.
109
130
  - **EAF**: Column name for effect allele frequency. Defaults to `'EAF'`.
110
131
 
132
+ > **Note:**
133
+ >
134
+ > You do not need all columns to move forward, as not all columns are required by every function. Additionally, some columns can be imputed as we will see in the next paragraph.
135
+
111
136
  After inspecting the dataframe, we first need to extract the chromosome and position information from the `MarkerName` column into two new columns `CHR` and `POS`:
112
137
 
113
138
  ```python
@@ -133,7 +158,7 @@ The last argument (`keep_columns = False`) indicates that we do not wish to keep
133
158
 
134
159
  > **Note:**
135
160
  >
136
- > Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele. Also, you do not need all columns to move forward, as some can be inputted as we will see next.
161
+ > Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
137
162
 
138
163
  ### Data preprocessing <a name="paragraph3.2"></a>
139
164
 
@@ -197,7 +222,7 @@ You can also use a custom reference panel by specifying to the reference_panel a
197
222
 
198
223
  ### Clumping <a name="paragraph3.3"></a>
199
224
 
200
- Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
225
+ Clumping, also called C+T (Clumping + Thresholding), is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second condition ensures that the selected SNPs are not highly correlated (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
201
226
 
202
227
  The SNP-data loaded in a `genal.Geno` instance can be clumped using the `genal.Geno.clump` method. It will return another `genal.Geno` instance containing only the clumped data:
203
228
 
@@ -269,7 +294,7 @@ The output of the `genal.Geno.prs` method will include how many SNPs were used t
269
294
  Here, we see that about half of the SNPs were not extracted from the data. In such cases, we may want to try and salvage some of these SNPs by looking for proxies (SNPs in high linkage disequilibrium, i.e. highly correlated SNPs). This can be done by specifying the `proxy = True` argument:
270
295
 
271
296
  ```python
272
- SBP_clumped.prs(name = "SBP_prs" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
297
+ SBP_clumped.prs(name = "SBP_prs_proxy" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
273
298
  ```
274
299
 
275
300
  and the output is:
@@ -312,7 +337,7 @@ and the output is:
312
337
  The PRS computation was successful and used 1330/1538 (86.476%) SNPs.
313
338
  PRS data saved to SBP_prs.csv
314
339
 
315
- In our case, we have been able to find proxies for 571 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
340
+ In our case, we have been able to find proxies for 578 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
316
341
 
317
342
  You can customize how the proxies are chosen with the following arguments:
318
343
  - `reference_panel`: The reference population used to derive linkage disequilibrium values and find proxies. Defaults to `eur`.
@@ -322,7 +347,7 @@ You can customize how the proxies are chosen with the following arguments:
322
347
 
323
348
  > **Note:**
324
349
  >
325
- > You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of instruments used to compute the scores.
350
+ > You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of SNPs used to compute the scores.
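The column requirement in the note above can be checked before calling the method. The following is a minimal sketch, assuming the summary statistics sit in a pandas DataFrame that uses the standard column names from this tutorial; `has_prs_columns` is a hypothetical helper written for illustration and is not part of genal:

```python
import pandas as pd

def has_prs_columns(df: pd.DataFrame) -> bool:
    """Return True if df has EA, BETA, and either SNP or both CHR and POS."""
    cols = set(df.columns)
    has_effect = {"EA", "BETA"} <= cols
    has_identifier = "SNP" in cols or {"CHR", "POS"} <= cols
    return has_effect and has_identifier

# Toy rows for illustration only
by_position = pd.DataFrame({"EA": ["A"], "BETA": [0.233], "CHR": [10], "POS": [102075479]})
missing_beta = pd.DataFrame({"EA": ["A"], "SNP": ["rs603424"]})

print(has_prs_columns(by_position), has_prs_columns(missing_beta))
```

The first frame qualifies because it identifies SNPs by position; the second does not because it lacks effect sizes.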
326
351
 
327
352
 
328
353
  ### Mendelian Randomization <a name="paragraph3.5"></a>
@@ -370,21 +395,25 @@ Genal will print how many SNPs were successfully found and extracted from the ou
370
395
  1541 SNPs out of 1545 are present in the outcome data.
371
396
  (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
372
397
 
373
- Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
374
-
375
- ```python
376
- SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
377
- ```
378
-
379
- And genal will print the number of missing instruments which have been proxied:
380
398
 
381
- Outcome data successfully loaded from 'b352e412' geno instance.
382
- Identifying the exposure SNPs present in the outcome data...
383
- 1541 SNPs out of 1545 are present in the outcome data.
384
- Searching proxies for 4 SNPs...
385
- Using the EUR reference panel.
386
- Found proxies for 4 SNPs.
387
- (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
399
+ > **Note:**
401
+ >
402
+ > Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
403
+ >
404
+ > ```python
405
+ > SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
406
+ > ```
407
+ >
408
+ > And genal will print the number of missing instruments that have been proxied:
409
+ >
410
+ > Outcome data successfully loaded from 'b352e412' geno instance.
411
+ > Identifying the exposure SNPs present in the outcome data...
412
+ > 1541 SNPs out of 1545 are present in the outcome data.
413
+ > Searching proxies for 4 SNPs...
414
+ > Using the EUR reference panel.
415
+ > Found proxies for 4 SNPs.
416
+ > (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
388
417
 
389
418
  After extracting the instruments from the outcome data, the `SBP_clumped` `genal.Geno` instance contains an `MR_data` attribute containing the instruments-exposure and instruments-outcome associations necessary to run MR. Running MR is now as simple as calling the `genal.Geno.MR` method of the SBP_clumped `genal.Geno` instance:
390
419
 
@@ -429,7 +458,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
429
458
  - `Weighted-mode` for the Weighted mode method
430
459
  - `all` to run all the above methods
431
460
 
432
- For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API.
461
+ For more fine-tuning, such as settings for the number of bootstrapping iterations, please refer to the API: [MR method](https://genal.readthedocs.io/en/latest/modules.html#id4).
433
462
 
434
463
  If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
435
464
 
@@ -437,7 +466,7 @@ If you want to visualize the obtained MR results, you can use the `genal.Geno.MR
437
466
  SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
438
467
  ```
439
468
 
440
- ![MR plot](docs/Images/MR_plot_SBP_AS.png)
469
+ ![MR plot](docs/build/_images/MR_plot_SBP_AS.png)
441
470
  You can select which MR methods you wish to plot with the `methods` argument. Note that for an MR method to be plotted, it must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
442
471
 
443
472
  If you wish to include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
@@ -482,7 +511,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
482
511
 
483
512
  > **Note:**
484
513
  >
485
- > One important detail is to make sure that the individual IDs are identical between the phenotypic data and the genetic data for the target population.
514
+ > One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
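One way to verify the ID match described in this note is to compare the two sets of IDs before running the association tests. This is a minimal sketch with toy data: the `IID` column name and the way the genetic IDs are obtained (here a plain Series, for instance read from a `.fam` file) are assumptions, not genal requirements:

```python
import pandas as pd

# Hypothetical phenotype table; the "IID" column name is an assumption
df_pheno = pd.DataFrame({"IID": [1001, 1002, 1003, 1004], "stroke": [0, 1, 0, 0]})

# Hypothetical participant IDs taken from the genetic data
genetic_ids = pd.Series(["1002", "1003", "1005"])

# Compare as strings so that integer and text IDs are treated alike
pheno_ids = set(df_pheno["IID"].astype(str))
geno_ids = set(genetic_ids.astype(str))

shared = pheno_ids & geno_ids
only_pheno = pheno_ids - geno_ids
only_geno = geno_ids - pheno_ids

print(f"{len(shared)} shared IDs, {len(only_pheno)} only in phenotype data, "
      f"{len(only_geno)} only in genetic data")
```

A small or empty `shared` set usually points to a formatting difference between the two ID columns rather than to truly disjoint cohorts.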
486
515
 
487
516
  Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
488
517
 
@@ -560,5 +589,54 @@ You can specify the path of the LiftOver executable to the `liftover_path` argum
560
589
  SBP_Geno.lift(start = "hg19", end = "hg38", replace = False, liftover_path = "path/to/liftover/exec")
561
590
  ```
562
591
 
592
+ ### GWAS Catalog <a name="paragraph3.8"></a>
593
+
594
+ It is sometimes interesting to determine the traits associated with our SNPs. In Mendelian Randomization, for instance, we may want to exclude instruments that are associated with traits likely causing horizontal pleiotropy. For this purpose, we can use the `genal.Geno.query_gwas_catalog` method. This method will query the GWAS Catalog API to determine the list of traits associated with each of our SNPs and store the results in a list in the `ASSOC` column of the `.data` attribute:
595
+
596
+ ```python
597
+ SBP_clumped.query_gwas_catalog(p_threshold=5e-8)
598
+ ```
599
+ This will output:
600
+
601
+ Querying the GWAS Catalog and creating the ASSOC column.
602
+ Only associations with a p-value <= 5e-08 are reported. Use the p_threshold argument to change the threshold.
603
+ To report the p-value for each association, use return_p=True.
604
+ To report the study ID for each association, use return_study=True.
605
+ The .data attribute will be modified. Use replace=False to leave it as is.
606
+ 100%|██████████| 1545/1545 [00:34<00:00, 44.86it/s]
607
+ The ASSOC column has been successfully created.
608
+ 701 (45.37%) SNPs failed to query (not found in GWAS Catalog) and 7 (0.5%) SNPs timed out after 34.33 seconds. You can increase the timeout value with the timeout argument.
609
+ | EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
610
+ |-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
611
+ | A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | [eicosanoids measurement, decadienedioic acid (...] |
612
+ | A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
613
+ | T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [diastolic blood pressure, systolic blood pressure...] |
614
+ | T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
615
+ | T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
616
+ | ... | ... | ... | ... | ... | ... | ... | ... | ... |
617
+ | T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [diastolic blood pressure, systolic blood pressure...] |
618
+ | T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
619
+ | A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [protein measurement, pulse pressure measurement...] |
620
+
621
+ If you are also interested in the p-values of each SNP-trait association, or the ID of the study from which the association was reported, you can use the `return_p = True` and `return_study = True` arguments. Then, the `ASSOC` column will contain a list of tuples, where each tuple contains the trait name, the p-value, and the study ID:
622
+
623
+ ```python
624
+ SBP_clumped.query_gwas_catalog(p_threshold=5e-8, return_p=True, return_study=True)
625
+ ```
626
+
627
+ | EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
628
+ |-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
629
+ | A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | TIMEOUT |
630
+ | A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
631
+ | T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [(heart rate response to exercise, 6e-12, GCST... |
632
+ | T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
633
+ | T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
634
+ | ... | ... | ... | ... | ... | ... | ... | ... | ... |
635
+ | T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [(diastolic blood pressure, 1e-12, GCST9031029... |
636
+ | T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
637
+ | A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [(systolic blood pressure, 9e-13, GCST006624),... |
563
638
 
564
639
 
640
+ > **Note:**
641
+ >
642
+ > As you can see, many SNPs failed to be queried. This is normal as the GWAS Catalog is not exhaustive.
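Once the `ASSOC` column exists, it can be used to set aside instruments associated with traits suspected of horizontal pleiotropy. The sketch below works on a toy DataFrame standing in for the `.data` attribute: the trait lists are shortened, the set of unwanted traits is hypothetical, and the filtering logic is an illustration rather than a genal feature. It handles both plain trait names and the tuples produced with `return_p=True` or `return_study=True`:

```python
import pandas as pd

# Toy stand-in for the .data attribute after query_gwas_catalog
data = pd.DataFrame({
    "SNP": ["rs603424", "rs2996303", "rs1006545", "rs7045409"],
    "ASSOC": [
        ["eicosanoids measurement"],
        "FAILED_QUERY",
        ["diastolic blood pressure", "systolic blood pressure"],
        [("systolic blood pressure", 9e-13, "GCST006624")],
    ],
})

# Hypothetical traits we consider likely sources of horizontal pleiotropy
unwanted_traits = {"eicosanoids measurement"}

def has_unwanted_trait(assoc) -> bool:
    """True if the SNP is associated with at least one unwanted trait."""
    if not isinstance(assoc, list):  # "FAILED_QUERY" or "TIMEOUT" strings
        return False
    # Entries are either trait names or tuples starting with the trait name
    traits = {a[0] if isinstance(a, tuple) else a for a in assoc}
    return bool(traits & unwanted_traits)

flagged = data["ASSOC"].apply(has_unwanted_trait)
kept = data[~flagged]
print(kept["SNP"].tolist())
```

Note that SNPs marked `FAILED_QUERY` or `TIMEOUT` are kept here because no association information is available for them; whether to keep or drop them is a judgment call for the analysis.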
Binary file
Binary file
@@ -1,4 +1,4 @@
1
1
  # Sphinx build info version 1
2
2
  # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
3
- config: f327128ef63678afda2847b68c505d0e
3
+ config: 1a3c03fa317dbf0f46b6f7567774d6c5
4
4
  tags: 645f666f9bcd5a90fca523b33c5a78b7