genal-python 0.7__tar.gz → 0.8__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- genal_python-0.8/.DS_Store +0 -0
- {genal_python-0.7 → genal_python-0.8}/.gitignore +1 -2
- genal_python-0.8/.readthedocs.yaml +22 -0
- {genal_python-0.7 → genal_python-0.8}/PKG-INFO +73 -10
- {genal_python-0.7 → genal_python-0.8}/README.md +73 -8
- genal_python-0.8/docs/.DS_Store +0 -0
- genal_python-0.8/docs/build/.buildinfo +4 -0
- genal_python-0.8/docs/build/.doctrees/api.doctree +0 -0
- genal_python-0.8/docs/build/.doctrees/environment.pickle +0 -0
- genal_python-0.8/docs/build/.doctrees/genal.doctree +0 -0
- genal_python-0.8/docs/build/.doctrees/index.doctree +0 -0
- genal_python-0.8/docs/build/.doctrees/introduction.doctree +0 -0
- genal_python-0.8/docs/build/.doctrees/modules.doctree +0 -0
- genal_python-0.8/docs/build/_modules/genal/Geno.html +1480 -0
- genal_python-0.8/docs/build/_modules/genal/MR.html +1065 -0
- genal_python-0.8/docs/build/_modules/genal/MR_tools.html +671 -0
- genal_python-0.8/docs/build/_modules/genal/MRpresso.html +409 -0
- genal_python-0.8/docs/build/_modules/genal/association.html +445 -0
- genal_python-0.8/docs/build/_modules/genal/clump.html +183 -0
- genal_python-0.8/docs/build/_modules/genal/extract_prs.html +426 -0
- genal_python-0.8/docs/build/_modules/genal/geno_tools.html +567 -0
- genal_python-0.8/docs/build/_modules/genal/lift.html +371 -0
- genal_python-0.8/docs/build/_modules/genal/proxy.html +359 -0
- genal_python-0.8/docs/build/_modules/genal/snp_query.html +231 -0
- genal_python-0.8/docs/build/_modules/genal/tools.html +440 -0
- genal_python-0.8/docs/build/_modules/index.html +114 -0
- genal_python-0.8/docs/build/_sources/api.rst.txt +100 -0
- genal_python-0.8/docs/build/_sources/index.rst.txt +69 -0
- genal_python-0.8/docs/build/_sources/introduction.rst.txt +666 -0
- genal_python-0.8/docs/build/_sources/modules.rst.txt +82 -0
- genal_python-0.8/docs/build/_static/basic.css +925 -0
- genal_python-0.8/docs/build/_static/css/badge_only.css +1 -0
- genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Bold.woff +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Bold.woff2 +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Regular.woff +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Regular.woff2 +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.eot +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.svg +2671 -0
- genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.ttf +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.woff +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.woff2 +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/lato-bold-italic.woff +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/lato-bold-italic.woff2 +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/lato-bold.woff +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/lato-bold.woff2 +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/lato-normal-italic.woff +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/lato-normal-italic.woff2 +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/lato-normal.woff +0 -0
- genal_python-0.8/docs/build/_static/css/fonts/lato-normal.woff2 +0 -0
- genal_python-0.8/docs/build/_static/css/theme.css +4 -0
- genal_python-0.8/docs/build/_static/doctools.js +156 -0
- genal_python-0.8/docs/build/_static/documentation_options.js +13 -0
- genal_python-0.8/docs/build/_static/file.png +0 -0
- genal_python-0.8/docs/build/_static/js/badge_only.js +1 -0
- genal_python-0.8/docs/build/_static/js/html5shiv-printshiv.min.js +4 -0
- genal_python-0.8/docs/build/_static/js/html5shiv.min.js +4 -0
- genal_python-0.8/docs/build/_static/js/theme.js +1 -0
- genal_python-0.8/docs/build/_static/language_data.js +199 -0
- genal_python-0.8/docs/build/_static/minus.png +0 -0
- genal_python-0.8/docs/build/_static/plus.png +0 -0
- genal_python-0.8/docs/build/_static/pygments.css +75 -0
- genal_python-0.8/docs/build/_static/searchtools.js +619 -0
- genal_python-0.8/docs/build/_static/sphinx_highlight.js +154 -0
- genal_python-0.8/docs/build/api.html +2251 -0
- genal_python-0.8/docs/build/genal.html +2060 -0
- genal_python-0.8/docs/build/genindex.html +584 -0
- genal_python-0.8/docs/build/index.html +186 -0
- genal_python-0.8/docs/build/introduction.html +706 -0
- genal_python-0.8/docs/build/modules.html +754 -0
- genal_python-0.8/docs/build/objects.inv +0 -0
- genal_python-0.8/docs/build/py-modindex.html +177 -0
- genal_python-0.8/docs/build/search.html +122 -0
- genal_python-0.8/docs/build/searchindex.js +1 -0
- genal_python-0.8/docs/requirements.txt +14 -0
- genal_python-0.8/docs/source/.DS_Store +0 -0
- genal_python-0.8/docs/source/Images/MR_plot_SBP_AS.png +0 -0
- genal_python-0.8/docs/source/api.rst +100 -0
- {genal_python-0.7 → genal_python-0.8}/docs/source/conf.py +3 -2
- {genal_python-0.7 → genal_python-0.8}/docs/source/index.rst +14 -4
- genal_python-0.8/docs/source/introduction.rst +666 -0
- genal_python-0.8/docs/source/modules.rst +82 -0
- {genal_python-0.7 → genal_python-0.8}/genal/Geno.py +72 -49
- {genal_python-0.7 → genal_python-0.8}/genal/MR.py +16 -16
- {genal_python-0.7 → genal_python-0.8}/genal/MR_tools.py +11 -0
- {genal_python-0.7 → genal_python-0.8}/genal/__init__.py +1 -1
- {genal_python-0.7 → genal_python-0.8}/genal/constants.py +1 -0
- {genal_python-0.7 → genal_python-0.8}/genal/extract_prs.py +1 -1
- {genal_python-0.7 → genal_python-0.8}/genal/snp_query.py +53 -17
- {genal_python-0.7 → genal_python-0.8}/genal/tools.py +16 -6
- {genal_python-0.7 → genal_python-0.8}/pyproject.toml +2 -3
- genal_python-0.7/docs/requirements.txt +0 -2
- genal_python-0.7/docs/source/api.rst +0 -24
- genal_python-0.7/docs/source/introduction.rst +0 -505
- genal_python-0.7/docs/source/modules.rst +0 -7
- {genal_python-0.7 → genal_python-0.8}/LICENSE +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/Makefile +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/doctrees/api.doctree +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/doctrees/environment.pickle +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/doctrees/genal.doctree +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/doctrees/index.doctree +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/doctrees/introduction.doctree +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/doctrees/modules.doctree +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/doctrees/source/genal.doctree +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/doctrees/source/modules.doctree +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/.buildinfo +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_sources/api.rst.txt +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_sources/genal.rst.txt +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_sources/index.rst.txt +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_sources/introduction.rst.txt +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_sources/modules.rst.txt +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_sources/source/genal.rst.txt +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_sources/source/modules.rst.txt +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/_sphinx_javascript_frameworks_compat.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/basic.css +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/badge_only.css +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/Roboto-Slab-Bold.woff +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/Roboto-Slab-Bold.woff2 +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/Roboto-Slab-Regular.woff +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/Roboto-Slab-Regular.woff2 +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/fontawesome-webfont.eot +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/fontawesome-webfont.svg +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/fontawesome-webfont.ttf +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/fontawesome-webfont.woff +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/fontawesome-webfont.woff2 +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-bold-italic.woff +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-bold-italic.woff2 +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-bold.woff +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-bold.woff2 +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-normal-italic.woff +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-normal-italic.woff2 +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-normal.woff +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/fonts/lato-normal.woff2 +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/css/theme.css +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/doctools.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/documentation_options.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/file.png +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/jquery.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/js/badge_only.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/js/html5shiv-printshiv.min.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/js/html5shiv.min.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/js/theme.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/language_data.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/minus.png +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/plus.png +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/pygments.css +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/searchtools.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/_static/sphinx_highlight.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/api.html +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/genal.html +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/genindex.html +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/index.html +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/introduction.html +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/modules.html +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/objects.inv +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/py-modindex.html +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/search.html +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/searchindex.js +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/source/genal.html +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/_build/html/source/modules.html +0 -0
- {genal_python-0.7/docs/Images → genal_python-0.8/docs/build/_images}/MR_plot_SBP_AS.png +0 -0
- /genal_python-0.7/docs/source/genal.rst → /genal_python-0.8/docs/build/_sources/genal.rst.txt +0 -0
- {genal_python-0.7 → genal_python-0.8}/docs/make.bat +0 -0
- {genal_python-0.7 → genal_python-0.8}/genal/MRpresso.py +0 -0
- {genal_python-0.7 → genal_python-0.8}/genal/association.py +0 -0
- {genal_python-0.7 → genal_python-0.8}/genal/clump.py +0 -0
- {genal_python-0.7 → genal_python-0.8}/genal/geno_tools.py +0 -0
- {genal_python-0.7 → genal_python-0.8}/genal/lift.py +0 -0
- {genal_python-0.7 → genal_python-0.8}/genal/proxy.py +0 -0
- {genal_python-0.7 → genal_python-0.8}/gitignore +0 -0
- {genal_python-0.7 → genal_python-0.8}/readthedocs.yaml +0 -0
- {genal_python-0.7 → genal_python-0.8}/requirements.txt +0 -0
|
Binary file
|
|
@@ -0,0 +1,22 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the version of Python and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.11"
+
+# Build documentation in the docs/ directory with Sphinx
+sphinx:
+  configuration: docs/source/conf.py
+
+# We recommend specifying your dependencies to enable reproducible builds:
+# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+  install:
+  - requirements: docs/requirements.txt
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: genal-python
-Version: 0.
+Version: 0.8
 Summary: A python toolkit for polygenic risk scoring and mendelian randomization.
 Author-email: Cyprien Rivier <riviercyprien@gmail.com>
 Requires-Python: >=3.7
@@ -17,7 +17,6 @@ Requires-Dist: psutil==5.9.1
 Requires-Dist: pyliftover==0.4
 Requires-Dist: scikit_learn>=1.3.0
 Requires-Dist: scipy>=1.11.4
-Requires-Dist: sphinx_rtd_theme==1.3.0
 Requires-Dist: statsmodels==0.14.0
 Requires-Dist: tqdm==4.66.1
 Requires-Dist: wget==3.2
@@ -32,10 +31,11 @@ Project-URL: Home, https://github.com/CypRiv/genal
 
 # Table of contents
 1. [Introduction](#introduction)
-2. [Citation]
+2. [Citation](#citation)
 3. [Requirements for the genal module](#paragraph1)
 4. [Installation and how to use genal](#paragraph2)
 1. [Installation](#paragraph2.1)
+2. [Documentation](#paragraph2.2)
 5. [Tutorial and presentation of the main tools](#paragraph3)
 1. [Data loading](#paragraph3.1)
 2. [Data preprocessing](#paragraph3.2)
@@ -44,6 +44,7 @@ Project-URL: Home, https://github.com/CypRiv/genal
 5. [Mendelian Randomization](#paragraph3.5)
 6. [SNP-association testing](#paragraph3.6)
 7. [Lifting](#paragraph3.7)
+8. [GWAS Catalog](#paragraph3.8)
 
 
 ## Introduction <a name="introduction"></a>
@@ -80,6 +81,16 @@ Once downloaded, the path to the plink executable can be set with:
 ```
 genal.set_plink(path="/path/to/plink/executable/file")
 ```
+### Documentation <a name="paragraph2.2"></a>
+
+For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
+
+The documentation covers:
+- Installation
+- This tutorial
+- The list of the main functions with complete description of their arguments
+- An exhaustive API reference
+
 
 ## Tutorial <a name="paragraph3"></a>
 For this tutorial, we will obtain genetic instruments for systolic blood pressure (SBP), compute a Polygenic Risk Score (PRS), and run a Mendelian Randomization analysis to investigate the genetically-determined effect of SBP on the risk of stroke. We will utilize summary statistics from Genome-Wide Association Studies (GWAS) and individual-level data from the UK Biobank. The steps include:
@@ -100,11 +111,11 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
 - Data lifting to another genomic build
 - In pure Python
 - Using LiftOver
--
+- Querying the GWAS Catalog
 
 ### Data loading <a name="paragraph3.1"></a>
 
-We
+We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
 
 ```python
 import pandas as pd
@@ -133,6 +144,10 @@ The `genal.Geno` takes as input a pandas dataframe where each row corresponds to
 - **P**: Column name for effect p-value. Defaults to `'P'`.
 - **EAF**: Column name for effect allele frequency. Defaults to `'EAF'`.
 
+> **Note:**
+>
+> You do not need all columns to move forward, as not all columns are required by every function. Additionally, some columns can be imputed as we will see in the next paragraph.
+
 After inspecting the dataframe, we first need to extract the chromosome and position information from the `MarkerName` column into two new columns `CHR` and `POS`:
 
 ```python
@@ -158,7 +173,7 @@ The last argument (`keep_columns = False`) indicates that we do not wish to keep
 
 > **Note:**
 >
-> Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
+> Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
 
 ### Data preprocessing <a name="paragraph3.2"></a>
 
@@ -337,7 +352,7 @@ and the output is:
 The PRS computation was successful and used 1330/1538 (86.476%) SNPs.
 PRS data saved to SBP_prs.csv
 
-In our case, we have been able to find proxies for
+In our case, we have been able to find proxies for 578 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
 
 You can customize how the proxies are chosen with the following arguments:
 - `reference_panel`: The reference population used to derive linkage disequilibrium values and find proxies. Defaults to `eur`.
@@ -347,7 +362,7 @@ You can customize how the proxies are chosen with the following arguments:
 
 > **Note:**
 >
-> You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of
+> You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of SNPs used to compute the scores.
 
 
 ### Mendelian Randomization <a name="paragraph3.5"></a>
@@ -462,7 +477,7 @@ If you want to visualize the obtained MR results, you can use the `genal.Geno.MR
 SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
 ```
 
-
 You can select which MR methods you wish to plot with the `methods` argument. Note that for an MR method to be plotted, they must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
 
 If you wish to include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
@@ -507,7 +522,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
 
 > **Note:**
 >
->
+> One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
 
 Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
 
@@ -585,6 +600,54 @@ You can specify the path of the LiftOver executable to the `liftover_path` argum
 SBP_Geno.lift(start = "hg19", end = "hg38", replace = False, liftover_path = "path/to/liftover/exec")
 ```
 
+### GWAS Catalog <a name="paragraph3.8"></a>
 
+It is sometimes interesting to determine the traits associated with our SNPs. In Mendelian Randomization, for instance, we may want to exclude instruments that are associated with traits likely causing horizontal pleiotropy. For this purpose, we can use the `genal.Geno.query_gwas_catalog` method. This method will query the GWAS Catalog API to determine the list of traits associated with each of our SNPs and store the results in a list in the `ASSOC` column of the `.data` attribute:
 
+```python
+SBP_clumped.query_gwas_catalog(p_threshold=5e-8)
+```
+Which will output:
+
+Querying the GWAS Catalog and creating the ASSOC column.
+Only associations with a p-value <= 5e-08 are reported. Use the p_threshold argument to change the threshold.
+To report the p-value for each association, use return_p=True.
+To report the study ID for each association, use return_study=True.
+The .data attribute will be modified. Use replace=False to leave it as is.
+100%|██████████| 1545/1545 [00:34<00:00, 44.86it/s]
+The ASSOC column has been successfully created.
+701 (45.37%) SNPs failed to query (not found in GWAS Catalog) and 7 (0.5%) SNPs timed out after 34.33 seconds. You can increase the timeout value with the timeout argument.
+| EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
+|-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
+| A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | [eicosanoids measurement, decadienedioic acid (...] |
+| A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
+| T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [diastolic blood pressure, systolic blood pressure...] |
+| T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
+| T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
+| ... | ... | ... | ... | ... | ... | ... | ... | ... | |
+| T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [diastolic blood pressure, systolic blood pressure...] |
+| T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
+| A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [protein measurement, pulse pressure measurement...] |
+
+If you are also interested in the p-values of each SNP-trait association, or the ID of the study from which the association was reported, you can use the `return_p = True` and `return_study = True` arguments. Then, the `ASSOC` column will contain a list of tuples, where each tuple contains the trait name, the p-value, and the study ID:
 
+```python
+SBP_clumped.query_gwas_catalog(p_threshold=5e-8, return_p=True, return_study=True)
+```
+
+| EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
+|-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
+| A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | TIMEOUT |
+| A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
+| T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [(heart rate response to exercise, 6e-12, GCST... |
+| T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
+| T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
+| ... | ... | ... | ... | ... | ... | ... | ... | ... | |
+| T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [(diastolic blood pressure, 1e-12, GCST9031029... |
+| T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
+| A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [(systolic blood pressure, 9e-13, GCST006624),... |
+
+
+> **Note:**
+>
+> As you can see, many SNPs failed to be queried. This is normal as the GWAS Catalog is not exhaustive.
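The `ASSOC` column added in the hunk above mixes lists of traits (or of tuples when `return_p` / `return_study` are set) with the sentinel strings `FAILED_QUERY` and `TIMEOUT` shown in the example tables. The following is a minimal sketch, in plain Python, of how such cells could be screened for traits suspected of horizontal pleiotropy. The rows are a shortened, hypothetical stand-in for the `.data` attribute, and `FLAGGED_TRAITS`, `trait_names`, and `screen` are illustrative names that are not part of genal.

```python
# Sentinel strings that the example tables show for SNPs without usable results.
UNRESOLVED = {"FAILED_QUERY", "TIMEOUT"}

# Hypothetical rows: rsids and trait names are taken from the example tables,
# but the trait lists are shortened for illustration.
rows = [
    {"SNP": "rs603424", "ASSOC": ["eicosanoids measurement"]},
    {"SNP": "rs2996303", "ASSOC": "FAILED_QUERY"},
    {"SNP": "rs1006545", "ASSOC": ["diastolic blood pressure", "systolic blood pressure"]},
    {"SNP": "rs7045409", "ASSOC": [("systolic blood pressure", 9e-13, "GCST006624")]},
]

# Hypothetical choice of traits treated as likely sources of horizontal pleiotropy.
FLAGGED_TRAITS = {"eicosanoids measurement"}


def trait_names(cell):
    """Return the set of trait names in one ASSOC cell (empty for sentinel strings)."""
    if isinstance(cell, str):
        return set()
    # Tuple entries carry the trait name first, followed by p-value and/or study ID.
    return {entry[0] if isinstance(entry, tuple) else entry for entry in cell}


def screen(records, traits):
    """Split SNPs into flagged, clean, and unresolved (failed or timed-out) lists."""
    flagged, clean, unresolved = [], [], []
    for record in records:
        cell = record["ASSOC"]
        if isinstance(cell, str) and cell in UNRESOLVED:
            unresolved.append(record["SNP"])
        elif trait_names(cell) & traits:
            flagged.append(record["SNP"])
        else:
            clean.append(record["SNP"])
    return flagged, clean, unresolved


flagged, clean, unresolved = screen(rows, FLAGGED_TRAITS)
print("flagged:", flagged)
print("clean:", clean)
print("unresolved:", unresolved)
```

Unresolved SNPs are kept in their own list rather than treated as clean, since a failed query says nothing about their associations. On a real DataFrame, the same `trait_names` helper could presumably be applied to the `ASSOC` column with pandas' `Series.apply`.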
@@ -7,10 +7,11 @@
 
 # Table of contents
 1. [Introduction](#introduction)
-2. [Citation]
+2. [Citation](#citation)
 3. [Requirements for the genal module](#paragraph1)
 4. [Installation and how to use genal](#paragraph2)
 1. [Installation](#paragraph2.1)
+2. [Documentation](#paragraph2.2)
 5. [Tutorial and presentation of the main tools](#paragraph3)
 1. [Data loading](#paragraph3.1)
 2. [Data preprocessing](#paragraph3.2)
@@ -19,6 +20,7 @@
 5. [Mendelian Randomization](#paragraph3.5)
 6. [SNP-association testing](#paragraph3.6)
 7. [Lifting](#paragraph3.7)
+8. [GWAS Catalog](#paragraph3.8)
 
 
 ## Introduction <a name="introduction"></a>
@@ -55,6 +57,16 @@ Once downloaded, the path to the plink executable can be set with:
 ```
 genal.set_plink(path="/path/to/plink/executable/file")
 ```
+### Documentation <a name="paragraph2.2"></a>
+
+For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
+
+The documentation covers:
+- Installation
+- This tutorial
+- The list of the main functions with complete description of their arguments
+- An exhaustive API reference
+
 
 ## Tutorial <a name="paragraph3"></a>
 For this tutorial, we will obtain genetic instruments for systolic blood pressure (SBP), compute a Polygenic Risk Score (PRS), and run a Mendelian Randomization analysis to investigate the genetically-determined effect of SBP on the risk of stroke. We will utilize summary statistics from Genome-Wide Association Studies (GWAS) and individual-level data from the UK Biobank. The steps include:
@@ -75,11 +87,11 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
 - Data lifting to another genomic build
 - In pure Python
 - Using LiftOver
--
+- Querying the GWAS Catalog
 
 ### Data loading <a name="paragraph3.1"></a>
 
-We
+We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
 
 ```python
 import pandas as pd
@@ -108,6 +120,10 @@ The `genal.Geno` takes as input a pandas dataframe where each row corresponds to
 - **P**: Column name for effect p-value. Defaults to `'P'`.
 - **EAF**: Column name for effect allele frequency. Defaults to `'EAF'`.
 
+> **Note:**
+>
+> You do not need all columns to move forward, as not all columns are required by every function. Additionally, some columns can be imputed as we will see in the next paragraph.
+
 After inspecting the dataframe, we first need to extract the chromosome and position information from the `MarkerName` column into two new columns `CHR` and `POS`:
 
 ```python
@@ -133,7 +149,7 @@ The last argument (`keep_columns = False`) indicates that we do not wish to keep
 
 > **Note:**
 >
-> Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
+> Make sure to read the readme file usually provided with the summary statistics to identify the correct columns. It is particularly important to correctly identify the allele that represents the effect allele.
 
 ### Data preprocessing <a name="paragraph3.2"></a>
 
@@ -312,7 +328,7 @@ and the output is:
 The PRS computation was successful and used 1330/1538 (86.476%) SNPs.
 PRS data saved to SBP_prs.csv
 
-In our case, we have been able to find proxies for
+In our case, we have been able to find proxies for 578 of the 786 SNPs that were missing in the population genetic data (7 potential proxies have been removed because they were identical to SNPs already present in our data).
 
 You can customize how the proxies are chosen with the following arguments:
 - `reference_panel`: The reference population used to derive linkage disequilibrium values and find proxies. Defaults to `eur`.
@@ -322,7 +338,7 @@ You can customize how the proxies are chosen with the following arguments:
 
 > **Note:**
 >
-> You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of
+> You can call the `genal.Geno.prs` method on any `Geno` instance (containing at least the EA, BETA, and either SNP or CHR/POS columns). The data does not need to be clumped, and there is no limit to the number of SNPs used to compute the scores.
 
 
 ### Mendelian Randomization <a name="paragraph3.5"></a>
@@ -437,7 +453,7 @@ If you want to visualize the obtained MR results, you can use the `genal.Geno.MR
 SBP_clumped.MR_plot(filename="MR_plot_SBP_AS")
 ```
 
-
 You can select which MR methods you wish to plot with the `methods` argument. Note that for an MR method to be plotted, they must be included in the latest `genal.Geno.MR` call of this `genal.Geno` instance.
 
 If you wish to include the heterogeneity values (Cochran's Q) in the results, you can use the heterogeneity argument in the `genal.Geno.MR` call. Here, the heterogeneity for the inverse-variance weighted method:
@@ -482,7 +498,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
|
|
|
482
498
|
|
|
483
499
|
> **Note:**
|
|
484
500
|
>
|
|
485
|
-
>
|
|
501
|
+
> One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
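
One way to check this before going further is to compare the two sets of IDs directly. The sketch below assumes a phenotype dataframe with an `IID` column and a separate list of genetic sample IDs; both the column name and the values are hypothetical and should be adapted to your own data.

```python
import pandas as pd

# Hypothetical phenotype table and genetic sample IDs.
df_pheno = pd.DataFrame({"IID": [1001, 1002, 1003, 1004], "stroke": [0, 1, 0, 0]})
genetic_ids = pd.Series(["1001", "1002", "1005"], name="IID")

# Compare IDs as strings so that 1001 and "1001" count as the same participant.
pheno_ids = set(df_pheno["IID"].astype(str))
geno_ids = set(genetic_ids.astype(str))

shared = pheno_ids & geno_ids
only_pheno = pheno_ids - geno_ids
only_geno = geno_ids - pheno_ids

print(f"{len(shared)} shared IDs")
print(f"{len(only_pheno)} IDs only in the phenotypic data")
print(f"{len(only_geno)} IDs only in the genetic data")
```

A small number of shared IDs usually points to a formatting mismatch (for example integer versus string IDs, or a prefix added in one of the files).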
|
|
486
502
|
|
|
487
503
|
Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments, since we are going to update their coefficients and want to avoid any confusion:
|
|
488
504
|
|
|
@@ -560,5 +576,54 @@ You can specify the path of the LiftOver executable to the `liftover_path` argum
|
|
|
560
576
|
SBP_Geno.lift(start = "hg19", end = "hg38", replace = False, liftover_path = "path/to/liftover/exec")
|
|
561
577
|
```
|
|
562
578
|
|
|
579
|
+
### GWAS Catalog <a name="paragraph3.8"></a>
|
|
580
|
+
|
|
581
|
+
It is sometimes interesting to determine the traits associated with our SNPs. In Mendelian Randomization, for instance, we may want to exclude instruments that are associated with traits likely causing horizontal pleiotropy. For this purpose, we can use the `genal.Geno.query_gwas_catalog` method. This method will query the GWAS Catalog API to determine the list of traits associated with each of our SNPs and store the results in a list in the `ASSOC` column of the `.data` attribute:
|
|
582
|
+
|
|
583
|
+
```python
|
|
584
|
+
SBP_clumped.query_gwas_catalog(p_threshold=5e-8)
|
|
585
|
+
```
|
|
586
|
+
Which will output:
|
|
587
|
+
|
|
588
|
+
Querying the GWAS Catalog and creating the ASSOC column.
|
|
589
|
+
Only associations with a p-value <= 5e-08 are reported. Use the p_threshold argument to change the threshold.
|
|
590
|
+
To report the p-value for each association, use return_p=True.
|
|
591
|
+
To report the study ID for each association, use return_study=True.
|
|
592
|
+
The .data attribute will be modified. Use replace=False to leave it as is.
|
|
593
|
+
100%|██████████| 1545/1545 [00:34<00:00, 44.86it/s]
|
|
594
|
+
The ASSOC column has been successfully created.
|
|
595
|
+
701 (45.37%) SNPs failed to query (not found in GWAS Catalog) and 7 (0.5%) SNPs timed out after 34.33 seconds. You can increase the timeout value with the timeout argument.
|
|
596
|
+
| EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
|
|
597
|
+
|-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
|
|
598
|
+
| A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | [eicosanoids measurement, decadienedioic acid (...] |
|
|
599
|
+
| A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
|
|
600
|
+
| T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [diastolic blood pressure, systolic blood pressure...] |
|
|
601
|
+
| T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
|
|
602
|
+
| T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
|
|
603
|
+
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
|
|
604
|
+
| T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [diastolic blood pressure, systolic blood pressure...] |
|
|
605
|
+
| T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
|
|
606
|
+
| A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [protein measurement, pulse pressure measurement...] |
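
Once the `ASSOC` column exists, excluding instruments associated with unwanted traits can be sketched with plain pandas. The example below works on a hypothetical excerpt shaped like the table above, and the choice of traits to exclude is purely illustrative. It assumes that successful queries are stored as lists and that failed ones are stored as strings such as `FAILED_QUERY`, as in the output shown.

```python
import pandas as pd

# Hypothetical excerpt mimicking the `.data` attribute after the query.
data = pd.DataFrame({
    "SNP": ["rs603424", "rs2996303", "rs1006545"],
    "ASSOC": [
        ["eicosanoids measurement"],
        "FAILED_QUERY",
        ["diastolic blood pressure", "systolic blood pressure"],
    ],
})

# Traits treated as potential sources of horizontal pleiotropy (illustrative choice).
excluded_traits = {"eicosanoids measurement"}

def has_excluded_trait(assoc):
    """Return True if any reported trait is in the exclusion set."""
    # Failed or timed-out queries are strings, not lists: those SNPs are kept.
    if not isinstance(assoc, list):
        return False
    return any(trait in excluded_traits for trait in assoc)

# Keep only the SNPs without an excluded trait.
filtered = data[~data["ASSOC"].apply(has_excluded_trait)]
print(filtered["SNP"].tolist())
```

Whether SNPs with a failed query should be kept or dropped is a judgment call; this sketch keeps them.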
|
|
607
|
+
|
|
608
|
+
If you are also interested in the p-values of each SNP-trait association, or the ID of the study from which the association was reported, you can use the `return_p = True` and `return_study = True` arguments. Then, the `ASSOC` column will contain a list of tuples, where each tuple contains the trait name, the p-value, and the study ID:
|
|
609
|
+
|
|
610
|
+
```python
|
|
611
|
+
SBP_clumped.query_gwas_catalog(p_threshold=5e-8, return_p=True, return_study=True)
|
|
612
|
+
```
|
|
613
|
+
|
|
614
|
+
| EA | NEA | EAF | BETA | SE | CHR | POS | SNP | ASSOC |
|
|
615
|
+
|-----|-----|-------|--------|--------|-----|------------|------------|------------------------------------------------------------------------|
|
|
616
|
+
| A | G | 0.1784| 0.2330 | 0.0402 | 10 | 102075479 | rs603424 | TIMEOUT |
|
|
617
|
+
| A | G | 0.0706| -0.3873| 0.0626 | 10 | 102403682 | rs2996303 | FAILED_QUERY |
|
|
618
|
+
| T | G | 0.8872| 0.6846 | 0.0480 | 10 | 102553647 | rs1006545 | [(heart rate response to exercise, 6e-12, GCST... |
|
|
619
|
+
| T | G | 0.6652| -0.2098| 0.0340 | 10 | 102558506 | rs12570050 | FAILED_QUERY |
|
|
620
|
+
| T | C | 0.3057| -0.2448| 0.0334 | 10 | 102603924 | rs4919502 | FAILED_QUERY |
|
|
621
|
+
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
|
|
622
|
+
| T | C | 0.3514| 0.2203 | 0.0314 | 9 | 9350706 | rs1332813 | [(diastolic blood pressure, 1e-12, GCST9031029... |
|
|
623
|
+
| T | C | 0.6880| -0.1897| 0.0332 | 9 | 94201341 | rs10820855 | FAILED_QUERY |
|
|
624
|
+
| A | T | 0.3669| -0.1862| 0.0313 | 9 | 95201540 | rs7045409 | [(systolic blood pressure, 9e-13, GCST006624),... |
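
When `ASSOC` holds tuples, it can be convenient to reshape it into one row per SNP-trait association. The sketch below does this with pandas on a hypothetical excerpt. It assumes each tuple is ordered as (trait, p-value, study ID), as described above; the study IDs containing `EXAMPLE` and the third p-value are made up for illustration.

```python
import pandas as pd

# Hypothetical excerpt mimicking `.data` after a query with return_p and return_study.
data = pd.DataFrame({
    "SNP": ["rs603424", "rs1006545", "rs7045409"],
    "ASSOC": [
        "TIMEOUT",
        [("heart rate response to exercise", 6e-12, "GCST_EXAMPLE_1")],
        [
            ("systolic blood pressure", 9e-13, "GCST006624"),
            ("pulse pressure measurement", 2e-10, "GCST_EXAMPLE_2"),
        ],
    ],
})

# Keep only the rows where the query succeeded (ASSOC is a list of tuples).
ok = data[data["ASSOC"].apply(lambda x: isinstance(x, list))]

# One row per association, then split each tuple into separate columns.
exploded = ok.explode("ASSOC").reset_index(drop=True)
assoc = pd.DataFrame(
    exploded["ASSOC"].tolist(),
    columns=["TRAIT", "P", "STUDY"],
    index=exploded.index,
)
long_table = pd.concat([exploded.drop(columns="ASSOC"), assoc], axis=1)
print(long_table)
```

The resulting table can then be sorted or filtered by `P`, `TRAIT`, or `STUDY` like any other dataframe.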
|
|
563
625
|
|
|
564
626
|
|
|
627
|
+
> **Note:**
|
|
628
|
+
>
|
|
629
|
+
> As you can see, many SNPs could not be queried. This is expected, since the GWAS Catalog is not exhaustive.
|
|
Binary file
|
|