genal-python 0.8__tar.gz → 1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (173)
  1. genal_python-1.0/.DS_Store +0 -0
  2. {genal_python-0.8 → genal_python-1.0}/.gitignore +2 -1
  3. genal_python-1.0/Genal_flowchart.png +0 -0
  4. {genal_python-0.8 → genal_python-1.0}/PKG-INFO +55 -28
  5. {genal_python-0.8 → genal_python-1.0}/README.md +51 -24
  6. {genal_python-0.8 → genal_python-1.0}/docs/.DS_Store +0 -0
  7. {genal_python-0.8 → genal_python-1.0/docs/build}/.DS_Store +0 -0
  8. {genal_python-0.8 → genal_python-1.0}/docs/build/.doctrees/api.doctree +0 -0
  9. genal_python-1.0/docs/build/.doctrees/environment.pickle +0 -0
  10. {genal_python-0.8 → genal_python-1.0}/docs/build/.doctrees/index.doctree +0 -0
  11. genal_python-1.0/docs/build/.doctrees/introduction.doctree +0 -0
  12. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/Geno.html +3 -3
  13. {genal_python-0.8 → genal_python-1.0}/docs/build/_sources/index.rst.txt +1 -1
  14. {genal_python-0.8 → genal_python-1.0}/docs/build/_sources/introduction.rst.txt +16 -8
  15. {genal_python-0.8 → genal_python-1.0}/docs/build/api.html +1 -1
  16. {genal_python-0.8 → genal_python-1.0}/docs/build/index.html +1 -1
  17. {genal_python-0.8 → genal_python-1.0}/docs/build/introduction.html +19 -11
  18. genal_python-1.0/docs/build/searchindex.js +1 -0
  19. {genal_python-0.8 → genal_python-1.0}/docs/source/conf.py +1 -1
  20. {genal_python-0.8 → genal_python-1.0}/docs/source/index.rst +4 -4
  21. {genal_python-0.8 → genal_python-1.0}/docs/source/introduction.rst +47 -29
  22. {genal_python-0.8 → genal_python-1.0}/genal/Geno.py +8 -4
  23. {genal_python-0.8 → genal_python-1.0}/genal/MR.py +0 -2
  24. {genal_python-0.8 → genal_python-1.0}/genal/MR_tools.py +0 -1
  25. {genal_python-0.8 → genal_python-1.0}/genal/MRpresso.py +0 -3
  26. {genal_python-0.8 → genal_python-1.0}/genal/__init__.py +2 -2
  27. {genal_python-0.8 → genal_python-1.0}/genal/extract_prs.py +36 -11
  28. {genal_python-0.8 → genal_python-1.0}/genal/geno_tools.py +3 -3
  29. {genal_python-0.8 → genal_python-1.0}/genal/snp_query.py +49 -25
  30. {genal_python-0.8 → genal_python-1.0}/genal/tools.py +143 -18
  31. genal_python-1.0/genal_logo.png +0 -0
  32. {genal_python-0.8 → genal_python-1.0}/pyproject.toml +3 -3
  33. genal_python-0.8/docs/_build/doctrees/api.doctree +0 -0
  34. genal_python-0.8/docs/_build/doctrees/environment.pickle +0 -0
  35. genal_python-0.8/docs/_build/doctrees/genal.doctree +0 -0
  36. genal_python-0.8/docs/_build/doctrees/index.doctree +0 -0
  37. genal_python-0.8/docs/_build/doctrees/introduction.doctree +0 -0
  38. genal_python-0.8/docs/_build/doctrees/modules.doctree +0 -0
  39. genal_python-0.8/docs/_build/doctrees/source/genal.doctree +0 -0
  40. genal_python-0.8/docs/_build/doctrees/source/modules.doctree +0 -0
  41. genal_python-0.8/docs/_build/html/.buildinfo +0 -4
  42. genal_python-0.8/docs/_build/html/_sources/api.rst.txt +0 -24
  43. genal_python-0.8/docs/_build/html/_sources/index.rst.txt +0 -59
  44. genal_python-0.8/docs/_build/html/_sources/introduction.rst.txt +0 -505
  45. genal_python-0.8/docs/_build/html/_sources/modules.rst.txt +0 -7
  46. genal_python-0.8/docs/_build/html/_sources/source/genal.rst.txt +0 -101
  47. genal_python-0.8/docs/_build/html/_sources/source/modules.rst.txt +0 -7
  48. genal_python-0.8/docs/_build/html/_static/_sphinx_javascript_frameworks_compat.js +0 -123
  49. genal_python-0.8/docs/_build/html/_static/basic.css +0 -925
  50. genal_python-0.8/docs/_build/html/_static/doctools.js +0 -156
  51. genal_python-0.8/docs/_build/html/_static/documentation_options.js +0 -13
  52. genal_python-0.8/docs/_build/html/_static/jquery.js +0 -2
  53. genal_python-0.8/docs/_build/html/_static/language_data.js +0 -199
  54. genal_python-0.8/docs/_build/html/_static/searchtools.js +0 -574
  55. genal_python-0.8/docs/_build/html/api.html +0 -876
  56. genal_python-0.8/docs/_build/html/genal.html +0 -2039
  57. genal_python-0.8/docs/_build/html/genindex.html +0 -701
  58. genal_python-0.8/docs/_build/html/index.html +0 -192
  59. genal_python-0.8/docs/_build/html/introduction.html +0 -584
  60. genal_python-0.8/docs/_build/html/modules.html +0 -269
  61. genal_python-0.8/docs/_build/html/objects.inv +0 -0
  62. genal_python-0.8/docs/_build/html/py-modindex.html +0 -177
  63. genal_python-0.8/docs/_build/html/search.html +0 -122
  64. genal_python-0.8/docs/_build/html/searchindex.js +0 -1
  65. genal_python-0.8/docs/_build/html/source/genal.html +0 -2060
  66. genal_python-0.8/docs/_build/html/source/modules.html +0 -259
  67. genal_python-0.8/docs/build/.doctrees/environment.pickle +0 -0
  68. genal_python-0.8/docs/build/.doctrees/introduction.doctree +0 -0
  69. genal_python-0.8/docs/build/_sources/genal.rst.txt +0 -101
  70. genal_python-0.8/docs/build/_static/css/badge_only.css +0 -1
  71. genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Bold.woff +0 -0
  72. genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Bold.woff2 +0 -0
  73. genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Regular.woff +0 -0
  74. genal_python-0.8/docs/build/_static/css/fonts/Roboto-Slab-Regular.woff2 +0 -0
  75. genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.eot +0 -0
  76. genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.svg +0 -2671
  77. genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.ttf +0 -0
  78. genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.woff +0 -0
  79. genal_python-0.8/docs/build/_static/css/fonts/fontawesome-webfont.woff2 +0 -0
  80. genal_python-0.8/docs/build/_static/css/fonts/lato-bold-italic.woff +0 -0
  81. genal_python-0.8/docs/build/_static/css/fonts/lato-bold-italic.woff2 +0 -0
  82. genal_python-0.8/docs/build/_static/css/fonts/lato-bold.woff +0 -0
  83. genal_python-0.8/docs/build/_static/css/fonts/lato-bold.woff2 +0 -0
  84. genal_python-0.8/docs/build/_static/css/fonts/lato-normal-italic.woff +0 -0
  85. genal_python-0.8/docs/build/_static/css/fonts/lato-normal-italic.woff2 +0 -0
  86. genal_python-0.8/docs/build/_static/css/fonts/lato-normal.woff +0 -0
  87. genal_python-0.8/docs/build/_static/css/fonts/lato-normal.woff2 +0 -0
  88. genal_python-0.8/docs/build/_static/css/theme.css +0 -4
  89. genal_python-0.8/docs/build/_static/file.png +0 -0
  90. genal_python-0.8/docs/build/_static/js/badge_only.js +0 -1
  91. genal_python-0.8/docs/build/_static/js/html5shiv-printshiv.min.js +0 -4
  92. genal_python-0.8/docs/build/_static/js/html5shiv.min.js +0 -4
  93. genal_python-0.8/docs/build/_static/js/theme.js +0 -1
  94. genal_python-0.8/docs/build/_static/minus.png +0 -0
  95. genal_python-0.8/docs/build/_static/plus.png +0 -0
  96. genal_python-0.8/docs/build/_static/pygments.css +0 -75
  97. genal_python-0.8/docs/build/_static/sphinx_highlight.js +0 -154
  98. genal_python-0.8/docs/build/searchindex.js +0 -1
  99. genal_python-0.8/requirements.txt +0 -13
  100. {genal_python-0.8 → genal_python-1.0}/.readthedocs.yaml +0 -0
  101. {genal_python-0.8 → genal_python-1.0}/LICENSE +0 -0
  102. {genal_python-0.8 → genal_python-1.0}/docs/Makefile +0 -0
  103. {genal_python-0.8 → genal_python-1.0}/docs/build/.buildinfo +0 -0
  104. {genal_python-0.8 → genal_python-1.0}/docs/build/.doctrees/genal.doctree +0 -0
  105. {genal_python-0.8 → genal_python-1.0}/docs/build/.doctrees/modules.doctree +0 -0
  106. {genal_python-0.8 → genal_python-1.0}/docs/build/_images/MR_plot_SBP_AS.png +0 -0
  107. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/MR.html +0 -0
  108. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/MR_tools.html +0 -0
  109. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/MRpresso.html +0 -0
  110. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/association.html +0 -0
  111. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/clump.html +0 -0
  112. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/extract_prs.html +0 -0
  113. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/geno_tools.html +0 -0
  114. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/lift.html +0 -0
  115. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/proxy.html +0 -0
  116. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/snp_query.html +0 -0
  117. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/genal/tools.html +0 -0
  118. {genal_python-0.8 → genal_python-1.0}/docs/build/_modules/index.html +0 -0
  119. {genal_python-0.8 → genal_python-1.0}/docs/build/_sources/api.rst.txt +0 -0
  120. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_sources/genal.rst.txt +0 -0
  121. {genal_python-0.8 → genal_python-1.0}/docs/build/_sources/modules.rst.txt +0 -0
  122. {genal_python-0.8 → genal_python-1.0}/docs/build/_static/basic.css +0 -0
  123. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/badge_only.css +0 -0
  124. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/Roboto-Slab-Bold.woff +0 -0
  125. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/Roboto-Slab-Bold.woff2 +0 -0
  126. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/Roboto-Slab-Regular.woff +0 -0
  127. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/Roboto-Slab-Regular.woff2 +0 -0
  128. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/fontawesome-webfont.eot +0 -0
  129. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/fontawesome-webfont.svg +0 -0
  130. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/fontawesome-webfont.ttf +0 -0
  131. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/fontawesome-webfont.woff +0 -0
  132. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/fontawesome-webfont.woff2 +0 -0
  133. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/lato-bold-italic.woff +0 -0
  134. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/lato-bold-italic.woff2 +0 -0
  135. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/lato-bold.woff +0 -0
  136. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/lato-bold.woff2 +0 -0
  137. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/lato-normal-italic.woff +0 -0
  138. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/lato-normal-italic.woff2 +0 -0
  139. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/lato-normal.woff +0 -0
  140. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/fonts/lato-normal.woff2 +0 -0
  141. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/css/theme.css +0 -0
  142. {genal_python-0.8 → genal_python-1.0}/docs/build/_static/doctools.js +0 -0
  143. {genal_python-0.8 → genal_python-1.0}/docs/build/_static/documentation_options.js +0 -0
  144. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/file.png +0 -0
  145. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/js/badge_only.js +0 -0
  146. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/js/html5shiv-printshiv.min.js +0 -0
  147. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/js/html5shiv.min.js +0 -0
  148. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/js/theme.js +0 -0
  149. {genal_python-0.8 → genal_python-1.0}/docs/build/_static/language_data.js +0 -0
  150. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/minus.png +0 -0
  151. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/plus.png +0 -0
  152. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/pygments.css +0 -0
  153. {genal_python-0.8 → genal_python-1.0}/docs/build/_static/searchtools.js +0 -0
  154. {genal_python-0.8/docs/_build/html → genal_python-1.0/docs/build}/_static/sphinx_highlight.js +0 -0
  155. {genal_python-0.8 → genal_python-1.0}/docs/build/genal.html +0 -0
  156. {genal_python-0.8 → genal_python-1.0}/docs/build/genindex.html +0 -0
  157. {genal_python-0.8 → genal_python-1.0}/docs/build/modules.html +0 -0
  158. {genal_python-0.8 → genal_python-1.0}/docs/build/objects.inv +0 -0
  159. {genal_python-0.8 → genal_python-1.0}/docs/build/py-modindex.html +0 -0
  160. {genal_python-0.8 → genal_python-1.0}/docs/build/search.html +0 -0
  161. {genal_python-0.8 → genal_python-1.0}/docs/make.bat +0 -0
  162. {genal_python-0.8 → genal_python-1.0}/docs/requirements.txt +0 -0
  163. {genal_python-0.8 → genal_python-1.0}/docs/source/.DS_Store +0 -0
  164. {genal_python-0.8 → genal_python-1.0}/docs/source/Images/MR_plot_SBP_AS.png +0 -0
  165. {genal_python-0.8 → genal_python-1.0}/docs/source/api.rst +0 -0
  166. {genal_python-0.8 → genal_python-1.0}/docs/source/modules.rst +0 -0
  167. {genal_python-0.8 → genal_python-1.0}/genal/association.py +0 -0
  168. {genal_python-0.8 → genal_python-1.0}/genal/clump.py +0 -0
  169. {genal_python-0.8 → genal_python-1.0}/genal/constants.py +0 -0
  170. {genal_python-0.8 → genal_python-1.0}/genal/lift.py +0 -0
  171. {genal_python-0.8 → genal_python-1.0}/genal/proxy.py +0 -0
  172. {genal_python-0.8 → genal_python-1.0}/gitignore +0 -0
  173. {genal_python-0.8 → genal_python-1.0}/readthedocs.yaml +0 -0
Binary file
@@ -2,4 +2,5 @@ __pycache__/
2
2
  dist/
3
3
  .ipynb_checkpoints/
4
4
  ipynb_checkpoints/
5
- genal/.ipynb_checkpoints/
5
+ genal/.ipynb_checkpoints/
6
+ test_data/
Binary file
@@ -1,9 +1,9 @@
1
- Metadata-Version: 2.1
1
+ Metadata-Version: 2.3
2
2
  Name: genal-python
3
- Version: 0.8
3
+ Version: 1.0
4
4
  Summary: A python toolkit for polygenic risk scoring and mendelian randomization.
5
5
  Author-email: Cyprien Rivier <riviercyprien@gmail.com>
6
- Requires-Python: >=3.7
6
+ Requires-Python: >=3.8
7
7
  Description-Content-Type: text/markdown
8
8
  Classifier: Programming Language :: Python :: 3
9
9
  Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
@@ -16,12 +16,16 @@ Requires-Dist: plotnine==0.12.3
16
16
  Requires-Dist: psutil==5.9.1
17
17
  Requires-Dist: pyliftover==0.4
18
18
  Requires-Dist: scikit_learn>=1.3.0
19
- Requires-Dist: scipy>=1.11.4
19
+ Requires-Dist: scipy>=1.10.1, <1.11
20
20
  Requires-Dist: statsmodels==0.14.0
21
21
  Requires-Dist: tqdm==4.66.1
22
22
  Requires-Dist: wget==3.2
23
23
  Project-URL: Home, https://github.com/CypRiv/genal
24
24
 
25
+ [![Python 3.8](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)](https://www.python.org/downloads/release/python-3100/)
26
+
27
+ <img src="/genal_logo.png" data-canonical-src="/genal_logo.png" height="80" />
28
+
25
29
  <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
26
30
 
27
31
 
@@ -54,33 +58,52 @@ The module prioritizes user-friendliness and intuitive operation, aiming to redu
54
58
 
55
59
  Genal draws on concepts from well-established R packages such as TwoSampleMR, MR-Presso, MendelianRandomization, and gwasvcf, adapting their proven methodologies to the Python environment. This approach ensures that users have access to tried and tested techniques with the versatility of Python's data science tools.
56
60
 
61
+ <img src="/Genal_flowchart.png" data-canonical-src="/Genal_flowchart.png" style="max-width:100%;" />
62
+
63
+ Genal flowchart. Created in https://www.BioRender.com
57
64
  ## Citation <a name="citation"></a>
58
65
  If you're using genal, please cite the following paper:
59
66
  **Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
60
67
 
61
68
  ## Requirements for the genal module <a name="paragraph1"></a>
62
- ***Python 3.9 or later***. https://www.python.org/ <br>
69
+ ***Python 3.8 or later***. https://www.python.org/ <br>
63
70
 
64
71
 
65
72
  ## Installation and How to use the genal module <a name="paragraph2"></a>
66
73
 
67
74
  ### Installation <a name="paragraph2.1"></a>
68
75
 
76
+ > **Note:**
77
+ >
78
+ > **Optional**: It is recommended to create a new environment to avoid dependency conflicts. Here, we create a new conda environment called 'genal_env'.
79
+ > ```
80
+ > conda create --name genal_env python=3.8
81
+ > conda activate genal_env
82
+ > ```
83
+
84
+
69
85
  Download and install the package with pip:
70
86
  ```
71
87
  pip install genal-python
72
88
  ```
73
- And it can be imported in a python environment with:
89
+ And import it in a python environment with:
74
90
  ```python
75
91
  import genal
76
92
  ```
77
93
 
78
- The main genal functionalities require a working installation of PLINK v1.9 that can be downloaded here: https://www.cog-genomics.org/plink/
79
- Once downloaded, the path to the plink executable can be set with:
94
+ The main genal functionalities require a working installation of PLINK v1.9 (and not 2.0 as certain functionalities have not been updated yet).
95
+ If you have already installed plink v1.9, you can set the path to its executable with:
80
96
 
81
97
  ```
82
98
  genal.set_plink(path="/path/to/plink/executable/file")
83
99
  ```
100
+
101
+ If plink is not installed, genal can install the correct version for your system with the following line:
102
+
103
+ ```
104
+ genal.install_plink()
105
+ ```
106
+
84
107
  ### Documentation <a name="paragraph2.2"></a>
85
108
 
86
109
  For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
@@ -115,7 +138,7 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
115
138
 
116
139
  ### Data loading <a name="paragraph3.1"></a>
117
140
 
118
- We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
141
+ We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). [Download link](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST006001-GCST007000/GCST006624/Evangelou_30224653_SBP.txt.gz). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
119
142
 
120
143
  ```python
121
144
  import pandas as pd
@@ -237,7 +260,7 @@ You can also use a custom reference panel by specifying to the reference_panel a
237
260
 
238
261
  ### Clumping <a name="paragraph3.3"></a>
239
262
 
240
- Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
263
+ Clumping, or C+T: Clumping + Thresholding, is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
241
264
 
242
265
  The SNP-data loaded in a `genal.Geno` instance can be clumped using the `genal.Geno.clump` method. It will return another `genal.Geno` instance containing only the clumped data:
243
266
 
@@ -309,7 +332,7 @@ The output of the `genal.Geno.prs` method will include how many SNPs were used t
309
332
  Here, we see that about half of the SNPs were not extracted from the data. In such cases, we may want to try and salvage some of these SNPs by looking for proxies (SNPs in high linkage disequilibrium, i.e. highly correlated SNPs). This can be done by specifying the `proxy = True` argument:
310
333
 
311
334
  ```python
312
- SBP_clumped.prs(name = "SBP_prs" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
335
+ SBP_clumped.prs(name = "SBP_prs_proxy" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
313
336
  ```
314
337
 
315
338
  and the output is:
@@ -369,7 +392,7 @@ You can customize how the proxies are chosen with the following arguments:
369
392
 
370
393
  To run MR, we need to load both our exposure and outcome SNP-level data in `genal.Geno` instances. In our case, the genetic instruments of the MR are the SNPs associated with blood pressure at genome-wide significant levels resulting from the clumping of the blood pressure GWAS. They are stored in our `SBP_clumped` `genal.Geno` instance, which also includes their association with the exposure trait (instrument-SBP estimates in the `BETA` column).
371
394
 
372
- To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium ([https://www.nature.com/articles/s41586-022-05165-3](https://www.nature.com/articles/s41586-022-05165-3)):
395
+ To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium: [Link to study](https://www.nature.com/articles/s41586-022-05165-3). [Link to download](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90104001-GCST90105000/GCST90104539/GCST90104539_buildGRCh37.tsv.gz):
373
396
 
374
397
  ```python
375
398
  stroke_gwas = pd.read_csv("GCST90104539_buildGRCh37.tsv",sep="\t")
@@ -410,21 +433,25 @@ Genal will print how many SNPs were successfully found and extracted from the ou
410
433
  1541 SNPs out of 1545 are present in the outcome data.
411
434
  (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
412
435
 
413
- Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
414
436
 
415
- ```python
416
- SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
417
- ```
418
-
419
- And genal will print the number of missing instruments which have been proxied:
420
-
421
- Outcome data successfully loaded from 'b352e412' geno instance.
422
- Identifying the exposure SNPs present in the outcome data...
423
- 1541 SNPs out of 1545 are present in the outcome data.
424
- Searching proxies for 4 SNPs...
425
- Using the EUR reference panel.
426
- Found proxies for 4 SNPs.
427
- (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
437
+ > **Note:**
438
+ >Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
439
+ >
440
+ > Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
441
+ >
442
+ > ```python
443
+ > SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
444
+ > ```
445
+ >
446
+ > And genal will print the number of missing instruments that have been proxied:
447
+ >
448
+ > Outcome data successfully loaded from 'b352e412' geno instance.
449
+ > Identifying the exposure SNPs present in the outcome data...
450
+ > 1541 SNPs out of 1545 are present in the outcome data.
451
+ > Searching proxies for 4 SNPs...
452
+ > Using the EUR reference panel.
453
+ > Found proxies for 4 SNPs.
454
+ > (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
428
455
 
429
456
  After extracting the instruments from the outcome data, the `SBP_clumped` `genal.Geno` instance contains an `MR_data` attribute containing the instruments-exposure and instruments-outcome associations necessary to run MR. Running MR is now as simple as calling the `genal.Geno.MR` method of the SBP_clumped `genal.Geno` instance:
430
457
 
@@ -469,7 +496,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
469
496
  - `Weighted-mode` for the Weighted mode method
470
497
  - `all` to run all the above methods
471
498
 
472
- For more fine-tuning, such as settings for the number of bootstrapping iterations, please refer to the API.
499
+ For more fine-tuning, such as settings for the number of bootstrapping iterations, please refer to the API: [MR method](https://genal.readthedocs.io/en/latest/modules.html#id4).
473
500
 
474
501
  If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
475
502
 
@@ -522,7 +549,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
522
549
 
523
550
  > **Note:**
524
551
  >
525
- > One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
552
+ > One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
526
553
 
527
554
  Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments as we are going to update their coefficients and to avoid any confusion:
528
555
 
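Reviewer note on the PKG-INFO hunks above: version 1.0 declares `Requires-Python: >=3.8` and changes the scipy requirement from `>=1.11.4` to `>=1.10.1, <1.11`, so an environment that satisfied 0.8 may need scipy downgraded. The following is a minimal sketch for checking an existing environment against these two constraints. It is not part of genal; the helper names `parse_version`, `python_ok`, and `scipy_in_range` are hypothetical, and the comparison is a simplified numeric one rather than a full PEP 440 implementation.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn '1.10.1' into (1, 10, 1), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def python_ok(version_info=None):
    """True if the interpreter satisfies Requires-Python: >=3.8 (from the 1.0 PKG-INFO)."""
    info = tuple(sys.version_info[:3]) if version_info is None else tuple(version_info)
    return info >= (3, 8)


def scipy_in_range(version_text):
    """True if the version satisfies scipy>=1.10.1, <1.11 (from the 1.0 PKG-INFO)."""
    version = parse_version(version_text)
    return (1, 10, 1) <= version < (1, 11)


if __name__ == "__main__":
    print("Python version acceptable:", python_ok())
    try:
        installed = metadata.version("scipy")
        print("scipy", installed, "within range:", scipy_in_range(installed))
    except metadata.PackageNotFoundError:
        print("scipy is not installed in this environment")
```

In practice `pip install genal-python` resolves these constraints itself; the sketch is only useful for inspecting an environment before upgrading from 0.8.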
@@ -1,3 +1,7 @@
1
+ [![Python 3.8](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)](https://www.python.org/downloads/release/python-3100/)
2
+
3
+ <img src="/genal_logo.png" data-canonical-src="/genal_logo.png" height="80" />
4
+
1
5
  <center><h1> genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization </h1></center>
2
6
 
3
7
 
@@ -30,33 +34,52 @@ The module prioritizes user-friendliness and intuitive operation, aiming to redu
30
34
 
31
35
  Genal draws on concepts from well-established R packages such as TwoSampleMR, MR-Presso, MendelianRandomization, and gwasvcf, adapting their proven methodologies to the Python environment. This approach ensures that users have access to tried and tested techniques with the versatility of Python's data science tools.
32
36
 
37
+ <img src="/Genal_flowchart.png" data-canonical-src="/Genal_flowchart.png" style="max-width:100%;" />
38
+
39
+ Genal flowchart. Created in https://www.BioRender.com
33
40
  ## Citation <a name="citation"></a>
34
41
  If you're using genal, please cite the following paper:
35
42
  **Genal: A Python Toolkit for Genetic Risk Scoring and Mendelian Randomization.** Cyprien A. Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N. Sheth, Guido J. Falcone, Julian N. Acosta. medRxiv 2024.05.23.24307776; doi: https://doi.org/10.1101/2024.05.23.24307776
36
43
 
37
44
  ## Requirements for the genal module <a name="paragraph1"></a>
38
- ***Python 3.9 or later***. https://www.python.org/ <br>
45
+ ***Python 3.8 or later***. https://www.python.org/ <br>
39
46
 
40
47
 
41
48
  ## Installation and How to use the genal module <a name="paragraph2"></a>
42
49
 
43
50
  ### Installation <a name="paragraph2.1"></a>
44
51
 
52
+ > **Note:**
53
+ >
54
+ > **Optional**: It is recommended to create a new environment to avoid dependency conflicts. Here, we create a new conda environment called 'genal_env'.
55
+ > ```
56
+ > conda create --name genal_env python=3.8
57
+ > conda activate genal_env
58
+ > ```
59
+
60
+
45
61
  Download and install the package with pip:
46
62
  ```
47
63
  pip install genal-python
48
64
  ```
49
- And it can be imported in a python environment with:
65
+ And import it in a python environment with:
50
66
  ```python
51
67
  import genal
52
68
  ```
53
69
 
54
- The main genal functionalities require a working installation of PLINK v1.9 that can be downloaded here: https://www.cog-genomics.org/plink/
55
- Once downloaded, the path to the plink executable can be set with:
70
+ The main genal functionalities require a working installation of PLINK v1.9 (and not 2.0 as certain functionalities have not been updated yet).
71
+ If you have already installed plink v1.9, you can set the path to its executable with:
56
72
 
57
73
  ```
58
74
  genal.set_plink(path="/path/to/plink/executable/file")
59
75
  ```
76
+
77
+ If plink is not installed, genal can install the correct version for your system with the following line:
78
+
79
+ ```
80
+ genal.install_plink()
81
+ ```
82
+
60
83
  ### Documentation <a name="paragraph2.2"></a>
61
84
 
62
85
  For detailed information on how to use the functionalities of Genal, please refer to the documentation: https://genal.rtfd.io
@@ -91,7 +114,7 @@ For this tutorial, we will obtain genetic instruments for systolic blood pressur
91
114
 
92
115
  ### Data loading <a name="paragraph3.1"></a>
93
116
 
94
- We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
117
+ We start this tutorial with publicly available summary statistics from a large GWAS study of systolic blood pressure. [Link to study](https://www.nature.com/articles/s41588-018-0205-x). [Download link](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST006001-GCST007000/GCST006624/Evangelou_30224653_SBP.txt.gz). After downloading and unzipping the summary statistics, we load them into a pandas DataFrame:
95
118
 
96
119
  ```python
97
120
  import pandas as pd
@@ -213,7 +236,7 @@ You can also use a custom reference panel by specifying to the reference_panel a
213
236
 
214
237
  ### Clumping <a name="paragraph3.3"></a>
215
238
 
216
- Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
239
+ Clumping, or C+T: Clumping + Thresholding, is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
217
240
 
218
241
  The SNP-data loaded in a `genal.Geno` instance can be clumped using the `genal.Geno.clump` method. It will return another `genal.Geno` instance containing only the clumped data:
219
242
 
@@ -285,7 +308,7 @@ The output of the `genal.Geno.prs` method will include how many SNPs were used t
285
308
  Here, we see that about half of the SNPs were not extracted from the data. In such cases, we may want to try and salvage some of these SNPs by looking for proxies (SNPs in high linkage disequilibrium, i.e. highly correlated SNPs). This can be done by specifying the `proxy = True` argument:
286
309
 
287
310
  ```python
288
- SBP_clumped.prs(name = "SBP_prs" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
311
+ SBP_clumped.prs(name = "SBP_prs_proxy" ,path = "Pop_chr$", proxy = True, reference_panel = "eur", r2=0.8, kb=5000, window_snps=5000)
289
312
  ```
290
313
 
291
314
  and the output is:
@@ -345,7 +368,7 @@ You can customize how the proxies are chosen with the following arguments:
345
368
 
346
369
  To run MR, we need to load both our exposure and outcome SNP-level data in `genal.Geno` instances. In our case, the genetic instruments of the MR are the SNPs associated with blood pressure at genome-wide significant levels resulting from the clumping of the blood pressure GWAS. They are stored in our `SBP_clumped` `genal.Geno` instance, which also includes their association with the exposure trait (instrument-SBP estimates in the `BETA` column).
347
370
 
348
- To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium ([https://www.nature.com/articles/s41586-022-05165-3](https://www.nature.com/articles/s41586-022-05165-3)):
371
+ To get their association with the outcome trait (instrument-stroke estimates), we are going to use SNP-level data from a large GWAS of stroke performed by the GIGASTROKE consortium: [Link to study](https://www.nature.com/articles/s41586-022-05165-3). [Link to download](http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90104001-GCST90105000/GCST90104539/GCST90104539_buildGRCh37.tsv.gz):
349
372
 
350
373
  ```python
351
374
  stroke_gwas = pd.read_csv("GCST90104539_buildGRCh37.tsv",sep="\t")
@@ -386,21 +409,25 @@ Genal will print how many SNPs were successfully found and extracted from the ou
386
409
  1541 SNPs out of 1545 are present in the outcome data.
387
410
  (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
388
411
 
389
- Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
390
412
 
391
- ```python
392
- SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
393
- ```
394
-
395
- And genal will print the number of missing instruments which have been proxied:
396
-
397
- Outcome data successfully loaded from 'b352e412' geno instance.
398
- Identifying the exposure SNPs present in the outcome data...
399
- 1541 SNPs out of 1545 are present in the outcome data.
400
- Searching proxies for 4 SNPs...
401
- Using the EUR reference panel.
402
- Found proxies for 4 SNPs.
403
- (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
413
+ > **Note:**
415
+ >
416
+ > Here as well you have the option to use proxies for the instruments that are not present in the outcome data:
417
+ >
418
+ > ```python
419
+ > SBP_clumped.query_outcome(Stroke_geno, proxy = True, reference_panel = "eur", kb = 5000, r2 = 0.6, window_snps = 5000)
420
+ > ```
421
+ >
422
+ > And genal will print the number of missing instruments that have been proxied:
423
+ >
424
+ > Outcome data successfully loaded from 'b352e412' geno instance.
425
+ > Identifying the exposure SNPs present in the outcome data...
426
+ > 1541 SNPs out of 1545 are present in the outcome data.
427
+ > Searching proxies for 4 SNPs...
428
+ > Using the EUR reference panel.
429
+ > Found proxies for 4 SNPs.
430
+ > (Exposure data, Outcome data, Outcome name) stored in the .MR_data attribute.
404
431
 
405
432
  After extracting the instruments from the outcome data, the `SBP_clumped` `genal.Geno` instance contains an `MR_data` attribute containing the instruments-exposure and instruments-outcome associations necessary to run MR. Running MR is now as simple as calling the `genal.Geno.MR` method of the SBP_clumped `genal.Geno` instance:
406
433
 
@@ -445,7 +472,7 @@ By default, only some MR methods (inverse-variance weighted, weighted median, Si
445
472
  - `Weighted-mode` for the Weighted mode method
446
473
  - `all` to run all the above methods
447
474
 
448
- For more fine-tuning, such as settings for the number of boostrapping iterations, please refer to the API.
475
+ For more fine-tuning, such as setting the number of bootstrapping iterations, please refer to the API: [MR method](https://genal.readthedocs.io/en/latest/modules.html#id4).
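
For intuition about what the `IVW` method estimates, here is a minimal sketch of the textbook fixed-effect inverse-variance weighted estimator, written in plain Python. It is an illustration under stated assumptions, not genal's code: the `ivw_estimate` helper and the toy numbers are hypothetical, and genal's own implementation may differ (for example in how it handles heterogeneity).

```python
import math
from typing import List, Tuple


def ivw_estimate(
    beta_exposure: List[float],
    beta_outcome: List[float],
    se_outcome: List[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW estimate and its standard error.

    Equivalent to a weighted regression of the outcome effects on the
    exposure effects through the origin, with weights 1 / se_outcome**2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("All inputs must have the same length.")
    numerator = sum(
        bx * by / se**2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    if denominator == 0:
        raise ValueError("At least one instrument must have a non-zero exposure effect.")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Toy instruments (made-up values) where every SNP implies a causal effect of 0.5.
ivw_beta, ivw_se = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.3],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.01, 0.01],
)
print(round(ivw_beta, 4), round(ivw_se, 4))
```

Instruments with a larger effect on the exposure and a more precise outcome estimate receive more weight, which is why weak or noisy instruments contribute little to the combined estimate.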
449
476
 
450
477
  If you want to visualize the obtained MR results, you can use the `genal.Geno.MR_plot` method that will plot each SNP in an `effect_on_exposure x effect_on_outcome` plane as well as lines corresponding to different MR methods:
451
478
 
@@ -498,7 +525,7 @@ df_pheno = pd.read_csv("path/to/trait/data")
498
525
 
499
526
  > **Note:**
500
527
  >
501
- > One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
528
+ > One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
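
A quick way to check this before going further is to compare the two sets of IDs. The sketch below uses plain Python; the `compare_ids` helper and the toy IDs are hypothetical, and it assumes the IDs have already been read from the phenotype file and from the genetic files (for instance from the second column of a `.fam` file). IDs are converted to strings first, because IDs loaded as integers on one side and as text on the other would otherwise never match.

```python
from typing import Dict, Iterable, List


def compare_ids(pheno_ids: Iterable, geno_ids: Iterable) -> Dict[str, List[str]]:
    """Report which participant IDs are shared and which appear on one side only."""
    # Normalize to stripped strings so that 1001 and "1001 " compare equal.
    pheno = {str(i).strip() for i in pheno_ids}
    geno = {str(i).strip() for i in geno_ids}
    return {
        "shared": sorted(pheno & geno),
        "only_in_pheno": sorted(pheno - geno),
        "only_in_geno": sorted(geno - pheno),
    }


# Toy example: integer IDs in the phenotype table, text IDs in the genetic data.
id_report = compare_ids([1001, 1002, 1003], ["1001", "1002", "1004"])
print(id_report)
```

If `only_in_pheno` or `only_in_geno` is unexpectedly large, the two files probably use different ID columns or formats.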
502
529
 
503
530
  Then, it is advised to make a copy of the `genal.Geno` instance containing our instruments, since we are going to update their coefficients and want to avoid any confusion:
504
531
 
@@ -3,14 +3,14 @@
3
3
  <head>
4
4
  <meta charset="utf-8" />
5
5
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
6
- <title>genal.Geno &mdash; genal v0.0 documentation</title>
6
+ <title>genal.Geno &mdash; genal v0.8 documentation</title>
7
7
  <link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
8
8
  <link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
9
9
  <!--[if lt IE 9]>
10
10
  <script src="../../_static/js/html5shiv.min.js"></script>
11
11
  <![endif]-->
12
12
 
13
- <script src="../../_static/documentation_options.js?v=90b5f367"></script>
13
+ <script src="../../_static/documentation_options.js?v=b326c068"></script>
14
14
  <script src="../../_static/doctools.js?v=9a2dae69"></script>
15
15
  <script src="../../_static/sphinx_highlight.js?v=dc90522c"></script>
16
16
  <script src="../../_static/js/theme.js"></script>
@@ -940,7 +940,7 @@
940
940
  <span class="n">methods</span><span class="o">=</span><span class="p">[</span>
941
941
  <span class="s2">&quot;IVW&quot;</span><span class="p">,</span>
942
942
  <span class="s2">&quot;WM&quot;</span><span class="p">,</span>
943
- <span class="s2">&quot;Simple-median&quot;</span><span class="p">,</span>
943
+ <span class="s2">&quot;Simple-mode&quot;</span><span class="p">,</span>
944
944
  <span class="s2">&quot;Egger&quot;</span><span class="p">,</span>
945
945
  <span class="p">],</span>
946
946
  <span class="n">exposure_name</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
@@ -18,7 +18,7 @@ Genal draws on concepts from well-established R packages such as TwoSampleMR, MR
18
18
 
19
19
  To install the latest release, type::
20
20
 
21
- pip install genal
21
+ pip install genal-python
22
22
 
23
23
  Contents
24
24
  --------
@@ -2,13 +2,21 @@
2
2
  Installation
3
3
  ============
4
4
 
5
- The genal package can be easily installed with pip:
5
+ .. note::
6
+ **Optional**: It is recommended to create a new environment to avoid dependency conflicts. Here, we create a new conda environment called 'genal'.
7
+
8
+ .. code-block:: bash
9
+
10
+ conda create --name genal python=3.11
11
+ conda activate genal
12
+
13
+ The genal package requires Python 3.11. Download and install genal with pip:
6
14
 
7
15
  .. code-block:: bash
8
16
 
9
- pip install genal
17
+ pip install genal-python
10
18
 
11
- And it can be imported in a python environment with:
19
+ And import it in a python environment with:
12
20
 
13
21
  .. code-block:: python
14
22
 
@@ -145,7 +153,7 @@ By default, and depending on the global preprocessing level (``'None'``, ``'Fill
145
153
 
146
154
  If you do not wish to run certain steps, or wish to run only certain steps, you can use additional arguments. For more information, please refer to the :meth:`~genal.Geno.preprocess_data` method in the API documentation.
147
155
 
148
- In our case, the ``SNP`` column (for SNP identifier - rsid) was missing from our dataframe and has been added based on a 1000 genome reference panel:
156
+ In our case, the ``SNP`` column (for SNP identifier - rsid) was missing from our dataframe and has been added based on a 1000 Genomes reference panel::
149
157
 
150
158
  Using the EUR reference panel.
151
159
  The SNP column (rsID) has been created. 197511 (2.787%) SNPs were not found in the reference data and their ID set to CHR:POS:EA.
@@ -176,7 +184,7 @@ You can also use a custom reference panel by specifying the path to bed/bim/fam
176
184
  Clumping
177
185
  --------
178
186
 
179
- Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
187
+ Clumping, or C+T (Clumping + Thresholding), is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second requirement ensures that the selected SNPs are not highly correlated (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.
180
188
 
181
189
  The SNP-data loaded in a :class:`~genal.Geno` instance can be clumped using the :meth:`~genal.Geno.clump` method. It will return another :class:`~genal.Geno` instance containing only the clumped data:
182
190
 
@@ -250,7 +258,7 @@ Here, we see that about half of the SNPs were not extracted from the data. In su
250
258
 
251
259
  .. code-block:: python
252
260
 
253
- SBP_clumped.prs(name="SBP_prs", path="Pop_chr$", proxy=True, reference_panel="eur", r2=0.8, kb=5000, window_snps=5000)
261
+ SBP_clumped.prs(name="SBP_prs_proxy", path="Pop_chr$", proxy=True, reference_panel="eur", r2=0.8, kb=5000, window_snps=5000)
254
262
 
255
263
  and the output is::
256
264
 
@@ -489,7 +497,7 @@ Let's start by loading phenotypic data:
489
497
  df_pheno = pd.read_csv("path/to/trait/data")
490
498
 
491
499
  .. note::
492
- One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
500
+ One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.
493
501
 
494
502
  Then, it is advised to make a copy of the :class:`~genal.Geno` instance containing our instruments, since we are going to update their coefficients and want to avoid any confusion:
495
503
 
@@ -589,7 +597,7 @@ Which will output::
589
597
  701 (45.37%) SNPs failed to query (not found in GWAS Catalog) and 7 (0.5%) SNPs timed out after 34.33 seconds. You can increase the timeout value with the timeout argument.
590
598
 
591
599
  And the :attr:`~genal.Geno.data` attribute now contains an `ASSOC` column::
592
-
600
+
593
601
  EA NEA EAF BETA SE CHR POS SNP ASSOC
594
602
  0 A G 0.1784 0.2330 0.0402 10 102075479 rs603424 [eicosanoids measurement, decadienedioic acid (...]
595
603
  1 A G 0.0706 -0.3873 0.0626 10 102403682 rs2996303 FAILED_QUERY
@@ -473,7 +473,7 @@ flipping palindromic SNPs (relevant if action=2). Default is 0.42.</p></li>
473
473
 
474
474
  <dl class="py method">
475
475
  <dt class="sig sig-object py" id="id1">
476
- <span class="sig-name descname"><span class="pre">MR_plot</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">methods</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">['IVW',</span> <span class="pre">'WM',</span> <span class="pre">'Simple-median',</span> <span class="pre">'Egger']</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">exposure_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">outcome_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">filename</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/genal/Geno.html#Geno.MR_plot"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#id1" title="Link to this definition">¶</a></dt>
476
+ <span class="sig-name descname"><span class="pre">MR_plot</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">methods</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">['IVW',</span> <span class="pre">'WM',</span> <span class="pre">'Simple-mode',</span> <span class="pre">'Egger']</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">exposure_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">outcome_name</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">filename</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/genal/Geno.html#Geno.MR_plot"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#id1" title="Link to this definition">¶</a></dt>
477
477
  <dd><p>Creates and returns a scatter plot of individual SNP effects with lines representing different Mendelian Randomization (MR) methods. Each MR method specified in the ‘methods’ argument is represented as a line in the plot.</p>
478
478
  <dl class="field-list simple">
479
479
  <dt class="field-odd">Parameters<span class="colon">:</span></dt>
@@ -89,7 +89,7 @@
89
89
  <p>The module prioritizes user-friendliness and intuitive operation, aiming to reduce the complexity of data analysis for researchers. Despite its focus on simplicity, Genal does not sacrifice the depth of customization or the precision of analysis. Researchers can expect to maintain analytical rigour while benefiting from the streamlined experience.</p>
90
90
  <p>Genal draws on concepts from well-established R packages such as TwoSampleMR, MR-Presso, MendelianRandomization, and gwasvcf, adapting their proven methodologies to the Python environment. This approach ensures that users have access to tried and tested techniques with the versatility of Python’s data science tools.</p>
91
91
  <p>To install the latest release, type:</p>
92
- <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">genal</span>
92
+ <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">genal</span><span class="o">-</span><span class="n">python</span>
93
93
  </pre></div>
94
94
  </div>
95
95
  <section id="contents">
@@ -88,11 +88,19 @@
88
88
 
89
89
  <section id="installation">
90
90
  <h1>Installation<a class="headerlink" href="#installation" title="Link to this heading">¶</a></h1>
91
- <p>The genal package can be easily installed with pip:</p>
92
- <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>install<span class="w"> </span>genal
91
+ <div class="admonition note">
92
+ <p class="admonition-title">Note</p>
93
+ <p><strong>Optional</strong>: It is recommended to create a new environment to avoid dependencies conflicts. Here, we create a new conda environment called ‘genal’.</p>
94
+ <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>conda<span class="w"> </span>create<span class="w"> </span>--name<span class="w"> </span>genal<span class="w"> </span><span class="nv">python</span><span class="o">=</span><span class="m">3</span>.11
95
+ conda<span class="w"> </span>activate<span class="w"> </span>genal
96
+ </pre></div>
97
+ </div>
98
+ </div>
99
+ <p>The genal package requires Python 3.11. Download and install it with pip:</p>
100
+ <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>install<span class="w"> </span>genal-python
93
101
  </pre></div>
94
102
  </div>
95
- <p>And it can be imported in a python environment with:</p>
103
+ <p>And import it in a python environment with:</p>
96
104
  <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">genal</span>
97
105
  </pre></div>
98
106
  </div>
@@ -205,11 +213,11 @@ Once downloaded, the path to the plink 1.9 executable should be set with:</p>
205
213
  </ul>
206
214
  <p>If you do not wish to run certain steps, or wish to run only certain steps, you can use additional arguments. For more information, please refer to the <a class="reference internal" href="modules.html#id0" title="genal.Geno.preprocess_data"><code class="xref py py-meth docutils literal notranslate"><span class="pre">preprocess_data()</span></code></a> method in the API documentation.</p>
207
215
  <p>In our case, the <code class="docutils literal notranslate"><span class="pre">SNP</span></code> column (for SNP identifier - rsid) was missing from our dataframe and has been added based on a 1000 genome reference panel:</p>
208
- <blockquote>
209
- <div><p>Using the EUR reference panel.
210
- The SNP column (rsID) has been created. 197511 (2.787%) SNPs were not found in the reference data and their ID set to CHR:POS:EA.
211
- The BETA column looks like Beta estimates. Use effect_column=’OR’ if it is a column of Odds Ratios.</p>
212
- </div></blockquote>
216
+ <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Using</span> <span class="n">the</span> <span class="n">EUR</span> <span class="n">reference</span> <span class="n">panel</span><span class="o">.</span>
217
+ <span class="n">The</span> <span class="n">SNP</span> <span class="n">column</span> <span class="p">(</span><span class="n">rsID</span><span class="p">)</span> <span class="n">has</span> <span class="n">been</span> <span class="n">created</span><span class="o">.</span> <span class="mi">197511</span> <span class="p">(</span><span class="mf">2.787</span><span class="o">%</span><span class="p">)</span> <span class="n">SNPs</span> <span class="n">were</span> <span class="ow">not</span> <span class="n">found</span> <span class="ow">in</span> <span class="n">the</span> <span class="n">reference</span> <span class="n">data</span> <span class="ow">and</span> <span class="n">their</span> <span class="n">ID</span> <span class="nb">set</span> <span class="n">to</span> <span class="n">CHR</span><span class="p">:</span><span class="n">POS</span><span class="p">:</span><span class="n">EA</span><span class="o">.</span>
218
+ <span class="n">The</span> <span class="n">BETA</span> <span class="n">column</span> <span class="n">looks</span> <span class="n">like</span> <span class="n">Beta</span> <span class="n">estimates</span><span class="o">.</span> <span class="n">Use</span> <span class="n">effect_column</span><span class="o">=</span><span class="s1">&#39;OR&#39;</span> <span class="k">if</span> <span class="n">it</span> <span class="ow">is</span> <span class="n">a</span> <span class="n">column</span> <span class="n">of</span> <span class="n">Odds</span> <span class="n">Ratios</span><span class="o">.</span>
219
+ </pre></div>
220
+ </div>
213
221
  <p>You can always check the data of a <code class="docutils literal notranslate"><span class="pre">genal.Geno</span></code> instance by accessing the <code class="docutils literal notranslate"><span class="pre">data</span></code> attribute:</p>
214
222
  <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">SBP_Geno</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
215
223
  <span class="go"> EA NEA EAF BETA SE P CHR POS SNP</span>
@@ -229,7 +237,7 @@ The BETA column looks like Beta estimates. Use effect_column=’OR’ if it is a
229
237
  </section>
230
238
  <section id="clumping">
231
239
  <h2>Clumping<a class="headerlink" href="#clumping" title="Link to this heading">¶</a></h2>
232
- <p>Clumping is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.</p>
240
+ <p>Clumping, or C+T: Clumping + Thresholding, is the step at which we select the SNPs that will be used as our genetic instruments in future Polygenic Risk Scores and Mendelian Randomization analyses. The process involves identifying the SNPs that are strongly associated with our trait of interest (systolic blood pressure in this tutorial) and are independent from each other. This second step ensures that selected SNPs are not highly correlated, (i.e., they are not in high linkage disequilibrium). For this step, we again need to use a reference panel.</p>
233
241
  <p>The SNP-data loaded in a <a class="reference internal" href="modules.html#genal.Geno" title="genal.Geno"><code class="xref py py-class docutils literal notranslate"><span class="pre">Geno</span></code></a> instance can be clumped using the <a class="reference internal" href="modules.html#id1" title="genal.Geno.clump"><code class="xref py py-meth docutils literal notranslate"><span class="pre">clump()</span></code></a> method. It will return another <a class="reference internal" href="modules.html#genal.Geno" title="genal.Geno"><code class="xref py py-class docutils literal notranslate"><span class="pre">Geno</span></code></a> instance containing only the clumped data:</p>
234
242
  <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">SBP_clumped</span> <span class="o">=</span> <span class="n">SBP_Geno</span><span class="o">.</span><span class="n">clump</span><span class="p">(</span><span class="n">p1</span><span class="o">=</span><span class="mf">5e-8</span><span class="p">,</span> <span class="n">r2</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">kb</span><span class="o">=</span><span class="mi">250</span><span class="p">,</span> <span class="n">reference_panel</span><span class="o">=</span><span class="s2">&quot;eur&quot;</span><span class="p">)</span>
235
243
  </pre></div>
@@ -293,7 +301,7 @@ The output of the <a class="reference internal" href="modules.html#id2" title="g
293
301
  </pre></div>
294
302
  </div>
295
303
  <p>Here, we see that about half of the SNPs were not extracted from the data. In such cases, we may want to try and salvage some of these SNPs by looking for proxies (SNPs in high linkage disequilibrium, i.e. highly correlated SNPs). This can be done by specifying the <code class="docutils literal notranslate"><span class="pre">proxy</span> <span class="pre">=</span> <span class="pre">True</span></code> argument:</p>
296
- <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">SBP_clumped</span><span class="o">.</span><span class="n">prs</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">&quot;SBP_prs&quot;</span><span class="p">,</span> <span class="n">path</span><span class="o">=</span><span class="s2">&quot;Pop_chr$&quot;</span><span class="p">,</span> <span class="n">proxy</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">reference_panel</span><span class="o">=</span><span class="s2">&quot;eur&quot;</span><span class="p">,</span> <span class="n">r2</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span> <span class="n">kb</span><span class="o">=</span><span class="mi">5000</span><span class="p">,</span> <span class="n">window_snps</span><span class="o">=</span><span class="mi">5000</span><span class="p">)</span>
304
+ <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">SBP_clumped</span><span class="o">.</span><span class="n">prs</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">&quot;SBP_prs_proxy&quot;</span><span class="p">,</span> <span class="n">path</span><span class="o">=</span><span class="s2">&quot;Pop_chr$&quot;</span><span class="p">,</span> <span class="n">proxy</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">reference_panel</span><span class="o">=</span><span class="s2">&quot;eur&quot;</span><span class="p">,</span> <span class="n">r2</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span> <span class="n">kb</span><span class="o">=</span><span class="mi">5000</span><span class="p">,</span> <span class="n">window_snps</span><span class="o">=</span><span class="mi">5000</span><span class="p">)</span>
297
305
  </pre></div>
298
306
  </div>
299
307
  <p>and the output is:</p>
@@ -551,7 +559,7 @@ It can be run using the <a class="reference internal" href="modules.html#id5" ti
551
559
  </div>
552
560
  <div class="admonition note">
553
561
  <p class="admonition-title">Note</p>
554
- <p>One important detail is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.</p>
562
+ <p>One important point is to make sure that the IDs of the participants are identical in the phenotypic data and in the genetic data.</p>
555
563
  </div>
556
564
  <p>Then, it is advised to make a copy of the <a class="reference internal" href="modules.html#genal.Geno" title="genal.Geno"><code class="xref py py-class docutils literal notranslate"><span class="pre">Geno</span></code></a> instance containing our instruments as we are going to update their coefficients and to avoid any confusion:</p>
557
565
  <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">SBP_adjusted</span> <span class="o">=</span> <span class="n">SBP_clumped</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>