cool-seq-tool 0.3.0.dev0__py3-none-any.whl → 0.4.0.dev0__py3-none-any.whl
- cool_seq_tool/api.py +3 -3
- cool_seq_tool/app.py +32 -11
- cool_seq_tool/data/data_downloads.py +8 -5
- cool_seq_tool/handlers/seqrepo_access.py +55 -27
- cool_seq_tool/mappers/__init__.py +4 -1
- cool_seq_tool/mappers/alignment.py +40 -37
- cool_seq_tool/mappers/exon_genomic_coords.py +329 -138
- cool_seq_tool/mappers/mane_transcript.py +402 -227
- cool_seq_tool/routers/mane.py +1 -1
- cool_seq_tool/routers/mappings.py +1 -1
- cool_seq_tool/schemas.py +31 -24
- cool_seq_tool/sources/__init__.py +4 -2
- cool_seq_tool/sources/mane_transcript_mappings.py +28 -7
- cool_seq_tool/sources/transcript_mappings.py +27 -11
- cool_seq_tool/sources/uta_database.py +179 -232
- cool_seq_tool/utils.py +22 -24
- cool_seq_tool/version.py +1 -1
- {cool_seq_tool-0.3.0.dev0.dist-info → cool_seq_tool-0.4.0.dev0.dist-info}/LICENSE +1 -1
- cool_seq_tool-0.4.0.dev0.dist-info/METADATA +130 -0
- cool_seq_tool-0.4.0.dev0.dist-info/RECORD +28 -0
- {cool_seq_tool-0.3.0.dev0.dist-info → cool_seq_tool-0.4.0.dev0.dist-info}/WHEEL +1 -1
- cool_seq_tool/data/transcript_mapping.tsv +0 -256226
- cool_seq_tool-0.3.0.dev0.dist-info/METADATA +0 -187
- cool_seq_tool-0.3.0.dev0.dist-info/RECORD +0 -29
- {cool_seq_tool-0.3.0.dev0.dist-info → cool_seq_tool-0.4.0.dev0.dist-info}/top_level.txt +0 -0
@@ -1,187 +0,0 @@
-Metadata-Version: 2.1
-Name: cool-seq-tool
-Version: 0.3.0.dev0
-Summary: Common Operations On Lots-of Sequences Tool.
-Home-page: https://github.com/GenomicMedLab/cool-seq-tool
-Author: Wagner Lab, Nationwide Childrens Hospital
-License: MIT
-Requires-Python: >=3.8
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: asyncpg
-Requires-Dist: aiofiles
-Requires-Dist: boto3
-Requires-Dist: pyliftover
-Requires-Dist: polars
-Requires-Dist: hgvs
-Requires-Dist: biocommons.seqrepo
-Requires-Dist: pydantic ~=2.4.2
-Requires-Dist: uvicorn
-Requires-Dist: fastapi
-Requires-Dist: ga4gh.vrs
-Provides-Extra: dev
-Requires-Dist: pre-commit ; extra == 'dev'
-Requires-Dist: ipython ; extra == 'dev'
-Requires-Dist: ipykernel ; extra == 'dev'
-Requires-Dist: psycopg2-binary ; extra == 'dev'
-Requires-Dist: ruff ; extra == 'dev'
-Requires-Dist: black ; extra == 'dev'
-Provides-Extra: tests
-Requires-Dist: pytest ; extra == 'tests'
-Requires-Dist: pytest-cov ; extra == 'tests'
-Requires-Dist: pytest-asyncio ==0.18.3 ; extra == 'tests'
-Requires-Dist: mock ; extra == 'tests'
-
-# **C**ommon **O**perations **O**n **L**ots-of **Seq**uences Tool
-
-The **cool-seq-tool** provides:
-
-- Transcript alignment data from the [UTA](https://github.com/biocommons/uta) database
-- Fast access to sequence data using [SeqRepo](https://github.com/biocommons/biocommons.seqrepo)
-- Liftover between assemblies (GRCh38 <--> GRCh37) from [PyLiftover](https://github.com/konstantint/pyliftover)
-- Lifting over to preferred [MANE](https://www.ncbi.nlm.nih.gov/refseq/MANE/) compatible transcript. See [here](docs/TranscriptSelectionPriority.md) for more information.
-
-## Installation
-
-### pip
-
-```commandline
-pip install cool-seq-tool[dev,tests]
-```
-
-### Development
-
-Clone the repo:
-
-```commandline
-git clone https://github.com/GenomicMedLab/cool-seq-tool
-cd cool_seq_tool
-```
-
-[Install Pipenv](https://pipenv-fork.readthedocs.io/en/latest/#install-pipenv-today) if necessary.
-
-Install backend dependencies and enter Pipenv environment:
-
-```commandline
-pipenv shell
-pipenv update
-pipenv install --dev
-```
-
-### UTA Database Installation
-
-`cool-seq-tool` uses intalls local UTA database. For other ways to install, visit [biocommons.uta](https://github.com/biocommons/uta).
-
-#### Local Installation
-
-_The following commands will likely need modification appropriate for the installation environment._
-1. Install [PostgreSQL](https://www.postgresql.org/)
-2. Create user and database.
-
-```
-$ createuser -U postgres uta_admin
-$ createuser -U postgres anonymous
-$ createdb -U postgres -O uta_admin uta
-```
-
-3. To install locally, from the _cool_seq_tool/data_ directory:
-```
-export UTA_VERSION=uta_20210129.pgd.gz
-curl -O http://dl.biocommons.org/uta/$UTA_VERSION
-gzip -cdq ${UTA_VERSION} | grep -v "^REFRESH MATERIALIZED VIEW" | psql -h localhost -U uta_admin --echo-errors --single-transaction -v ON_ERROR_STOP=1 -d uta -p 5433
-```
-
-##### UTA Installation Issues
-If you have trouble installing UTA, you can visit [these two READMEs](https://github.com/ga4gh/vrs-python/tree/main/docs/setup_help).
-
-#### Connecting to the database
-
-To connect to the UTA database, you can use the default url (`postgresql://uta_admin:uta@localhost:5433/uta/uta_20210129`).
-
-If you do not wish to use the default, you must set the environment variable `UTA_DB_URL` which has the format of `driver://user:password@host:port/database/schema`.
-
-### Data Downloads
-
-#### SeqRepo
-`cool-seq-tool` relies on [seqrepo](https://github.com/biocommons/biocommons.seqrepo), which you must download yourself.
-
-Use the `SEQREPO_ROOT_DIR` environment variable to set the path of an already existing SeqRepo directory. The default is `/usr/local/share/seqrepo/latest`.
-
-From the _root_ directory:
-```
-pip install seqrepo
-sudo mkdir /usr/local/share/seqrepo
-sudo chown $USER /usr/local/share/seqrepo
-seqrepo pull -i 2021-01-29 # Replace with latest version using `seqrepo list-remote-instances` if outdated
-```
-
-If you get an error similar to the one below:
-```
-PermissionError: [Error 13] Permission denied: '/usr/local/share/seqrepo/2021-01-29._fkuefgd' -> '/usr/local/share/seqrepo/2021-01-29'
-```
-
-You will want to do the following:\
-(*Might not be ._fkuefgd, so replace with your error message path*)
-```console
-sudo mv /usr/local/share/seqrepo/2021-01-29._fkuefgd /usr/local/share/seqrepo/2021-01-29
-exit
-```
-
-#### LRG_RefSeqGene
-
-`cool-seq-tool` fetches the latest version of `LRG_RefSeqGene` if the environment variable `LRG_REFSEQGENE_PATH` is not set. When `LRG_REFSEQGENE_PATH` is set, `cool-seq-tool` will look at this path and expect the LRG_RefSeqGene file. This file is found can be found [here](https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/RefSeqGene).
-
-#### MANE Summary Data
-
-`cool-seq-tool` fetches the latest version of `MANE.GRCh38.*.summary.txt.gz` if the environment variable `MANE_SUMMARY_PATH` is not set. When `MANE_SUMMARY_PATH` is set, `cool-seq-tool` will look at this path and expect the MANE Summary Data file. This file is found can be found [here](https://ftp.ncbi.nlm.nih.gov/refseq/MANE/MANE_human/current/).
-
-#### transcript_mapping.tsv
-`cool-seq-tool` is packaged with transcript mapping data acquired from [Ensembl BioMart](http://www.ensembl.org/biomart/martview). If the environment variable `TRANSCRIPT_MAPPINGS_PATH` is not set, `cool-seq-tool` will use the built-in file. When `TRANSCRIPT_MAPPINGS_PATH` is set, `cool_seq_tool` will look at this path and expect to find the transcript mapping TSV file.
-
-To acquire this data manually from the [BioMart](https://www.ensembl.org/biomart/martview), select the `Human Genes (GRCh38.p13)` dataset and choose the following attributes:
-
-* Gene stable ID
-* Gene stable ID version
-* Transcript stable ID
-* Transcript stable ID version
-* Protein stable ID
-* Protein stable ID version
-* RefSeq match transcript (MANE Select)
-* Gene name
-
-
-
-## Starting the UTA Tools Service Locally
-
-To start the service, run the following:
-
-```commandline
-uvicorn cool_seq_tool.api:app --reload
-```
-
-Next, view the FastAPI on your local machine: http://127.0.0.1:8000/cool_seq_tool
-
-## Init coding style tests
-
-Code style is managed by [Ruff](https://github.com/astral-sh/ruff) and [Black](https://github.com/psf/black), and should be checked prior to commit.
-
-We use [pre-commit](https://pre-commit.com/#usage) to run conformance tests.
-
-This ensures:
-
-* Check code style
-* Check for added large files
-* Detect AWS Credentials
-* Detect Private Key
-
-Before first commit run:
-
-```
-pre-commit install
-```
-
-## Testing
-From the _root_ directory of the repository:
-```
-pytest
-```
@@ -1,29 +0,0 @@
-cool_seq_tool/__init__.py,sha256=eBycAZIAJBCf51xQLQYHzvUep1i21LMrzBdRLqfe-Fc,352
-cool_seq_tool/api.py,sha256=Zx_HO7aLCQI5g9P0IQkVSUOLt7kUOFGXoibYCU6oits,1248
-cool_seq_tool/app.py,sha256=rkRq7pUiCOcz6hXXRxXmTCwj1z-fU4KiF5HK7Btf0DU,2434
-cool_seq_tool/paths.py,sha256=7EA21Vmf9hvct0z_V4oK0WTWOA2FKY2Tavh4nAUXunk,889
-cool_seq_tool/schemas.py,sha256=Xugda7yguRokwpqRRA7T899yC0ONCiiZqKPy58IpM_U,15299
-cool_seq_tool/utils.py,sha256=U0Pqjs14B0XoAjFfhUfN7D4bHZXapG33xhk48B5lrtU,1471
-cool_seq_tool/version.py,sha256=m7j1OC4DZqVebe1tYxFgQ8JHISRFnhz7ljsYFTmZOLU,57
-cool_seq_tool/data/__init__.py,sha256=EAk0f_xeq1JAkRosLMiWWhEXku6lYfMZ63HR_6QxSqs,77
-cool_seq_tool/data/data_downloads.py,sha256=mMURyb6E5KXYw3VQ-YuVkZGobXuGfsqyfuwQfZs9wrk,3473
-cool_seq_tool/data/transcript_mapping.tsv,sha256=AO3luYQAbFiCoRgiiPXotakb5pAwx1jDCeXpvGdIuac,24138769
-cool_seq_tool/handlers/__init__.py,sha256=xDQ84N4ImrUBKwGmrg64yGUMh0ArW-DwjJuTkKrIJL4,77
-cool_seq_tool/handlers/seqrepo_access.py,sha256=z1dG2qiPgeR-1GnOi11b5cN9tFIXVPg6G4oBfCDXIko,7554
-cool_seq_tool/mappers/__init__.py,sha256=F85y9PKtpqCTrdDgWMHYO5Z8QG0ZsFDD_7JK2vnELKU,184
-cool_seq_tool/mappers/alignment.py,sha256=ZalYocD7t7O-PHKs6_upgAZ607GsszePduvru0Ruq18,9734
-cool_seq_tool/mappers/exon_genomic_coords.py,sha256=PnVZwCu3ybTGwlW6zzPQ-ufrxRyPhn7_GCS8e_jVZ3w,22624
-cool_seq_tool/mappers/mane_transcript.py,sha256=41wmJSgz53fugWLTSsB8YtNYIAZR7Yrr5Wq4VOAkRwc,42699
-cool_seq_tool/routers/__init__.py,sha256=x00Dq0LzqYoBPbipmyexLbAMUb7Udx8DozfwT-mJo1E,436
-cool_seq_tool/routers/default.py,sha256=F9NDKv7oiEz7Yk9BKAqhJxTBELpYLDhOqB8keRVtYkU,3937
-cool_seq_tool/routers/mane.py,sha256=dcB1GLIxi9H2c6rFHoF39heCIfQyRvSZf1riEyaDbTo,3547
-cool_seq_tool/routers/mappings.py,sha256=XBv2OyJ8uwkt2rpCWhBf1ZuxJRAHUftFcy8vGaHf0PM,6105
-cool_seq_tool/sources/__init__.py,sha256=s8Zx-W_eUhxiHGYbEwuKt_V_xkDc8UjUIcRGJtyWhdM,228
-cool_seq_tool/sources/mane_transcript_mappings.py,sha256=LrTMI17cBzaeXLRzFoaKXcz_QQZdmtTWEjVZEBXR1Jo,2846
-cool_seq_tool/sources/transcript_mappings.py,sha256=CQWqo_36gnH1n75qG_n4a3CABCHngGXO3ma6MUIuJUc,9011
-cool_seq_tool/sources/uta_database.py,sha256=MTPOYUmzK8KZPsOdCuYu1qsW0vmNs2VrZx9nIocavo4,45585
-cool_seq_tool-0.3.0.dev0.dist-info/LICENSE,sha256=q5ROz8j71Y0buub1pr-xaORSENWXKGyA76fRjKjKxHU,1061
-cool_seq_tool-0.3.0.dev0.dist-info/METADATA,sha256=GPKvqjYUtNlhoo9EUwrLJ3EM_jZWa3c6woOoz-KurLI,6622
-cool_seq_tool-0.3.0.dev0.dist-info/WHEEL,sha256=yQN5g4mg4AybRjkgi-9yy4iQEFibGQmlz78Pik5Or-A,92
-cool_seq_tool-0.3.0.dev0.dist-info/top_level.txt,sha256=cGuxdN6p3y16jQf6hCwWhE4OptwUeZPm_PNJlPb3b0k,14
-cool_seq_tool-0.3.0.dev0.dist-info/RECORD,,
File without changes