metient 0.1.1.dev2__tar.gz → 0.1.1.dev3__tar.gz
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/PKG-INFO +1 -1
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/README.md +5 -3
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/lib/vertex_labeling.py +1 -1
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/util/data_extraction_util.py +82 -104
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient.egg-info/PKG-INFO +1 -1
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/setup.py +1 -1
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/__init__.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/lib/__init__.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/metient.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/util/__init__.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/util/create_conf_intervals_from_reads.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/util/deprecated.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/util/eval_util.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/util/extra.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/util/globals.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/util/pairtree_data_extraction_util.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/util/plotting_util.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient/util/vertex_labeling_util.py +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient.egg-info/SOURCES.txt +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient.egg-info/dependency_links.txt +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient.egg-info/requires.txt +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/metient.egg-info/top_level.txt +0 -0
- {metient-0.1.1.dev2 → metient-0.1.1.dev3}/setup.cfg +0 -0
@@ -23,14 +23,16 @@ Each row in the tsv should correspond to the reference and variant read counts a
 The required fields for the tsv file:
 | Column name | Description |
 |----------|----------|
-| **anatomical_site_index** | Zero-based index for anatomical_site_label column
+| **anatomical_site_index** | Zero-based index for anatomical_site_label column. Rows with the same anatomical site index and cluster_index will get pooled together.|
 | **anatomical_site_label** | Name of the anatomical site |
+| **character_index** | Zero-based index for character_label column |
 | **character_label** | Name of the mutation or cluster of mutations. This is used in visualizations, so it should be short. NOTE: due to graphing dependencies, this string cannot contain colons. |
-| **cluster_index** | If using a clustering method, the cluster index that this mutation belongs to. NOTE: this must correspond to the indices used in the tree txt file.
+| **cluster_index** | If using a clustering method, the cluster index that this mutation belongs to. NOTE: this must correspond to the indices used in the tree txt file. Rows with the same anatomical site index and cluster_index will get pooled together.|
 | **ref** | The number of reads that map to the reference allele for this mutation or mutation cluster in this anatomical site. |
 | **var** | The number of reads that map to the variant allele for this mutation or mutation cluster in this anatomical site. |
-| **var_read_prob** | Let j = character_index. This is the probabilty of observing a read from the variant allele for mutation at j in a cell bearing the mutation. Thus, if mutation at j occurred at a diploid locus, this should be 0.5. In a haploid cell (e.g., male sex chromosome), this should be 1.0. If a copy-number aberration (CNA) duplicated the reference allele in the lineage bearing mutation j prior to j occurring, there will be two reference alleles and a single variant allele in all cells bearing j, such that var_read_prob = 0.3333. This gives Metient the ability to correct for the effect CNAs have on the relationship between VAF (i.e., the proportion of alleles bearing the mutation) and subclonal frequency (i.e., the proportion of cells bearing the mutation). This is modeled around PairTree's VAF correction, see S2.2 of [PairTree's supplementary info](https://aacr.silverchair-cdn.com/aacr/content_public/journal/bloodcancerdiscov/3/3/10.1158_2643-3230.bcd-21-0092/9/bcd-21-0092_supplementary_information_suppsm_sf1-sf21.pdf?Expires=1709221974&Signature=dJH6~Dg-6gEb-S88i0wDGW28QZn16keQj34Vo2tAvJL2cUJrQo48afpHPp-a2zAwQa~ET6SDgw3hb3ITacB06GDUc3GYCdCgYtfPMjFGwygFj-Q9xf-c44VAvwiyliwsBXK1shZmURlFMwSjzkwRwasuWu50sMNmeJSoVyX3nQ-rRBlK93aDR5s9c0l-p4aGvTi6QmfKJPsxXaHB4Lz5yXSl3Xd~JPK-Y~ltC14epDRb~MiSPWUFCAiYetUXcQ7J7vd6b4XQKT9PnYkjQtUq55tLSoUkOGe5JkJ32NXCeoT~l-XD97pCeDYVDOYzAuOkAG0tDYrPebEh2TGTA3fnbA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA) for more details. |
 | **site_category** | Must be one of `primary` or `metastasis`. If multiple primaries are specified (i.e., the true primary is not known), we will run Metient multiple times with each primary used as the true primary. Output files are saved with the suffix `_{anatomical_site_label}` to indicate which primary was used in that run. |
+| **var_read_prob** | This gives Metient the ability to correct for the effect copy number alterations (CNAs) have on the relationship between VAF (i.e., the proportion of alleles bearing the mutation) and subclonal frequency (i.e., the proportion of cells bearing the mutation). Let j = character_index. This is the probability of observing a read from the variant allele for mutation at j in a cell bearing the mutation. Thus, if mutation at j occurred at a diploid locus, this should be 0.5. In a haploid cell (e.g., male sex chromosome), this should be 1.0. If a CNA duplicated the reference allele in the lineage bearing mutation j prior to j occurring, there will be two reference alleles and a single variant allele in all cells bearing j, such that var_read_prob = 0.3333. If using a CN caller that reports major and minor CN: `var_read_prob = (p*maj)/(p*(maj+min)+(1-p)*2)`, where `p` is tumor purity, `maj` is major CN, `min` is minor CN, and we're assuming the variant allele has major CN. For more information, see S2.2 of [PairTree's supplementary info](https://aacr.silverchair-cdn.com/aacr/content_public/journal/bloodcancerdiscov/3/3/10.1158_2643-3230.bcd-21-0092/9/bcd-21-0092_supplementary_information_suppsm_sf1-sf21.pdf?Expires=1709221974&Signature=dJH6~Dg-6gEb-S88i0wDGW28QZn16keQj34Vo2tAvJL2cUJrQo48afpHPp-a2zAwQa~ET6SDgw3hb3ITacB06GDUc3GYCdCgYtfPMjFGwygFj-Q9xf-c44VAvwiyliwsBXK1shZmURlFMwSjzkwRwasuWu50sMNmeJSoVyX3nQ-rRBlK93aDR5s9c0l-p4aGvTi6QmfKJPsxXaHB4Lz5yXSl3Xd~JPK-Y~ltC14epDRb~MiSPWUFCAiYetUXcQ7J7vd6b4XQKT9PnYkjQtUq55tLSoUkOGe5JkJ32NXCeoT~l-XD97pCeDYVDOYzAuOkAG0tDYrPebEh2TGTA3fnbA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA). |
+
 
 anatomical_site_index anatomical_site_label cluster_index character_label ref var var_read_prob site_category num_mutations
 0 breast 0 HER2 982 78 0.43 primary 54
|
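Reviewer note: the major/minor copy-number formula quoted in the `var_read_prob` row of the README hunk can be sketched in a few lines of Python. This is only an illustration under the README's stated assumptions (the variant allele sits on the major copy, and normal cells are diploid). The name `var_read_prob_from_cn` is hypothetical; the package has its own `calc_var_read_prob`, whose full body is not shown in this diff.

```python
def var_read_prob_from_cn(major_cn, minor_cn, purity):
    """Hypothetical helper: probability that a read from a mutated locus is a variant read.

    Implements (p*maj) / (p*(maj+min) + (1-p)*2) from the README, which assumes
    the variant allele has the major copy number and normal cells are diploid.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    denom = purity * (major_cn + minor_cn) + (1.0 - purity) * 2.0
    return (purity * major_cn) / denom


# Diploid locus in a pure tumor sample: one variant copy out of two alleles.
print(var_read_prob_from_cn(major_cn=1, minor_cn=1, purity=1.0))
```

For a diploid locus at full purity the value reduces to the 0.5 mentioned in the README; lower purity or extra reference copies pull it below that.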
@@ -919,7 +919,7 @@ def calibrate(tree_fns, tsv_fns, print_config, output_dir, run_names,
               Os, batch_size, custom_colors, bias_weights, solve_polytomies):
 
     if not (len(tree_fns) == len(tsv_fns) == len(run_names)):
-        raise ValueError("Inputs Ts, tsv_fns,
+        raise ValueError("Inputs Ts, tsv_fns, and run_names must have equal length (length = number of patients in cohort)")
 
     if isinstance(tree_fns[0], str):
         Ts = []
@@ -2,7 +2,6 @@ import csv
 import numpy as np
 import os
 import torch
-import copy
 from collections import OrderedDict
 import pandas as pd
 import sys
@@ -14,7 +13,6 @@ print("CUDA GPU:",torch.cuda.is_available())
 if torch.cuda.is_available():
     torch.set_default_tensor_type(torch.cuda.FloatTensor)
 
-
 def get_adjacency_matrix_from_txt_edge_list(txt_file):
     edges = []
     max_idx = -1
@@ -44,28 +42,14 @@ def get_mut_to_cluster_map_from_pyclone_output(pyclone_cluster_fn, min_mut_thres
 
 def pool_input_tsv(tsv_fn, output_dir, run_name):
     '''
+    Pool reads from the same anatomical site index and cluster index
     '''
-    df = pd.read_csv(tsv_fn, delimiter="\t")
-
-
-    for _,row in df.iterrows():
-        cid = int(row['cluster_index'])
-        mut_name_to_cluster_id[row['character_label']] = cid
-        if cid not in cluster_id_to_mut_names:
-            cluster_id_to_mut_names[cid] = []
-        cluster_id_to_mut_names[cid].append(row['character_label'])
-
-    cluster_id_to_cluster_name = {k:";".join(list(v)) for k,v in cluster_id_to_mut_names.items()}
-    cols = df.columns
-
-    df = df.drop(columns=["cluster_index"])
-    _, output_fn = write_pooled_tsv_from_clusters(df, mut_name_to_cluster_id, cluster_id_to_cluster_name, {},
-                                                  output_dir, run_name, ";", None)
+    df = pd.read_csv(tsv_fn, delimiter="\t")
+    assert(set(df['site_category'])==set(['primary', 'metastasis']))
+    _, output_fn = write_pooled_tsv_from_clusters(df, {}, output_dir, run_name, None)
     return output_fn
 
-def write_pooled_tsv_from_clusters(df,
-                                   aggregation_rules, output_dir, patient_id, cluster_sep,
-                                   mutation_sep):
+def write_pooled_tsv_from_clusters(df, aggregation_rules, output_dir, patient_id, mutation_sep):
     '''
     After clustering with any clustering algorithm, prepares tsvs by pooling mutations belonging
     to the same cluster
@@ -74,7 +58,7 @@ def write_pooled_tsv_from_clusters(df, mut_name_to_cluster_id, cluster_id_to_clu
         df: pandas DataFrame with required columns: [character_label,
         anatomical_site_index, anatomical_site_label, ref, var, var_read_prob, site_category]
 
-
+        mut_idx_to_cluster_id: dictionary mapping mutation index ('character_index' in input df)
         to a cluster id
 
         cluster_id_to_cluster_name: dictionary mapping a cluster id to the name in the output tsv
@@ -94,8 +78,8 @@ def write_pooled_tsv_from_clusters(df, mut_name_to_cluster_id, cluster_id_to_clu
 
     '''
 
-    required_cols = ["anatomical_site_index", "anatomical_site_label",
-
+    required_cols = ["anatomical_site_index", "anatomical_site_label", "cluster_index",
+                     "character_index", "character_label", "ref","var",
                      "var_read_prob", "site_category"]
 
     all_cols = required_cols + list(aggregation_rules.keys())
@@ -116,33 +100,34 @@ def write_pooled_tsv_from_clusters(df, mut_name_to_cluster_id, cluster_id_to_clu
     df['ref'] = df.apply(lambda row: row['total_reads_corrected']-row['var'], axis=1)
     df['var_read_prob'] = 0.5
 
-    df['cluster'] = df.apply(lambda row:
-    df = df.dropna(subset=['
+    #df['cluster'] = df.apply(lambda row: mut_idx_to_cluster_id[row['character_index']] if row['character_index'] in mut_idx_to_cluster_id else np.nan, axis=1)
+    df = df.dropna(subset=['cluster_index'])
+
+    # Save the number of mutations in each cluster before pooling
+    cluster_id_to_num_muts = df.groupby('cluster_index')['character_index'].nunique().to_dict()
+    cluster_id_to_mut_names = df.groupby('cluster_index')['character_label'].unique().apply(list).to_dict()
 
     # 2. Pool reference and variant allele counts from all mutations within a cluster
-    pooled_df = df.drop(['character_label',
+    pooled_df = df.drop(['character_label','character_index'], axis=1) # we're going to add this back in later
 
     # TODO: validate that site_category is the same for each anatomical site
-    ref_var_rules = {'ref': np.sum, 'var': np.sum,'total_reads_corrected': np.sum, "var_read_prob": 'first',
+    ref_var_rules = {'ref': np.sum, 'var': np.sum,'total_reads_corrected': np.sum, "var_read_prob": 'first',
+                     'site_category':'first', 'anatomical_site_label':lambda x: ';'.join(set(x)),}
+
+    pooled_df = pooled_df.groupby(['cluster_index', 'anatomical_site_index'], as_index=False).agg({**ref_var_rules, **aggregation_rules})
 
-    pooled_df = pooled_df.groupby(['cluster', 'anatomical_site_label'], as_index=False).agg({**ref_var_rules, **aggregation_rules})
-
-    pooled_df['character_label'] = pooled_df.apply(lambda row: cluster_id_to_cluster_name[row['cluster']], axis=1)
-
     # 3. Add indices for mutations, samples and anatomical sites as needed for input format
-    pooled_df['character_index'] = pooled_df['
+    pooled_df['character_index'] = pooled_df['cluster_index'].tolist()
     pooled_df['anatomical_site_index'] = pooled_df.apply(lambda row: list(pooled_df['anatomical_site_label'].unique()).index(row["anatomical_site_label"]), axis=1)
-    all_cols.append("total_reads_corrected")
-    all_cols.append("character_index")
-    pooled_df = pooled_df[all_cols]
 
     # 4. Do some post-processing, e.g. adding number of mutations and shortening character label for display
-    pooled_df['num_mutations'] = pooled_df.apply(lambda row:
-    pooled_df['
-    pooled_df['character_label'] = pooled_df.apply(lambda row:get_pruned_mut_label(row['character_label'], cluster_sep, mutation_sep), axis=1)
+    pooled_df['num_mutations'] = pooled_df.apply(lambda row:cluster_id_to_num_muts[int(row['character_index'])], axis=1)
+    pooled_df['character_label'] = pooled_df.apply(lambda row:get_pruned_mut_label(cluster_id_to_mut_names[row['cluster_index']], mutation_sep), axis=1)
     pooled_df['total_reads_corrected'] = pooled_df['total_reads_corrected'].round(0).astype(int)
     pooled_df['var'] = pooled_df['var'].round(0).astype(int)
     pooled_df['ref'] = pooled_df.apply(lambda row: row['total_reads_corrected']-row['var'], axis=1)
+    all_cols.append('num_mutations')
+    pooled_df = pooled_df[all_cols]
 
     # Save
     output_fn = os.path.join(output_dir, f"{patient_id}_clustered_SNVs.tsv")
|
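Reviewer note: the pooling rule introduced in this hunk (rows sharing `cluster_index` and `anatomical_site_index` are summed into one row) can be illustrated without pandas so that the grouping is explicit. This is a minimal sketch, not the package's implementation: `pool_rows` is a hypothetical name, the input values are made up, only `ref` and `var` are summed, and mutations per cluster are counted by distinct `character_label` rather than by `character_index` as the diff does.

```python
from collections import defaultdict


def pool_rows(rows):
    """Sum ref/var over rows sharing (cluster_index, anatomical_site_index)."""
    pooled = {}
    muts_per_cluster = defaultdict(set)
    for row in rows:
        key = (row["cluster_index"], row["anatomical_site_index"])
        muts_per_cluster[row["cluster_index"]].add(row["character_label"])
        if key not in pooled:
            # The first row of a group supplies the non-summed fields,
            # mirroring the 'first' aggregation rule used in the diff.
            first = {k: v for k, v in row.items() if k != "character_label"}
            pooled[key] = {**first, "ref": 0, "var": 0}
        pooled[key]["ref"] += row["ref"]
        pooled[key]["var"] += row["var"]
    # Attach the number of distinct mutations seen in each cluster.
    for (cluster, _site), out in pooled.items():
        out["num_mutations"] = len(muts_per_cluster[cluster])
    return [pooled[key] for key in sorted(pooled)]


# Toy rows; column names follow the README, values are illustrative only.
rows = [
    {"anatomical_site_index": 0, "cluster_index": 0, "character_label": "HER2", "ref": 982, "var": 78},
    {"anatomical_site_index": 0, "cluster_index": 0, "character_label": "TP53", "ref": 500, "var": 20},
    {"anatomical_site_index": 1, "cluster_index": 0, "character_label": "HER2", "ref": 40, "var": 60},
]
pooled = pool_rows(rows)
for row in pooled:
    print(row)
```

The two site-0 rows collapse into one pooled row, while the site-1 row stays separate; both carry the cluster-wide mutation count.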
@@ -162,71 +147,70 @@ def calc_var_read_prob(major_cn, minor_cn, purity):
     var_read_prob = (p*major_cn)/x
     return var_read_prob
 
-def write_pooled_tsv_from_conipher_pyclone_clusters(input_data_tsv_fn, clusters_tsv_fn,
-
-
+# def write_pooled_tsv_from_conipher_pyclone_clusters(input_data_tsv_fn, clusters_tsv_fn,
+# aggregation_rules, output_dir, patient_id,
+# cluster_sep, mutation_sep):
 
-
-
-
+# '''
+# After clustering with PyClone using CONIPHER (https://github.com/McGranahanLab/CONIPHER-wrapper),
+# prepares tsvs by pooling mutations belonging to the same cluster
 
-
-
-
-
+# Args:
+# tsv_fn: path to tsv with required columns: [#sample_index, sample_label,
+# character_index, character_label, anatomical_site_index, anatomical_site_label,
+# ref, var]
 
-
+# clusters_tsv_fn: PyClone results tsv that maps each mutation to a cluster id
 
-
-
-
-
+# aggregation_rules: dictionary indicating how to aggregate any extra columns that are not in
+# the required columns (specified above). e.g. "first" aggregation rules can be used for columns
+# shared by rows with the same anatomical_site_label, since we are aggregating within an
+# anatomical_site_label.
 
-
+# output_dir: where to save clustered tsv
 
-
+# patient_id: name of patient used in tsv filename
 
-
-
-
+# cluster_sep: string that separates names of mutations when creating a cluster name.
+# e.g. cluster name for mutations ABC:4:3 and DEF:1:2 with cluster_sep=";" will be
+# "ABC:4:3;DEF:1:2"
 
-
-
+# mutation_sep: string that separates information about a variant, e.g.":" for a mutation
+# named ABC:4:3. Can be None if variants are not formatted this way.
 
 
-
+# Outputs:
 
-
+# Saves pooled tsv at {output_dir}/{patient_id}_clustered_SNVs.tsv
 
-
+# '''
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+# df = pd.read_csv(input_data_tsv_fn, delimiter="\t", index_col=0)
+# pyclone_df = pd.read_csv(clusters_tsv_fn, delimiter="\t")
+# mut_idx_to_cluster_id = dict()
+# cluster_id_to_mut_names = dict()
+# # 1. Get mapping between mutation names and PyClone cluster ids
+# for _, row in df.iterrows():
+# mut_items = row['character_label'].split(":")
+# cluster_id = pyclone_df[(pyclone_df['CHR']==int(mut_items[1]))&(pyclone_df['POS']==int(mut_items[2]))&(pyclone_df['REF']==mut_items[3])]['treeCLUSTER'].unique()
+# assert(len(cluster_id) <= 1)
+# if len(cluster_id) == 1:
+# cluster_id = int(cluster_id.item())
+# mut_idx_to_cluster_id[int(row['character_index'])] = cluster_id
+# if cluster_id not in cluster_id_to_mut_names:
+# cluster_id_to_mut_names[cluster_id] = set()
+# else:
+# cluster_id_to_mut_names[cluster_id].add(row['character_label'])
 
-
-
+# # 2. Set new names for clustered mutations
+# cluster_id_to_cluster_name = {k:cluster_sep.join(list(v)) for k,v in cluster_id_to_mut_names.items()}
 
-
-
-df['site_category'] = df.apply(lambda row: 'primary' if 'primary' in row['anatomical_site_label'] else 'metastasis', axis=1)
+# df['var_read_prob'] = df.apply(lambda row: calc_var_read_prob(row['major_cn'], row['minor_cn'], row['purity']), axis=1)
+# df['site_category'] = df.apply(lambda row: 'primary' if 'primary' in row['anatomical_site_label'] else 'metastasis', axis=1)
 
-
-
-
+# # 3. Pool mutations and write to file
+# return write_pooled_tsv_from_clusters(df, mut_idx_to_cluster_id, cluster_id_to_cluster_name,
+# aggregation_rules, output_dir, patient_id, cluster_sep, mutation_sep)
 
 def is_resolved_polytomy_cluster(cluster_label):
     '''
@@ -399,23 +383,26 @@ def get_ref_var_omega_matrices(tsv_filepaths):
 
     return ref_matrices, var_matrices, omega_matrices, ordered_sites, idx_to_label_dicts, idx_to_num_muts
 
-def get_pruned_mut_label(
+def get_pruned_mut_label(mut_names, mutation_sep, k=2):
+
+    # If mutation names contain colons (e.g. LOC1:9:12312321), take everything before the first
+    # colon for displaying since otherwise we can't display the label w/ our graphing library dependencies
     if mutation_sep != None:
-        gene_names = [item.split(mutation_sep)[0] for item in
+        gene_names = [item.split(mutation_sep)[0] for item in mut_names]
     else:
-        gene_names = [item for item in
+        gene_names = [item for item in mut_names]
 
-    gene_candidates =
+    gene_candidates = set()
     for gene in gene_names:
         gene = gene.upper()
         if gene in CANCER_DRIVER_GENES:
-            gene_candidates.
+            gene_candidates.add(gene)
         elif gene in ENSEMBLE_TO_GENE_MAP:
-            gene_candidates.
+            gene_candidates.add(ENSEMBLE_TO_GENE_MAP[gene])
     final_genes = gene_names if len(gene_candidates) == 0 else gene_candidates
 
     k = k if len(final_genes) > k else len(final_genes)
-    return "_".join(final_genes[:k])
+    return "_".join(list(final_genes)[:k])
 
 def get_primary_sites(tsv_filepath):
     # For tsvs with very large fields
|
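Reviewer note: the new `get_pruned_mut_label` collects driver-gene candidates in a `set` and then slices `list(final_genes)[:k]`. Python sets have no defined iteration order, and for strings the order can differ between interpreter runs because of hash randomization, so which `k` names end up in the label may not be reproducible. Below is a minimal sketch of an order-preserving alternative. `prune_label` and its `driver_genes` parameter are hypothetical stand-ins for the package's function and its `CANCER_DRIVER_GENES` lookup; the `ENSEMBLE_TO_GENE_MAP` branch is omitted for brevity.

```python
def prune_label(mut_names, mutation_sep=None, k=2, driver_genes=frozenset()):
    """Hypothetical order-preserving variant of the label pruning in this hunk."""
    if mutation_sep is not None:
        # Keep only the part before the first separator, e.g. "TP53:17:7577120" -> "TP53".
        names = [name.split(mutation_sep)[0] for name in mut_names]
    else:
        names = list(mut_names)
    # dict.fromkeys de-duplicates while keeping first-seen order (Python 3.7+).
    candidates = list(dict.fromkeys(
        name.upper() for name in names if name.upper() in driver_genes
    ))
    # Fall back to the raw names when no driver gene is present, as the diff does.
    final = candidates if candidates else names
    return "_".join(final[:k])


print(prune_label(
    ["TP53:17:7577120", "abc:1:5", "KRAS:12:25398284", "tp53:17:1"],
    mutation_sep=":",
    driver_genes={"TP53", "KRAS"},
))
```

Because candidates are kept in first-seen order, the same input always produces the same label, which matters when labels are used in saved figures or compared across runs.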
@@ -579,15 +566,6 @@ def _get_adj_matrix_from_machina_tree(tree_edges, idx_to_character_label, remove
     ret_dict = {v:k for k,v in ret_dict.items()}
     return T, ret_dict
 
-def get_index_to_cluster_label_from_corrected_sim_tsv(ref_var_fn):
-    df = pd.read_csv(ref_var_fn, sep="\t")
-    idx_to_label = {}
-    labels = df['character_label'].unique()
-    for label in labels:
-        idx = df[df['character_label']==label]['character_index'].unique().item()
-        idx_to_label[idx] = label
-    return idx_to_label
-
 def get_adj_matrices_from_spruce_mutation_trees(mut_trees_filename, idx_to_character_label, is_sim_data=False):
     '''
     When running MACHINA's generatemutationtrees executable (SPRUCE), it provides a txt file with
@@ -4,4 +4,4 @@ with open('requirements.txt') as f:
     requirements = f.read().splitlines()
 #print(requirements)
 
-setup(name='metient', version='0.1.1.
+setup(name='metient', version='0.1.1.dev3', url="https://github.com/divyakoyy/metient.git", packages=['metient', 'metient.util', 'metient.lib'], install_requires=requirements,)
File without changes